Skip to content

How does a SQL 2005 DB Mirror Ping?

A SQL Server 2005 High Safety DB Mirroring session synchronously transfers transactions from the primary DB to the mirroring DB and then with the help of a witness server supports automatic fail over from the primary to the mirror is the primary becomes unavailable. All the primary, mirror and witness nodes all keep an eye on each other to maintain a quorum and detect if the session should fail over from the primary db to the mirror. Some of the documentation uses the term “ping” to describe the way the a node tests for the presence of others, so I started wondering what sort of ping is it?

Does it use a standard ICMP echo request ping? Does it use the Mirroring endpoint TCP port, the SQL Server TCP port, or some other hereto unknown port?

One part of the documentation makes it sound like it uses the DB Mirroring endpoint.

“To prevent this, database mirroring implements its own time-out mechanism, based on each server instance in a mirroring session sending out a ping on each open connection at a fixed interval.”

But I thought the easiest way to get to the bottom of this was just to set up a mirror and play with the ports. The mirror session was setup using certificates to secure it as described in Mirroring with certificates with the following topology:

  • Server 1: Initial Mirror DB and Witness DB
  • Server 2: Initial Principle DB

Using the Windows 2003 fire wall on server 2 I intended to open and close ports and watch the effects. To test which ports were actually open I would use ping, telnet, and to check the mirror state I would use a combination of the Database Mirroring Monitor on Server 2 and the sys.database_mirroring_witnesses view in the MSDB database on the mirroring server.

I first tested that the mirror was running, that I could ICMP ping server 2 from server and that I could telnet to TCP and 1433 and 5022. Then I turned on the fire wall on with all the required ports open to made sure everything still worked. Aside from the ports required to manage the server the following ports were explicitly opened on the server.

  • TCP 1433 for the default SQL Server installation
  • TCP 5022 for the DB Mirroring endpoint
  • TCP 445 to allow for ICMP Echo requests

By default mirror nodes use a timeout period of 10 seconds, so all the changes made below were give 20 or 30 seconds to have an effect.

Test One: Blocking ICMP Echo

With ICMP echo requests blocked by the fire wall I could not longer use the ping utility to ping Server 2, but it was still possible to telnet to ports 5022 and 1433.

The mirror did not fail over.

Test Two: Blocking the Mirror Endpoint TCP 5022

After first re-enabling ICMP Echo requests I blocked TCP 5022 used by the DB Mirroring endpoint on server 2. I could now ping server 2 and telnet to TCP 1433, but could no longer telnet to TCP 5022.

The mirror did not fail over.

Test Three: Blocking the SQL Server port TCP 1433

After first re–enabling TCP 5022 I blocked TCP 1433 the port the SQL Server instance was listening on. I could now ping server 2 and telnet to TCP 5022, but could no longer telnet to TCP 1433.

The mirror still did not fail over.

At this point the Database Mirroring Monitor would time out trying to collect information for the server 2 and display an error. However SSIS on server 1 had an open connection to server 2 in the object browser that would allow me to refresh the object tree, but any attempt to open a new query window connecting to server 2 would fail.

To check what I was seeing I tried to connect to server 2 from a third client machine was once again unable to establish a connection. The Database Mirroring Monitor, the view in the witness server and the SSIS object browser all said server 2 was an online principle in a mirror session and server 1 was a restoring mirror. But I could not open a connection to the server 2 from server 1 or a client.

Running netstat on server 2 showed that there were 4 connections from server 1 to server 2 (really 5 but one was TCP 139 for file and print sharing). There were two connections to TCP 5022, I assume on each from the partner node and the witness nodes running on server one. There were also two connection to TCP 1433, to work out what these were I closed down all query windows and severed server browser connections in SSIS on server 1.

One of the 1433 connection was an open query window on Server 1, the other disappeared when I closed the SSIS object browser connection to server 2.

Running netstat on server 2 now showed the following connections to and from server 1 (10.0.0.15):

    Proto  Local Address          Foreign Address        State
         TCP    10.0.0.23:1847         10.0.0.15:5022         ESTABLISHED
         TCP    10.0.0.23:1852         10.0.0.15:5023         ESTABLISHED
         TCP    10.0.0.23:5022         10.0.0.15:1605         ESTABLISHED
         TCP    10.0.0.23:5022         10.0.0.15:1608         ESTABLISHED

Two connections to and two connections from the mirroring nodes on server 1.

Using Wireshark to analyse the packs on server 2, I could see 3 or 4 packets a second bouncing back and forth between the two servers with TCP 5022/5023 at either end of the connection.

Test Four: Blocking TCP 1433 and 5022

While keeping TCP 1433 blocked I blocked 5022, and broke the mirror. The SSIS browser on server 1 now said the DB was a disconnected mirror in recovery, while the Database Mirroring Monitor reported errors retrieving information for both nodes. The view on the witness server still reported server 2 as the primary server.

While netstat on server two reported no inbound connects to TCP 5022 or 1433, and two outbound connections to TCP 5022 and 5023 on server 1.

A summation of my conclusions

I know the mirror was working, I’d manually forced fail over to test it and had been using these servers to test the Microsoft JDBC library against a failover. I may have made some sort of mistake in my tests but changing the windows fire wall is not that tricky. The windows firewall may act slightly oddly in the way it allowed existing connections to remain up.

But I was able to create a situation where clients could not connect on the default port to a default instance of SQL server the DB mirroring did not fail over. To me that sounds wrong, if running a ping through the DB Mirroring endpoints is how it works (I did not test forcing these connections to close when the ports were closed and the docs seem to suggest it) then this does not test what a client can see.

It’s not hard to imagine some different network topologies where one or the other is preferred. If all the db servers are in a subnet walled off from the clients, the clients may be prevented from accessing the db servers by the firewall while the db servers can see each other. Or if the DB servers are walled off from each other as well as from the clients, the db’s may be prevented from seeing each other while the clients can see them all.

There should be an option in the mirroring setup to test the connection between nodes on the TCP/TSQL endpoint for the servers. Then if it makes sense in the environment it can be turned on.

By the way if you read Possible Failures During Database Mirroring it says that possible “Hard Errors” that can cause mirroring to fail include “Changes in the firewall”, and that “The only active error checking performed for database mirroring occurs for soft error cases.” So it does what it says on the tin, still wish it would monitor the port clients need to access the server.

Post a Comment

Your email is never published nor shared. Required fields are marked *
*
*