Why are my Cluster Networks Dropping Periodically?

Lately I have seen several customer cases with Windows Server 2008 failover clustering in which both cluster networks drop their connections at the same time, causing a failover. In the cases I have seen, these are usually very busy SQL Server clusters.

Windows needs available resources to run core services such as networking and disk I/O. When the server starts to run out of resources, those core services suffer. This can happen with any resource-intensive application, but lately I have mostly seen it with SQL Server.

To diagnose this problem, you first need baseline performance numbers for your servers, which you get by running a Performance Monitor log while the server is performing as expected. If you then start seeing all network connections on the server drop at the same time, set up a Performance Monitor log that collects all counters at a 30-second interval. When the event happens again, stop the log, review it, and compare it with your baseline log to find out what is using up the system resources.
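Once you have both logs, the counter-by-counter comparison is mechanical. The sketch below shows one way to do it in Python, assuming each log has first been exported to CSV (for example with relog's -f csv option) so that the timestamp is in the first column and every other column is one counter. The script name compare_perf.py, the 2x threshold, and the functions load_counters, deviation, and compare are hypothetical illustrations, not a Microsoft tool.

```python
# compare_perf.py: hypothetical helper, not a Microsoft tool.
# Assumes PerfMon logs already exported to CSV: column 0 = timestamp,
# every other column = one counter path.
import csv
import io
import math
import statistics
import sys


def load_counters(csv_text):
    """Parse a PerfMon-style CSV export into {counter_path: [samples]}."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if not cell:
                continue  # skip missing samples (blank cells)
            try:
                counters[name].append(float(cell))
            except ValueError:
                continue  # skip any non-numeric cell
    return counters


def deviation(base_mean, incident_mean):
    """Return how many times larger or smaller the incident mean is (always >= 1)."""
    if base_mean == incident_mean:
        return 1.0
    if base_mean <= 0 or incident_mean <= 0:
        return math.inf  # counter went from zero to non-zero, or the reverse
    ratio = incident_mean / base_mean
    return max(ratio, 1 / ratio)


def compare(baseline, incident, threshold=2.0):
    """List counters whose mean moved by at least `threshold` times, up or down."""
    flagged = []
    for name, base_samples in baseline.items():
        incident_samples = incident.get(name)
        if not base_samples or not incident_samples:
            continue  # counter has no data in one of the logs
        base_mean = statistics.fmean(base_samples)
        incident_mean = statistics.fmean(incident_samples)
        dev = deviation(base_mean, incident_mean)
        if dev >= threshold:
            flagged.append((name, base_mean, incident_mean, dev))
    # Largest change first, so the most suspicious counters are reviewed first.
    flagged.sort(key=lambda item: item[3], reverse=True)
    return flagged


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_perf.py baseline.csv incident.csv
    with open(sys.argv[1], newline="") as f:
        baseline = load_counters(f.read())
    with open(sys.argv[2], newline="") as f:
        incident = load_counters(f.read())
    for name, base_mean, incident_mean, dev in compare(baseline, incident):
        print(f"{dev:8.1f}x  {base_mean:12.2f} -> {incident_mean:12.2f}  {name}")
```

The comparison checks both directions because some counters, such as Memory\Available MBytes, fall rather than rise when the server is starved. A large ratio only tells you where to look first; it does not by itself prove the cause.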

For more on collecting performance data for SQL Server, see the TechNet article "Performance Monitoring with SQL Server 2008": https://technet.microsoft.com/en-us/edge/performance-monitoring-with-sql-server-2008

If, after examining the performance data, you conclude that the server is not running low on resources, look at other possible causes such as NICs, cabling, and switches. If you still cannot find out why this is happening, give Microsoft Support a call and we will be glad to help.

Happy Clustering!

James Burrage
Support Escalation Engineer
Microsoft Enterprise Platforms Support