Transient cluster communication problems can often lead to databases failing over between nodes or to nodes being removed from cluster membership. For many administrators, the occasional network hiccups that drive up the likelihood of this occurring are a fact of life. They are definitely more prevalent when cluster nodes are geographically dispersed.
On occasion there are recommendations to change the cluster network thresholds that determine if a node is alive or failed. A blog post that I’ve referenced for an explanation and guidance on this is published here:
I’m not sure if I missed it before or if Elden recently updated the post, but I noticed that the table now lists new values for Windows Server 2016 based clusters. I also noticed the following paragraph, which I took to be updated guidance on cluster network thresholds.
“To be more tolerant of transient failures it is recommended on Win2008 / Win2008 R2 / Win2012 / Win2012 R2 to increase the SameSubnetThreshold and CrossSubnetThreshold values to the higher Win2016 values. Note: If the Hyper-V role is installed on a Windows Server 2012 R2 Failover Cluster, the SameSubnetThreshold default will automatically be increased to 10 and the CrossSubnetThreshold default will automatically be increased to 20. After installing the following hotfix the default heartbeat values will be increased on Windows Server 2012 R2 to the Windows Server 2016 values.” There is a hotfix available that will do this for you: https://support.microsoft.com/en-us/help/3153887/fine-tuning-failover-cluster-network-thresholds-in-windows-server-2012
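To get a feel for what those threshold numbers mean in practice, here is a rough Python sketch of the arithmetic as I understand it: a node is considered down after roughly the heartbeat delay multiplied by the threshold of missed heartbeats. The thresholds of 10 and 20 come from the quote above. The 1000 ms delay and the older threshold of 5 are my assumptions about the defaults, and the class and names are made up for illustration, so check Elden's table and your own cluster before relying on any of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    """Hypothetical container for one pair of cluster heartbeat settings."""
    delay_ms: int   # interval between heartbeats, in milliseconds
    threshold: int  # consecutive missed heartbeats tolerated

    def tolerance_seconds(self) -> float:
        # Approximate time a node can be unreachable before it is treated as failed.
        return self.delay_ms * self.threshold / 1000.0


# Assumption: heartbeats are sent once per second (not stated in the quote above).
ASSUMED_DELAY_MS = 1000

# Thresholds named in the quoted guidance.
same_subnet = HeartbeatSettings(ASSUMED_DELAY_MS, threshold=10)
cross_subnet = HeartbeatSettings(ASSUMED_DELAY_MS, threshold=20)

# Assumption: the older default threshold was 5 on both subnet types.
older_default = HeartbeatSettings(ASSUMED_DELAY_MS, threshold=5)

if __name__ == "__main__":
    for label, settings in [
        ("older default (assumed)", older_default),
        ("same subnet", same_subnet),
        ("cross subnet", cross_subnet),
    ]:
        print(f"{label}: about {settings.tolerance_seconds():.0f} s of missed heartbeats tolerated")
```

Under those assumptions, the higher values stretch the tolerated outage from about 5 seconds to about 10 seconds within a subnet and about 20 seconds across subnets, which is why they help with brief network hiccups.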
If you are changing these values, or considering changing them, I encourage you to bookmark Elden’s blog and reference it for guidance.