DAG IP addresses and Failover Cluster Manager

“Hey Chad, how come I see two IP addresses in Failover Cluster Manager (FCM)? And why is only one “Online” while the other is “Offline”? Is there an issue with my DAG?”

Well, let’s get some context here. This large customer has a stretched DAG that spans two geographic locations and two AD sites. This DAG (per Microsoft best practices) has two internal private IPs, one for the MAPI network at each location. For some additional reading, follow the linked rabbit hole below!

Understanding Database Availability Groups
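
To see which IP addresses are assigned to a DAG from the Exchange side, a quick read-only check in EMS might look like the sketch below. The DAG name is the one used later in this post; substitute your own.

```powershell
# Read-only: list the IPv4 addresses configured on the DAG.
# A stretched DAG like this one should show one address per MAPI subnet.
Get-DatabaseAvailabilityGroup A0000-DAG0102-V | FL Name, DatabaseAvailabilityGroupIpv4Addresses
```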

So this stretched DAG is up and running successfully, and replication is firing away from their primary datacenter to their DR datacenter with no issues. Everyone is happy, and copy and replay queue lengths are low. Along comes their server monitoring team, running around with their arms in the air screaming that the Exchange 2010 sky is falling!

“There is an issue with your Exchange cluster, what would you like us to do to it for you?”

The correct answer here (and much credit to my customer) was “Nothing”. Although Exchange DAGs use the Failover Clustering features of Windows Server 2008, Exchange is not as tightly integrated with them as it was in the 2007 CCR days. Exchange leverages only part of Failover Clustering, and its primary management method should always be the EMC or EMS. My token line about this is “If you’re in FCM, you’ve got some serious issues. Exchange DAG clusters should always be managed from EMC or EMS unless you’re doing a datacenter switchover and/or being assisted by Microsoft Support services (Premier).”

So what are we looking at here?

DAG_IP

So the server team sees a resource “Offline” and panics. The image you see above is normal and expected. The cluster “Owner” in any DAG is the PAM, or Primary Active Manager. Which of the two IPs above is online depends on which node of the stretched cluster is currently the PAM. In this example, one of the nodes on the 10.84.189.X network is the PAM. How can we verify this? Easy sauce:

Get-DatabaseAvailabilityGroup A0000-DAG0102-V -Status | FL Name, *Prim*

If the node listed in the output fails, a new PAM will automatically be elected. If the new PAM is on the same side of the stretched DAG, the online DAG IP shown above doesn’t change. If the election chooses a server on the far side, the online / offline IP listing above flips. There can only be one online IP for the DAG at any time.
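
If you want to see the same online / offline picture without opening FCM, a read-only sketch along these lines should do it. It assumes the DAG members run Windows Server 2008 R2, where the FailoverClusters PowerShell module is available, and that you run it on one of the DAG members.

```powershell
# Assumption: Windows Server 2008 R2 or later, run locally on a DAG member.
Import-Module FailoverClusters

# Read-only: list the cluster's IP Address resources and their current state.
# On a healthy two-subnet DAG, expect one Online and one Offline.
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq 'IP Address' } |
    Format-Table Name, State, OwnerGroup, OwnerNode -AutoSize
```

The online address should sit on the same subnet as the PAM reported by Get-DatabaseAvailabilityGroup.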

“Should I move my PAM to one datacenter over the other?”

Good question. Do you run an active / passive (Primary / DR) kind of scenario? Do you have poor network connectivity to the other side of the stretched DAG? Then maybe. The best approach with any cluster management in Exchange is to let the mechanisms manage themselves until it’s absolutely necessary to intervene.
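
If you do decide an intervention is warranted, the PAM follows whichever node owns the core cluster group, so moving that group moves the PAM. Treat the following as a sketch rather than a runbook: it assumes Windows Server 2008 R2 with the FailoverClusters module, is run from EMS on a DAG member, and EX-NODE-01 is a hypothetical server name standing in for the node you want to become the PAM.

```powershell
# Assumption: Windows Server 2008 R2 or later, run from EMS on a DAG member.
Import-Module FailoverClusters

# This CHANGES cluster state: it moves the core cluster group, and the PAM role with it.
# EX-NODE-01 is a placeholder; replace it with a real DAG member in the target datacenter.
Move-ClusterGroup -Name "Cluster Group" -Node EX-NODE-01

# Confirm which server now holds the PAM role.
Get-DatabaseAvailabilityGroup A0000-DAG0102-V -Status | FL Name, *Prim*
```

If the target node is on the other subnet, expect the online / offline DAG IPs in FCM to swap after the move.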