Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes.

Although Exchange 2010 no longer deploys a cluster resource model, we still use the Windows Failover Clustering service for certain functions.

When a Windows 2008 / 2008 R2 cluster is created, the cluster core resources are grouped together in the ‘Cluster Group’.  The Cluster Group is a hidden group that contains the following resources:

  • Cluster Name:  This is the cluster name object (CNO).  Exchange 2010 uses the name of the DAG to create this resource.  The name of the DAG is always the name of the cluster and the CNO.
  • Cluster IPv4 Addresses:  These are the IPv4 addresses that are associated with the DAG.  If the members of the DAG span multiple subnets, there will be multiple IPv4 resources.
  • File Share Witness:  This is the quorum resource that is created using the witness server and witness directory settings of the DAG.  This resource should only be present when there is an even number of DAG members.

You can see the cluster core resources in failover cluster manager by selecting the cluster name in the upper left hand pane.  In the center pane, expand the cluster core resources section.

image

The cluster core resource group can also be seen using cluster.exe (or, in Windows 2008 R2, the cluster PowerShell cmdlets).

Windows 2008 / Windows 2008 R2:  Cluster.exe DAG.company.com group

cluster.exe dag.company.com group
Listing status for all available resource groups:

Group Node Status
-------------------- --------------- ------
Cluster Group DAG-1 Online
Available Storage DAG-1 Offline

Windows 2008 R2:  Get-ClusterGroup -Cluster DAG.company.com

PS C:\Users\Administrator> Get-ClusterGroup -Cluster DAG.company.com

Name OwnerNode State
---- --------- -----
Cluster Group dag-1 Online
Available Storage dag-1 Offline

From an Exchange 2010 perspective you do not really need to manage the cluster core resources.  As members join and depart the cluster this resource group will be automatically moved to a remaining member.  Each member of the DAG should have the ability to arbitrate and fully bring online the cluster core resources.

When a cluster is created in Windows 2008 or Windows 2008 R2, the cluster service enumerates all network ports found on the nodes.  These network ports are then combined into cluster networks.  You can view the cluster networks in failover cluster manager by expanding the cluster name and expanding networks.

image

You can also view the cluster networks using cluster.exe or powershell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network

cluster.exe dag.company.com network
Listing status for all available networks:

Network Status
---------------------------------------- -----------
Cluster Network 2 Up
Cluster Network 4 Up
Cluster Network 1 Up

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com

Get-ClusterNetwork -Cluster DAG.company.com

Name State
---- -----
Cluster Network 1 Up
Cluster Network 2 Up
Cluster Network 4 Up

A cluster network has three settings:

  • Do not allow cluster network communications on this network
  • Allow cluster network communications on this network
    • Allow clients to connect through this network

You can see these settings in failover cluster manager by getting the properties of a cluster network.

image

You can also view the network role either by using cluster.exe or powershell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network "Cluster Network 1" /prop

cluster dag.company.com network "Cluster Network 1" /prop

Listing properties for 'Cluster Network 1':

T Network Name Value
-- -------------------- ------------------------------ -----------
SR Cluster Network 1 Name Cluster Network 1
MR Cluster Network 1 IPv6Addresses
MR Cluster Network 1 IPv6PrefixLengths
MR Cluster Network 1 IPv4Addresses 10.0.0.0
MR Cluster Network 1 IPv4PrefixLengths 24
SR Cluster Network 1 Address 10.0.0.0
SR Cluster Network 1 AddressMask 255.255.255.0
S Cluster Network 1 Description
D Cluster Network 1 Role 3 (0x3)
D Cluster Network 1 Metric 1200 (0x4b0)
D Cluster Network 1 AutoMetric 1 (0x1)
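
If you capture the /prop output to a file, the Role line can be picked out with a short script. The following is a minimal sketch in Python (not one of the cluster tools, just a convenient scripting language) that assumes the output looks like the listing above; the function name role_from_prop_output is hypothetical.

```python
import re

# Matches the "Role" row of the property listing, e.g.
# "D Cluster Network 1 Role 3 (0x3)". Assumes the layout shown in this article.
ROLE_LINE = re.compile(
    r"^D\s+(?P<network>.+?)\s+Role\s+(?P<role>\d+)\s+\(0x[0-9a-fA-F]+\)\s*$"
)


def role_from_prop_output(text):
    """Return (network name, role) from 'cluster.exe ... network ... /prop' output."""
    for line in text.splitlines():
        match = ROLE_LINE.match(line.strip())
        if match:
            return match.group("network"), int(match.group("role"))
    return None


sample = """\
SR Cluster Network 1 AddressMask 255.255.255.0
S Cluster Network 1 Description
D Cluster Network 1 Role 3 (0x3)
D Cluster Network 1 Metric 1200 (0x4b0)
"""

print(role_from_prop_output(sample))
```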

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com | fl Name,Role

Get-ClusterNetwork -Cluster DAG-1.company.com | fl name,role

Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
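
On a DAG with many networks it can help to filter this list down to the networks that are not open to clients. Here is a rough Python sketch that does so, assuming the "Name : value" layout shown above has been saved as text; both function names are made up for illustration.

```python
def parse_format_list(text):
    """Parse 'Name : value' blocks separated by blank lines into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def networks_without_client_access(text):
    """Return the names of networks whose Role is anything other than 3."""
    return [r["Name"] for r in parse_format_list(text) if r.get("Role") != "3"]


sample = """\
Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
"""

print(networks_without_client_access(sample))
```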

The role of the networks can also be viewed in the registry of each node.  This information is located at:  HKEY_LOCAL_MACHINE\Cluster\Networks.  Each cluster network is represented by a subkey which is the GUID of the network.  Expanding the GUID, you will see sub-values including Name and Role.

[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Id"="2cd2b920-0a2a-4851-bb24-de02d4a70b7e"
"Name"="Cluster Network 2"
"Signature"="NETW"
"Description"=""
"Role"=dword:00000001
"Priority"=dword:ffffffff
"Transport"="TCP/IP"
"Ignore"=dword:00000000
"Address"="192.168.0.0"
"AddressMask"="255.255.255.0"
"IPv6Address"=""
"State"=dword:00000003
"Metric"=dword:0000044c
"AutoMetric"=dword:00000001

The Role value takes one of three values depending on the cluster network settings.  The values are:

  • 0:  Do not allow cluster network communications on this network
  • 1:  Allow cluster network communications on this network (clients cannot connect)
  • 3:  Allow cluster network communications on this network and allow clients to connect through this network

In order for an IPv4 resource to be brought online, it must be associated with a network that is configured both to “Allow cluster network communications on this network” and to “Allow clients to connect through this network”.  If for any reason the “Allow clients to connect through this network” option is not enabled, the IPv4 resource associated with that network cannot be brought online.
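
The rule above is easy to express as a lookup. This is only an illustrative Python sketch built from the three role values listed in this article; the names ROLE_DESCRIPTIONS, describe_role and can_host_ipv4_resource are assumptions, not part of any cluster API.

```python
# Role values as listed in this article.
ROLE_DESCRIPTIONS = {
    0: "Do not allow cluster network communications on this network",
    1: "Allow cluster network communications on this network",
    3: "Allow cluster network communications and allow clients to connect",
}


def describe_role(role):
    """Return the Failover Cluster Manager wording for a role value."""
    return ROLE_DESCRIPTIONS.get(role, "Unknown role value %d" % role)


def can_host_ipv4_resource(role):
    """An IPv4 resource can only come online on a network with role 3."""
    return role == 3


for role in (0, 1, 3):
    print(role, describe_role(role), can_host_ipv4_resource(role))
```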

On an Exchange 2010 DAG member, when attempting to move the cluster core resources to another DAG member, the resources may fail to come online.  Specifically, the IPv4 resource fails to come online, which in turn causes the network name resource to fail because it depends on the IPv4 resource.

If you use Failover Cluster Manager to bring the IPv4 resource in the cluster core resources group online, the following pop-up error is displayed:

image

A review of the system log shows event 1223:

Log Name: System

Source: Microsoft-Windows-FailoverClustering

Date: 5/10/2010 1:14:42 PM

Event ID: 1223

Task Category: IP Address Resource

Level: Error

Keywords:     

User: SYSTEM

Computer: dagNode.company.com

Description:

Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' cannot be brought online because the cluster network 'Cluster Network 2' is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

Event 1223, shown above, indicates that the effective setting for Cluster Network 2 is “Allow cluster network communications on this network” without “Allow clients to connect through this network”.  However, when you review the settings for Cluster Network 2 in Failover Cluster Manager, you might see that both “Allow cluster network communications on this network” and “Allow clients to connect through this network” are enabled.

The Microsoft Exchange Replication Service helps maintain the cluster network configuration.  There is an issue in the current Replication Service where the settings are not changed inside the cluster.  This leaves a difference between the effective setting inside the cluster and the setting displayed in the Failover Cluster Management tools.

Workaround:

A quick and easy workaround for this issue is to simply reset the state of the network.  There are multiple ways to accomplish this and I will outline each below.  Step zero before proceeding with any other steps is to note the cluster network that is displayed in the above event since that is the network that will need to be reset (in this example Cluster Network 2). 
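
If you are collecting events from several nodes, the network name can be pulled out of the event text automatically. The snippet below is a small Python sketch that assumes the description matches the wording of the event shown earlier; the pattern and the function name network_from_event_1223 are illustrative only.

```python
import re

# Pattern based on the event 1223 description quoted in this article.
EVENT_1223_PATTERN = re.compile(
    r"Cluster IP address resource '(?P<resource>[^']+)' cannot be brought online "
    r"because the cluster network '(?P<network>[^']+)' is not configured to allow "
    r"client access"
)


def network_from_event_1223(description):
    """Return the cluster network named in an event 1223 description, or None."""
    match = EVENT_1223_PATTERN.search(description)
    return match.group("network") if match else None


description = (
    "Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' cannot be "
    "brought online because the cluster network 'Cluster Network 2' is not "
    "configured to allow client access. Please use the Failover Cluster Manager "
    "snap-in to check the configured properties of the cluster network."
)

print(network_from_event_1223(description))
```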

Windows 2008 / Windows 2008 R2 – Using Failover Cluster Management Tool

The network state can be reset using Failover Cluster Manager

  • Launch Failover Cluster Management
  • Expand the cluster \ networks.

image

  • Get the properties of the cluster network in question.
  • Uncheck the box to “Allow clients to connect through this network”.

image

  • Press <apply> - you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.
  • The network is disabled for “Allow clients to connect through this network”. 

Next we need to enable the network for “Allow clients to connect through this network”.

  • Get the properties of the cluster network.
  • Check the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.

The network has been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 / Windows 2008 R2: Using cluster.exe

  • Launch a command prompt with administrative privileges.
  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=1

  • The network is disabled for “Allow clients to connect through this network”. 

Next, we need to enable the network for “Allow clients to connect through this network”.

  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=3

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.
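
If several networks need the same treatment, the two cluster.exe command lines can be generated rather than typed. This Python sketch only builds and prints the commands used in this section and does not run them; the helper names are hypothetical, and the cluster and network names are the example values from this article.

```python
def build_reset_commands(cluster, network):
    """Return the two cluster.exe argument lists that reset a network's role."""
    base = ["cluster.exe", cluster, "network", network, "/prop"]
    # First disable client access (role=1), then enable it again (role=3).
    return [base + ["role=1"], base + ["role=3"]]


def as_command_line(args):
    """Join arguments, wrapping any that contain spaces in straight quotes."""
    return " ".join('"%s"' % a if " " in a else a for a in args)


for command in build_reset_commands("dag.company.com", "Cluster Network 2"):
    print(as_command_line(command))
```

Note that the quotes around the network name must be straight quotes, otherwise the command prompt will not treat the name as a single argument.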

Windows 2008 R2: Using powershell

  • Launch powershell with administrative privileges.
  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=1}

  • The network is disabled for “Allow clients to connect through this network”. 

Next, enable the network for “Allow clients to connect through this network”.

  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=3}

  • The network is enabled for “Allow clients to connect through this network”. 

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

 

LONG TERM FIX

This issue will be fixed in Exchange 2010 Service Pack 1. The issue will not be fixed in Exchange 2010 RTM.

==========================================

Updated – 6/2/2010

Updated to note that Exchange 2010 SP1 is confirmed to contain the fix.

==========================================