Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes.


Although Exchange 2010 no longer deploys a cluster resource model, we still use the Windows Failover Clustering service for certain functions.

When a Windows 2008 / 2008 R2 cluster is created, the cluster core resources are grouped together in the ‘Cluster Group’.  The Cluster Group is a hidden group that contains the following resources:

  • Cluster Name:  This is the cluster name object (CNO).  Exchange 2010 uses the name of the DAG to create this resource.  The name of the DAG is always the name of the cluster and the CNO.
  • Cluster IPv4 Addresses:  These are the IPv4 addresses that are associated with the DAG.  If the members of the DAG span multiple subnets, there will be multiple IPv4 resources.
  • File Share Witness:  This is the quorum resource that is created using the witness server and witness directory settings of the DAG.  This resource should only be present when there is an even number of DAG members.

You can see the cluster core resources in failover cluster manager by selecting the cluster name in the upper left hand pane.  In the center pane, expand the cluster core resources section.

image

The cluster core resource group can also be seen using cluster.exe (or, in Windows 2008 R2, the failover cluster PowerShell cmdlets).

Windows 2008 / Windows 2008 R2:  Cluster.exe DAG.company.com group

cluster.exe dag.company.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-1           Online
Available Storage    DAG-1           Offline

Windows 2008 R2:  Get-ClusterGroup -Cluster DAG.company.com

PS C:\Users\Administrator> Get-ClusterGroup -Cluster DAG.company.com

Name                   OwnerNode        State
----                   ---------        -----
Cluster Group          dag-1            Online
Available Storage      dag-1            Offline

From an Exchange 2010 perspective you do not really need to manage the cluster core resources.  As members join and depart the cluster this resource group will be automatically moved to a remaining member.  Each member of the DAG should have the ability to arbitrate and fully bring online the cluster core resources.

When a cluster is created in Windows 2008 or Windows 2008 R2, the cluster service enumerates all network ports found on the nodes.  These network ports are then combined into cluster networks.  You can view the cluster networks in failover cluster manager by expanding the cluster name and expanding networks.

image

You can also view the cluster networks using cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network

cluster.exe dag.company.com network
Listing status for all available networks:

Network                                  Status
---------------------------------------- -----------
Cluster Network 2                        Up
Cluster Network 4                        Up
Cluster Network 1                        Up

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com

Get-ClusterNetwork -Cluster DAG.company.com

Name                                State
----                                -----
Cluster Network 1                   Up
Cluster Network 2                   Up
Cluster Network 4                   Up

A cluster network has three settings:

  • Do not allow cluster network communications on this network
  • Allow cluster network communications on this network
    • Allow clients to connect through this network

You can see these settings in failover cluster manager by getting the properties of a cluster network.

image

You can also view the network role using either cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network "Cluster Network 1" /prop

cluster dag.company.com network "Cluster Network 1" /prop

Listing properties for 'Cluster Network 1':

T  Network              Name                           Value
-- -------------------- ------------------------------ -----------
SR Cluster Network 1    Name                           Cluster Network 1
MR Cluster Network 1    IPv6Addresses
MR Cluster Network 1    IPv6PrefixLengths
MR Cluster Network 1    IPv4Addresses                  10.0.0.0
MR Cluster Network 1    IPv4PrefixLengths              24
SR Cluster Network 1    Address                        10.0.0.0
SR Cluster Network 1    AddressMask                    255.255.255.0
S  Cluster Network 1    Description
D  Cluster Network 1    Role                           3 (0x3)
D  Cluster Network 1    Metric                         1200 (0x4b0)
D  Cluster Network 1    AutoMetric                     1 (0x1)

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com | fl name,role

Get-ClusterNetwork -Cluster DAG-1.company.com | fl name,role

Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1

The role of the networks can also be viewed in the registry of each node.  This information is located at:  HKEY_LOCAL_MACHINE\Cluster\Networks.  Each cluster network is represented by a subkey named with the GUID of the network.  Under each GUID subkey you will see values including Name and Role.

[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Id"="2cd2b920-0a2a-4851-bb24-de02d4a70b7e"
"Name"="Cluster Network 2"
"Signature"="NETW"
"Description"=""
"Role"=dword:00000001
"Priority"=dword:ffffffff
"Transport"="TCP/IP"
"Ignore"=dword:00000000
"Address"="192.168.0.0"
"AddressMask"="255.255.255.0"
"IPv6Address"=""
"State"=dword:00000003
"Metric"=dword:0000044c
"AutoMetric"=dword:00000001
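As a rough sketch, the same values can be read with PowerShell instead of browsing the registry by hand.  This assumes an elevated PowerShell session running locally on a DAG member; it only reads values and changes nothing.

```powershell
# Read-only sketch: list the Name and Role registry values for each cluster network.
# Assumes this runs locally on a cluster node, where the HKLM:\Cluster hive is loaded.
Get-ChildItem -Path 'HKLM:\Cluster\Networks' |
    ForEach-Object { Get-ItemProperty -Path $_.PSPath } |
    Select-Object -Property Name, Role, Address, AddressMask
```

Each output row corresponds to one GUID subkey, so the Role column can be compared directly against what cluster.exe or Get-ClusterNetwork reports.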

The role value can be one of three values, depending on the cluster network settings.  The values are:

  • 0:  Do not allow cluster network communications on this network
  • 1:  Allow cluster network communications on this network
  • 3:  Allow cluster network communications on this network, and allow clients to connect through this network
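The mapping above can be sketched as a small lookup.  Both function names below are hypothetical helpers written for this illustration; they are not part of Exchange or the failover clustering cmdlets.

```powershell
# Hypothetical helper: translate a cluster network Role value into the
# settings shown in Failover Cluster Manager.
function Get-ClusterNetworkRoleDescription {
    param([Parameter(Mandatory = $true)][int]$Role)

    switch ($Role) {
        0 { 'Do not allow cluster network communications on this network' }
        1 { 'Allow cluster network communications on this network' }
        3 { 'Allow cluster network communications on this network; allow clients to connect through this network' }
        default { "Unrecognized role value: $Role" }
    }
}

# Hypothetical helper: only role 3 includes the client access setting.
function Test-ClusterNetworkClientAccess {
    param([Parameter(Mandatory = $true)][int]$Role)

    return ($Role -eq 3)
}
```

For example, Get-ClusterNetworkRoleDescription -Role 1 returns the cluster-communications-only description, and Test-ClusterNetworkClientAccess -Role 1 returns False.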

In order for an IPv4 resource to be brought online, it must be associated with a network that is configured both to “Allow cluster network communications on this network” and to “Allow clients to connect through this network” (role 3).  If for any reason the “Allow clients to connect through this network” option is not enabled, the IPv4 resource associated with that network cannot be brought online.

On an Exchange 2010 DAG member, when attempting to move the cluster core resources to another DAG member, the resources may fail to come online.  Specifically, the IPv4 resource fails to come online, which in turn causes the network name resource to fail to come online (because it depends on the IPv4 resource).
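To see which of the core resources is actually offline, something like the following can be used.  This is a read-only sketch that assumes Windows 2008 R2 with the FailoverClusters module available, and it uses the example cluster name from this post.

```powershell
Import-Module FailoverClusters

# Read-only sketch: show each resource in the hidden "Cluster Group"
# together with its current state and owning node.
# DAG.company.com is the example cluster (DAG) name used in this post.
Get-ClusterGroup -Name 'Cluster Group' -Cluster 'DAG.company.com' |
    Get-ClusterResource |
    Format-Table -Property Name, ResourceType, State, OwnerNode -AutoSize
```

With the issue described here, you would expect to see an IP Address resource that is not online, along with the network name resource that depends on it.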

If you use Failover Cluster Manager to try to bring the IPv4 resource in the cluster core resources group online, the following pop-up error is displayed:

image

A review of the system log shows event 1223:

Log Name:      System

Source:        Microsoft-Windows-FailoverClustering

Date:          5/10/2010 1:14:42 PM

Event ID:      1223

Task Category: IP Address Resource

Level:         Error

Keywords:     

User:          SYSTEM

Computer:     dagNode.company.com

Description:

Cluster IP address resource ‘IPv4 Static Address 2 (Cluster Group)’ cannot be brought online because the cluster network ‘Cluster Network 2’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.
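If you prefer to look for this event from PowerShell rather than Event Viewer, a minimal sketch follows.  It assumes it is run locally on the DAG member being checked, and the limit of 10 events is an arbitrary choice for illustration.

```powershell
# Read-only sketch: list recent event 1223 entries from the System log on this node.
$filter = @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 1223
}

# -ErrorAction SilentlyContinue suppresses the error Get-WinEvent raises when no events match.
Get-WinEvent -FilterHashtable $filter -MaxEvents 10 -ErrorAction SilentlyContinue |
    Format-List -Property TimeCreated, MachineName, Message
```

The Message text names both the IP address resource and the cluster network involved, which is the network you will need for the workaround.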

Event 1223, shown above, indicates that the effective setting for Cluster Network 2 is “Allow cluster network communications on this network” without “Allow clients to connect through this network”.  However, when you review the settings for Cluster Network 2 in Failover Cluster Manager, you might see that both “Allow cluster network communications on this network” and “Allow clients to connect through this network” are enabled.

The Microsoft Exchange Replication Service helps maintain the cluster network configuration.  There is an issue in the current Replication Service in which settings are not changed.  This causes a difference between the effective setting inside the cluster and the setting displayed in the Failover Cluster Management tools.

Workaround:

A quick and easy workaround for this issue is to reset the state of the network.  There are multiple ways to accomplish this, and each is outlined below.  Before proceeding with any other steps, note the cluster network named in the event above, since that is the network that needs to be reset (in this example, Cluster Network 2).

Windows 2008 / Windows 2008 R2 – Using Failover Cluster Management Tool

The network state can be reset using Failover Cluster Manager:

  • Launch Failover Cluster Management
  • Expand the cluster \ networks.

image

  • Get the properties of the cluster network in question.
  • Uncheck the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.
  • The network is disabled for “Allow clients to connect through this network”. 

Next we need to enable the network for “Allow clients to connect through this network”.

  • Get the properties of the cluster network.
  • Check the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.

The network has been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 / Windows 2008 R2:  Using cluster.exe

  • Launch a command prompt with administrative privileges.
  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=1

  • The network is disabled for “Allow clients to connect through this network”. 

Next, we need to enable the network for “Allow clients to connect through this network”.

  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=3

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 R2:  Using PowerShell

  • Launch PowerShell with administrative privileges.
  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=1}

  • The network is disabled for “Allow clients to connect through this network”. 

Next, enable the network for “Allow clients to connect through this network”.

  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=3}

  • The network is enabled for “Allow clients to connect through this network”. 

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.
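For convenience, the two PowerShell commands above could be wrapped into a single function, roughly as follows.  Reset-ClusterNetworkRole is a hypothetical name chosen for this sketch and is not a built-in cmdlet; the cluster and network names are the example values from this post and must be replaced with your own.

```powershell
Import-Module FailoverClusters

# Hypothetical helper (not part of Exchange or Windows): sets a cluster
# network to role 1 and then back to role 3, mirroring the two commands above.
function Reset-ClusterNetworkRole {
    param(
        [Parameter(Mandatory = $true)][string]$Cluster,
        [Parameter(Mandatory = $true)][string]$NetworkName
    )

    $network = Get-ClusterNetwork -Cluster $Cluster -Name $NetworkName

    # Role 1: cluster communications only (client access disabled).
    $network.Role = 1

    # Role 3: cluster communications plus client access.
    $network.Role = 3

    # Re-read the network so the reported role comes from the cluster, not the cached object.
    Get-ClusterNetwork -Cluster $Cluster -Name $NetworkName |
        Format-List -Property Name, Role
}

# Example values from this post; substitute your own DAG and network names.
Reset-ClusterNetworkRole -Cluster 'DAG.company.com' -NetworkName 'Cluster Network 2'
```

The final Get-ClusterNetwork call is only a verification step; if the reported role is not 3 afterwards, the reset did not take effect.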

 

LONG TERM FIX

This issue will be fixed in Exchange 2010 Service Pack 1.  The issue will not be fixed in Exchange 2010 RTM.

==========================================

Updated – 6/2/2010

Updated to note that Exchange 2010 SP1 is confirmed to contain the fix.

==========================================


Comments (38)

  1. Anonymous says:

    @Gaz:

    Can you provide some more information for me.  This is not something that you can reproduce.  If you manually change the cluster settings you can force this issue to occur but it is not covered under the fix described in this blog post.  Are you saying that you have experienced this issue on an SP1 DAG?

    TIMMCMIC

  2. Anonymous says:

    @GAZ

    Are you sure your issue is not fixed by SP1?  We have had no reported cases of any issues post SP1 when the instructions are followed.  As a reminder, if you had the issue pre-SP1, SP1 alone does not fix the issue; the workaround must be followed.  The issue is simply prevented from recurring.

    Also – just because you have a resource offline does not mean you have this issue – there are multiple reasons a resource may be reporting offline.

    TIMMCMIC

  3. Anonymous says:

    @Tim

    Unfortunately no, I had already installed EX2010-SP1 when I installed BE2010 which is to be my last step before moving real mailboxes to the EX2010 server.

    McCue

  4. Anonymous says:

    I have a two node Exchange SP1 DAG, and have just noticed this error. My two servers are clean Exchange SP1 installs.

    Whilst all appears to be OK with the DAG, seeing the "resource offline" error in the Failover Cluster Manager is a little disconcerting.

    I attempted to amend the settings for "Cluster Network 2" and check the “Allow clients to connect through this network” box but each time I apply the setting and then go back into the network properties, the box is unticked again.

    The powershell and cluster.exe commands didn't work for me either.

    I have managed to bring the cluster online however, by manually changing the "Role"=dword:00000001 in the registry (as described above) to "Role"=dword:00000003.

    After doing this, I have managed to bring the resource online successfully.

  5. Anonymous says:

    @Phil:

    I am not aware of what BE2010 is.

    TIMMCMIC

  6. Anonymous says:

    @Tim,

    We're still encountering this problem numerous times in SP2 with Update Rollup 4. We've to manually bring the cluster online!

  7. Anonymous says:

    @James:

    We do not expect this to fix any underlying cluster network communications problems.

    TIMMCMIC

  8. Anonymous says:

    @mfahey

    Assuming you actually have this issue if you followed the workaround then you should have been fine.  Take a look at my collapsing DAG networks blog post as this could be another reason your cluster networks are not maintained correctly.

    TIMMCMIC

  9. Anonymous says:

    @McCue

    Thanks for posting.  Can you confirm whether or not the issue was present prior to upgrading to SP1?

    TIMMCMIC

  10. Anonymous says:

    Tim,

    This is still a problem with Service pack 1.  Today I had to go to the failover cluster manager and remove the check, click apply, then add the check and click apply and finally I can bring the DAG online to both ping it and use Backup Exec to select the DAG.  

    McCue

  11. Anonymous says:

    @Joe:

    This is correct but you should not have to reset the IP address to correct the issue outlined in this blog.

    TIMMCMIC

  12. Anonymous says:

    @Sadda:

    If you can bring the cluster core resources online then this is not your issue.  This issue would prevent you from manually bringing the cluster core resources online.

    I would suggest reviewing the application and system logs for events regarding the cluster core resources.

    TIMMCMIC

  13. Anonymous says:

    The solutions above do not work. I do not have SP1 installed yet.

    When bringing the IP ADDRESS online I get error code: 0x80071737

    When bringing the dag name online I get: Error code: 0x80071736 The resource failed to come online due to the failure of one or more provider resources.

    Any other fixes for this?

  14. Anonymous says:

    @curropar…

    I can assure you that in the specific instance cited here this was an exchange issue. If you have had this issue outside of Exchange it would be caused by other factors.

    TIMMCMIC

  15. Joerg says:

    Can anyone confirm this has 100% fixed in e2010sp1?

  16. Gaz says:

    This is not fixed in SP1. I have tested this myself.

  17. Rene says:

    This is an issue before SP1 as well.  I have Exchange 2010 and BE2010 R2, and cluster was offline after a reboot.

  18. Joe says:

    To set an IP address for the DAG, use the following exchange shell command:

    Set-DatabaseAvailabilityGroup -identity DAGGroupName -databaseavailabilitygroupipaddress 192.168.x.x

    Confirm with Get-DatabaseAvailabilityGroup -Identity DagGroupName | fl

  19. Phil says:

    Is this related to BE2010? I had this same issue with Exchange 2010 (not SP1) but not until after BE2010 was installed.

    Thanks for the fix !

  20. McCue says:

    @Tim

    I'm pretty sure BE2010 is Backup Exec 2010

  21. Mauro Rita says:

    With Exchange 2010 SP1 UR0 this is not fixed.

  22. @Mauro Rita says:

    If the issue existed prior to upgrading then you will have to follow the workaround.  SP1 will prevent the issue from recurring.

    TIMMCMIC

  23. Jim Mangan says:

    I have run the fix a number of times after installing SP1 for Exchange 2010 and one of my two DAG addresses are reporting as offline. Is there any other fix available?

  24. Gaz says:

    So, again, another issue that is meant to be fixed by SP1, isnt…………………..

    Currently I have a DAG member that is reporting as offline, failed……        

    Dont msft test anything anymore?????

  25. BlackCat says:

    I had the same experience as @MattP_75, I had to change the role in registry: http://lokna.no/?p=998

  26. Corbett says:

    I had the same problem where the cluster IP Address wouldn't come online.  I tried the workaround but it didn't seem to make a difference.  It wasn't until I changed the setting to "Do not allow cluster network communication on this network" that the cluster was able to come online.  After changing that setting it must have reset the network because it immediately went to an online state and the setting was reverted back to "Allow cluster network communication on this network" and "Allow clients to connect through this network".

  27. Corbett says:

    Oh, and this was with Exchange 2010 SP2.

  28. jash says:

    I also have the same issue. I have exchange 2010 Sp3 RU1 on the server. when i tried to select the "Allow clients to connect through this network"  option and click ok.

    after few second it again deselect the option automatically. I dont know why this is misbehaving.

  29. James says:

    My exchange 2010 SP1 dag went down on one network only, saying it could not communicate on this network. Unchecking the box to “Allow clients to connect through this network” and rechecking it worked for me.

  30. James says:

    Also fixed partitioned network and event ID's 1129, 1126 and 1564

  31. Rahamat says:

    I have Exchange 2010 SP3 RU1 with DAG and see the same issue where the resource would not come online. Any fix for this?

  32. Deepak says:

    This is not fixed for me. we’re at exchange 2010 SP2 and its the same problem

  33. Mike says:

    Or service pack 3 rollup 5

  34. Junaid says:

    I have faced same issue but i am using Exchange 2013 SP1.

  35. prasoon says:

    worked fine for me exchange 2010 without sp

  36. curropar says:

    Hi, this has been an issue for me: although it’s not an Exchange server, just a file server, it failed in the same way, with the same error code and the same events on the log. It’s Windows 2008 R2 Enterprise SP1 x64. I didn’t have the time to look for a solution on the internet (it’s a cluster, it’s supposed to be HA!), so I had to delete the File Server Witness roles, create them again and recreate the shares (it’s more than 120 shares!). Luckily, I did an export of all shares the week before!

    So basically, I meant this is not an Exchange issue, but Windows Server 2008 issue. Don’t expect this to be solved by any patch or SP for Exchange.

  37. Roeland says:

    I’m also having this issue in Exchange 2013 CU7 on Server 2012 R2. I can manually change the Role dword value in the registry from 1 to 3 and bring the cluster resources online. However, the following day Exchange appears to have reverted this setting… Still looking for a fix.

  38. TIMMCMIC says:

    @Roeland…

    This would indicate that the DAG networks are not collapsed correctly or that you do not have the correct flags on the DAG networks.

    This assumes that you are not trying to set a secondary network to allow cluster IP addresses as that will never work.

    TIMMCMIC