Exchange 2010: Cluster core resources, the replication service, and active manager…


Every Exchange 2010 Mailbox server runs a component inside the Microsoft Exchange Replication service known as Active Manager.  Active Manager is responsible for all database mount, dismount, and move operations that occur in Exchange 2010.

When a server is a standalone server, Active Manager is configured as a Standalone Active Manager. 

When a server is a member of a Database Availability Group (DAG), Active Manager is configured as either:

  • PAM – Primary Active Manager
  • SAM – Secondary Active Manager

The Active Manager role of each DAG member is determined by which node owns the cluster core resources.  The node that owns the cluster core resources group is the Primary Active Manager (PAM).  All other nodes successfully participating in the cluster and not owning the cluster core resources are Secondary Active Managers (SAMs).

Let’s take a look at an example database availability group.

DAGName:  DAG

DagMembers:  DAG-1,DAG-2,DAG-3,DAG-4

Running Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryActiveManager shows which machine currently owns the cluster core resources and is acting as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-3

Using cluster.exe, we can also confirm the owner of the cluster core resources group:

cluster.exe DAG.domain.com group

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-3           Online

Using the cluster command line, the cluster core resources can be moved to another DAG member and the PAM will subsequently change.

cluster.exe DAG.domain.com group "cluster group" /moveto:DAG-4

Moving resource group 'cluster group'...

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-4           Online

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4
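
On Windows Server 2008 R2, the same move can also be sketched with the FailoverClusters PowerShell module instead of cluster.exe. This mirrors the example above and uses the same cluster name (DAG) and target node (DAG-4). It assumes the module is installed where the commands are run and that the last command is issued from the Exchange Management Shell.

```powershell
# Load the failover clustering cmdlets (available on Windows Server 2008 R2).
Import-Module FailoverClusters

# Show which node currently owns the cluster core resources group.
Get-ClusterGroup -Name "Cluster Group" -Cluster DAG

# Move the group to DAG-4; the PAM role follows the group.
Move-ClusterGroup -Name "Cluster Group" -Node DAG-4 -Cluster DAG

# Confirm (from the Exchange Management Shell) that the PAM changed.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
```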

Remember that Active Manager runs inside the Microsoft Exchange Replication service, which is installed on every Exchange 2010 Mailbox server.  This is important: if the replication service on a DAG member is not started, but that DAG member owns the cluster core resources, database mount, dismount, and move operations will fail.

Here is an example…

Currently the cluster core resources are owned on the node DAG-4 which is successfully participating in the cluster DAG.  Using the services control panel the Microsoft Exchange Replication service on the server DAG-4 was stopped.  We can confirm using the commands above that DAG-4 is still seen as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4

cluster dag.domain.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-4           Online
Available Storage    DAG-1           Offline

Using Test-ReplicationHealth and Test-ServiceHealth, we can see that the replication service on node DAG-4 is unavailable.

Server          Check                      Result     Error
------          -----                      ------     -----
DAG-4           ClusterService             Passed
DAG-4           ReplayService              *FAILED*   The Microsoft Exchange Replication service is not running on s…
DAG-4           DagMembersUp               Passed

Role                    : Mailbox Server Role
RequiredServicesRunning : False
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeMailSubmission, MSExchangeRPC, MSExchangeSA, MSExchangeSearch, MSExchangeServiceHost, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {MSExchangeRepl}
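
The exact invocations that produced the output above are not shown. A plausible sketch, assuming both cmdlets were run from the Exchange Management Shell and targeted at DAG-4, would be:

```powershell
# Replication health checks for one DAG member
# (ClusterService, ReplayService, DagMembersUp, and others).
Test-ReplicationHealth -Identity DAG-4

# Required-service status for each role installed on the same server.
Test-ServiceHealth -Server DAG-4
```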

At this time a dismount operation on a database was issued using the Dismount-Database command.  An error is immediately returned:

Dismount-Database DAG-DB0

Confirm
Are you sure you want to perform this action?
Dismounting database "DAG-DB0". This may result in reduced availability for mailboxes in the database.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y

Couldn’t dismount the database that you specified. Specified database: DAG-DB0; Error code: An Active Manager operation
failed. Error: The Microsoft Exchange Replication service may not be running on server DAG-4.domain.com. Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_MountDatabase.
    + CategoryInfo          : InvalidOperation: (DAG-DB0:ADObjectId) [Dismount-Database], InvalidOperationException
    + FullyQualifiedErrorId : D64CA7E2,Microsoft.Exchange.Management.SystemConfigurationTasks.DismountDatabase

 

This error occurs because the server that is designated as the Primary Active Manager does not have its replication service running (and therefore Active Manager is not running).  Stopping the replication service does not automatically arbitrate Active Manager functions to another DAG member.

To fix this error:

  • Start the replication service on the machine that is designated as the Primary Active Manager (preferred).
  • Move the cluster core resources to another DAG member, promoting that server to the Primary Active Manager (least preferred, since it does not address why the replication service is stopped on a running DAG member).
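
A minimal sketch of the preferred fix, assuming it is run in an elevated Exchange Management Shell on the PAM itself (DAG-4 in this example):

```powershell
# MSExchangeRepl is the service name listed under ServicesNotRunning earlier.
Start-Service -Name MSExchangeRepl

# Confirm the service is running again.
Get-Service -Name MSExchangeRepl

# Re-run the replication health checks; ReplayService should no longer fail.
Test-ReplicationHealth -Identity DAG-4
```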

It is important that the replication service be monitored on all DAG members to ensure it remains functional.
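
As a rough illustration of such monitoring, the following sketch polls the replication service on each DAG member. The member list is taken from this article's example DAG. The variable names and message text are illustrative assumptions, and the use of Get-Service -ComputerName assumes Windows PowerShell with remote service queries permitted between the servers.

```powershell
# DAG members from the example in this article (adjust for your environment).
$dagMembers = 'DAG-1', 'DAG-2', 'DAG-3', 'DAG-4'

foreach ($member in $dagMembers) {
    # Query the Microsoft Exchange Replication service on the remote member.
    $service = Get-Service -Name MSExchangeRepl -ComputerName $member -ErrorAction SilentlyContinue

    if ($null -eq $service) {
        Write-Warning "$member : could not query MSExchangeRepl"
    }
    elseif ($service.Status -ne 'Running') {
        Write-Warning "$member : MSExchangeRepl is $($service.Status)"
    }
    else {
        Write-Output "$member : MSExchangeRepl is running"
    }
}
```

A check like this could be scheduled or folded into an existing monitoring tool; it only reports state and does not attempt to restart the service.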

*Updated – 5/30/2010 – Corrected the commandlet for testing services –> test-serviceHealth instead of test-serverHealth.

*Updated – 6/22/2011 – Corrected table formatting of output.


Comments (28)

  1. TIMMCMIC says:

    @Turbomcp

    Article updated.

    TIMMCMIC

  2. TIMMCMIC says:

    @JFM

    When the node that owns the cluster core resources fails, the cluster service automatically arbitrates them over to another node thereby promoting the node to be the PAM.

    TIMMCMIC

  3. TIMMCMIC says:

    @Justin:

    Apologies for the delay in responding.

    When it comes to the PAM we are actually talking about the group in cluster called the "Cluster Group".

    By default, when you reboot the node that owns the cluster group, the cluster moves it to another node automatically.  Should you want to move the group beforehand, you can do so through two methods:

    Windows 2008 / Windows 2008 R2 (cluster.exe):

    Cluster DAGNAME.fqdn group "Cluster Group" /moveto:NODENAME

    Windows 2008 R2 (PowerShell):

    Open PowerShell

    Import-Module FailoverClusters

    Move-ClusterGroup -name "Cluster Group" -node NODENAME -cluster DAGNAME

    TIMMCMIC

  4. TIMMCMIC says:

    @Greg:

    It sounds like you do not have automountconsensus and possibly have DAC enabled.  See my blog series on DAC if that's the case and if not post back.

    TIMMCMIC

  5. TIMMCMIC says:

    @Sureshbabu…

    Simply put, quorum is (V/2)+1, with V/2 rounded down (where V is the number of votes in a cluster). If you do not have the correct number of votes immediately available, then you do not have quorum.

    TIMMCMIC
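
The quorum formula in the reply above can be worked through with a small sketch. The function name Get-QuorumVotes is hypothetical and is not an Exchange or failover clustering cmdlet; it only illustrates the arithmetic, with V/2 rounded down.

```powershell
# Hypothetical helper: votes needed for quorum given the total votes in the cluster.
function Get-QuorumVotes {
    param([int]$Votes)
    # Majority: half of the votes, rounded down, plus one.
    [int]([math]::Floor($Votes / 2) + 1)
}

Get-QuorumVotes -Votes 4   # 3
Get-QuorumVotes -Votes 5   # 3
```

Four and five votes both need three votes for quorum, but a five-vote cluster can lose two votes and keep quorum, while a four-vote cluster can lose only one.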

  6. TIMMCMIC says:

    @ Hi all….

    It appears your comment did not get completely posted.  Let me know how I can assist.

    TIMMCMIC

  7. TIMMCMIC says:

    @LMK:

    The PAM determination when DAC is enabled is a combination of two things.

    1) Does the cluster have quorum?
    2) Are the rules of DAC met?

    In the case you describe, when the first site comes back up the cluster has quorum. Fortunately the rules of DAC were not met – which means that no PAM can be promoted.

    All servers in the primary site enter an unknown state for active manager.

    TIMMCMIC

  8. TIMMCMIC says:

    @Monika:

    No problem.

    TIMMCMIC

  9. TIMMCMIC says:

    @Pankaj…

    Thanks

    TIMMCMIC

  10. TIMMCMIC says:

    @Mosh:

    No – no downtime is required when arbitrating the PAM between nodes.

    TIMMCMIC

  11. TIMMCMIC says:

    @Erik Bo:

    Great question.  So essentially active manager that runs within the replication service controls a lot of stuff whether a  DAG is involved or standalone.

    On a standalone server, Active Manager controls:

    Database mount

    Database dismount

    Active manager on a DAG will control:

    Database mount

    Database dismount

    Database autodismount

    Database move

    Essentially when you issue a mount request the request is sent to active manager, active manager checks certain things and then issues the request to the IS – this is an example from a standalone server.

    Hope that helps.

    TIMMCMIC

  12. Anonymous says:

    I have a question: my DAG sometimes switches the PAM over to the DR site! This is really strange behavior; it should move the PAM role to a DAG node in the same site, right? Then I have to move the PAM manually with the command. Could you please advise?

  13. LMK says:

    If you have 2 sites and one site goes dark (including the Domain Controllers and DAG servers) and the DAG therefore loses quorum, I assume that, since the DAG has to be reestablished in the remaining site, the PAM will be reassigned as appropriate to a remaining DAG member. Furthermore, on the switchback, if one of the failed DAG members had the PAM role, the code is smart enough to detect that there is a new/existing PAM.

    Does this make sense?

  14. TIMMCMIC says:

    @JFM…

    There are very few reasons that are legitimate for worrying about the owner of the cluster core resources (PAM) and this is not one of them.

    Whenever the PAM role changes between servers the PAM reviews the mount status of each database to ensure that no move actions were in process and that all is well across the DAG.  In this instance the PAM would detect that the databases were / are owned on a node that is no longer valid (since the cluster service is non-functional) and would begin the best copy / move process to another node.

    If it was required to worry about where the PAM was owned in this specific instance you could see how a single point of failure would be introduced – which would not be good.

    TIMMCMIC

  15. turbomcp says:

    Hi

    great article as always:)

    just small typo in test-serverHealth  should be Test-ServiceHealth here:

    Using test-replicationHealth and test-serverHealth we can see that the replication service on node DAG-4 is unavailable.

    Thanks again for all your efforts bringing interesting stuff every week/day

  17. Monika says:

    Thanks for posting about Active Manager.

  18. Erik Bo says:

    Yeah,

    Great article – thanks!

    Just 1 question:

    What business does the Replication Service (and the Standalone Active Manager) undertake in a standalone Exchange Server configuration?

    Kind regards

  19. Hi all says:

    Good Morning,

    I have one issue.

  20. justin says:

    Awesome article!! One quick question: I'm new to administering Exchange and my PAM is currently on a server that I'd like to reboot. Is it safe to move the PAM role to the other server during production without experiencing any sort of outage?

    Thanks,

    Justin

  21. jfm says:

    If the PAM fails, is there a way to force one of the SAM members to become the PAM?

    In case the PAM physically fails without any way to put it back in production fast enough.

    Thanks!

  22. jfm says:

    @TIMMMCMIC

    I currently have a 2 members DAG in production with 1 mailbox database.

    What if the PAM is also hosting the active database?

    Would the cluster service be able to move PAM to the second member and then move the active database to it? Or maybe I should always make sure that the PAM is my second MBX server with the database copy.

    Thank you,

    JFM

  23. pankaj says:

    I had checked its really good article for us…:)

  24. Mosh says:

    Is downtime required when moving the PAM manually from one node to another?

  25. greg says:

    I am working on a DR test.  I have brought up one virtual Exchange Server and one virtual domain controller on an offline VM Host.  I moved the cluster resources to the Exchange server but the PAM does not seem to move with it.  When I run anything the PAM is involved in, it tries to retrieve the PAM from a different Exchange server that I don't intend on restoring.

  26. greg says:

    Let me add that when i check the "Cluster Group" resources, they show online and owned by the Exchange server I am restoring to…

  27. Hardik says:

    When the PAM goes offline, another server assumes the role of PAM. How do the active databases that resided on the prior PAM fail over, and who handles the failover of those databases?

    Thanks
    Hardik

  28. sureshbabu says:

    Nice article. Could you please explain quorum in detail?
