The Cluster Service service terminated with service-specific error 183 (0xB7)

I recently worked on an issue where the Cluster service was stopped on both nodes of a Windows Server 2008 Failover Cluster. Any attempt to start the Cluster service caused it to terminate and log this error in the System Event Log:

Log Name:      System
Source:        Service Control Manager
Date:          8/26/2010 3:52:50 PM
Event ID:      7024
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      node1.contoso.com
Description:
The Cluster Service service terminated with service-specific error 183 (0xB7).

To troubleshoot this issue, the first step is to determine what error 183 stands for. We can do this using the err.exe tool:

C:\Users\contosouser>err 183 

  ERROR_ALREADY_EXISTS                                          
# Cannot create a file when that file already exists.
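
If err.exe is not at hand, the link between the decimal and hexadecimal forms shown in the event (183 and 0xB7) can be checked with a few lines of Python. This is only a sketch: the lookup table is hand-written for illustration and holds just the one code discussed here, and the names KNOWN_ERRORS and describe_error are made up for this example.

```python
# Minimal sketch: relate the decimal and hex forms of the service-specific
# error and map it to its symbolic name. The table is hand-written for
# illustration and holds only the code discussed in this post.
KNOWN_ERRORS = {
    183: ("ERROR_ALREADY_EXISTS",
          "Cannot create a file when that file already exists."),
}


def describe_error(code):
    """Return a one-line description of a Win32 error code."""
    name, text = KNOWN_ERRORS.get(code, ("<unknown>", "not in the table"))
    return f"{code} (0x{code:X}) {name}: {text}"


print(describe_error(183))
```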

Ok, so we know what error 183 stands for, but how does this apply to us?

As with any Cluster issue, the best place to get a good idea of what went wrong is the Cluster.log file. Starting with Windows Server 2008, to get log entries corresponding to the latest activity, you’ve got to run Cluster log /gen before accessing the C:\Windows\Cluster\Reports\Cluster.log file.

Pick one node to work on and leave the other node(s) untouched.

In our case, since we only need the last few minutes of activity, attempt to start the Cluster service one more time. Then, from an elevated command prompt, run this command:

Cluster log /gen /span:10

This generates a Cluster.log file that has only the last 10 minutes of activity. Smaller file = Easier to sift through.

Now open the C:\Windows\Cluster\Reports\Cluster.log file and scroll to the bottom of the file.

Search, bottom to top, for AlreadyExists(183). Most likely, you will see one of these two sets of entries:

Scenario 1:

00000ad0.000008d4::2010/08/26-11:40:19.320 INFO  [CORE] Node 1: Calling form for Topology Manager
00000ad0.000008d4::2010/08/26-11:40:19.323 ERR   [CORE] Node 1: exception caught during form AlreadyExists(183)' because of 'already exists'(CLUS2K8)
00000ad0.000008d4::2010/08/26-11:40:19.324 ERR   Form failed (status = 183)

OR

Scenario 2:

00001408.00000358::2011/01/11-22:33:39.001 ERR   [GUM] Node 4: Local Execution of a gum request /tm/gum/set-state resulted in exception AlreadyExists(183)' because of 'already exists'(clus2k8node2 - Local Area Connection)
00001408.00000358::2011/01/11-22:33:39.001 ERR   [CORE] Node 4: exception caught AlreadyExists(183)' because of 'already exists'(clus2k8node2 - Local Area Connection)
00001408.00000358::2011/01/11-22:33:39.001 ERR   Exception in the PostForm is fatal (status = 183)
00001408.00000358::2011/01/11-22:33:39.001 ERR   Exception in the PostForm is fatal (status = 183), executing OnStop
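
The bottom-to-top search can also be scripted. The sketch below scans log lines from last to first and pulls out the name in parentheses that follows the AlreadyExists(183) text. The regular expression is based only on the entries shown above, so it may need adjusting for other log formats; the function names and the file encoding are assumptions made for this example.

```python
import re

# Matches the conflicting name that follows the exception text, e.g.
# "... AlreadyExists(183)' because of 'already exists'(CLUS2K8)".
# The pattern is derived from the sample entries in this post only.
PATTERN = re.compile(
    r"AlreadyExists\(183\)' because of 'already exists'\((.+)\)\s*$"
)


def find_conflicts(lines):
    """Scan lines from last to first; return conflicting names, newest first."""
    found = []
    for line in reversed(lines):
        match = PATTERN.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def find_conflicts_in_file(path):
    """Same scan, reading the lines from a Cluster.log file."""
    # The encoding is an assumption; errors="replace" keeps the scan going
    # if the file contains bytes that do not decode cleanly.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_conflicts(handle.read().splitlines())


sample = [
    "00000ad0.000008d4::2010/08/26-11:40:19.320 INFO  [CORE] Node 1: Calling form for Topology Manager",
    "00000ad0.000008d4::2010/08/26-11:40:19.323 ERR   [CORE] Node 1: exception caught during form AlreadyExists(183)' because of 'already exists'(CLUS2K8)",
    "00000ad0.000008d4::2010/08/26-11:40:19.324 ERR   Form failed (status = 183)",
]
print(find_conflicts(sample))
```

For a real log, call find_conflicts_in_file with the path to the generated Cluster.log. The name it returns is the string to search for in the later steps.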

Troubleshooting Scenario 1:

This Cluster seems to have two entities named CLUS2K8. That’s the reason the Cluster service terminates when you attempt to start it.

We know what’s causing the issue. Now, on to the fix.

Disclaimer: The procedure that follows involves editing the registry and critical Cluster components. Incorrectly modifying either of these can leave you with an unusable system. Please proceed at your own risk.

  1. Open Regedit.exe

  2. Check if HKLM\Cluster is present. If it isn’t, load the CLUSDB hive (located in C:\Windows\Cluster) under HKLM and enter Cluster as the Key Name

  3. Click the Cluster key and with it highlighted, search for CLUS2K8

  4. Make a note of the result. In our case, this was the Cluster name. 

  5. Press F3 to continue the search.

  6. The next result is our conflicting entry, which, in our case, was the name of the Cluster Network.  

  7. Press F3 again, to confirm that there aren’t any more instances of CLUS2K8, under the Cluster key. (See Note below if you find another entry)

  8. Change the name of the Cluster Network to Cluster Network 1 by modifying the Name registry setting, to remove the conflict.

  9. Unload the CLUSDB registry hive and exit Regedit.exe

  10. Browse to C:\Windows\Cluster and select CLUSDB.blf and all files named CLUSDB.x.container. (x can be 0,1,2…)

  11. Move (cut and paste) these files to another location as a backup.

  12. Start the Cluster service.
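
Steps 10 and 11 above can be sketched in Python for anyone who prefers a script to Explorer. This is a minimal sketch, assuming the file names follow the CLUSDB.blf and CLUSDB.x.container patterns given in step 10. The function name and the backup folder in the usage comment are hypothetical. Run it only after the hive has been unloaded, as in step 9.

```python
import shutil
from pathlib import Path


def move_clusdb_logs(cluster_dir, backup_dir):
    """Move CLUSDB.blf and CLUSDB.<x>.container files into backup_dir.

    The CLUSDB hive file itself is left in place. Returns the names of
    the files that were moved.
    """
    cluster_dir, backup_dir = Path(cluster_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # File name patterns taken from step 10 of the procedure.
    candidates = list(cluster_dir.glob("CLUSDB.blf"))
    candidates += list(cluster_dir.glob("CLUSDB.*.container"))

    moved = []
    for path in sorted(candidates):
        shutil.move(str(path), str(backup_dir / path.name))
        moved.append(path.name)
    return moved


# Example call (the backup folder is a hypothetical location):
# move_clusdb_logs(r"C:\Windows\Cluster", r"C:\ClusdbBackup")
```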

 At this point, you’ve got the Cluster service running on one node and your Cluster is back up and running. Start the Cluster service on the remaining nodes and that’s all that needs to be done. The changes made on the first node will be replicated to other joining nodes.

Note: The conflict, in this instance of the issue, was between the Cluster Name and the name of the Cluster Network. The name of the Cluster Network, which we changed, is only for display purposes and changing it did not have any repercussions. If the conflicting entry is not the name of the Cluster Network, this may not be the case. Exercise extreme caution while modifying anything in the Cluster hive because you don’t want to end up breaking a pre-existing dependency/relationship of a particular component in the Cluster hive with another.

Troubleshooting Scenario 2:

00001408.00000358::2011/01/11-22:33:39.001 ERR   [GUM] Node 4: Local Execution of a gum request /tm/gum/set-state resulted in exception AlreadyExists(183)' because of 'already exists'(clus2k8node2 - Local Area Connection)
00001408.00000358::2011/01/11-22:33:39.001 ERR   [CORE] Node 4: exception caught AlreadyExists(183)' because of 'already exists'(clus2k8node2 - Local Area Connection)
00001408.00000358::2011/01/11-22:33:39.001 ERR   Exception in the PostForm is fatal (status = 183)
00001408.00000358::2011/01/11-22:33:39.001 ERR   Exception in the PostForm is fatal (status = 183), executing OnStop

This error occurs when the case of the node's name changes. From the cluster log entries in this instance of the issue, it appears that clus2k8node2 has changed to CLUS2K8NODE2. You can verify this in Failover Cluster Manager, under 'Nodes', where you will very likely see the node name in the opposite case to the one shown in the cluster log entry.

Here's how you can resolve this issue.

  1. Open ncpa.cpl
  2. Rename all network interfaces by appending a random character. For example, 'Local Area Connection 3' can be renamed 'Local Area Connection 3X'; 'Heartbeat' can be renamed 'HeartbeatY'
  3. Start the Cluster service. It will now start up successfully.
  4. Stop the Cluster service and rename all the network interfaces to their original names.
  5. Start the Cluster service again.

Renaming the network interfaces and then starting the Cluster service causes it to update the Cluster hive with the correct case of the server name prefixed to the network interface name. We then stop and revert the name changes to ensure that, essentially, we've not made any changes to the Cluster.
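
To confirm that a Scenario 2 failure really is a case-only difference, the name in the log entry can be compared with the node name shown in Failover Cluster Manager. The sketch below assumes the log entry has the form 'node name - interface name', as in the entries above; the function names are made up for this example.

```python
def node_prefix(interface_entry):
    """Return the node-name prefix of an entry like 'node - Interface Name'."""
    return interface_entry.split(" - ", 1)[0]


def case_only_mismatch(logged_name, current_name):
    """True when the names match ignoring case but are not identical."""
    return (logged_name != current_name
            and logged_name.casefold() == current_name.casefold())


# Name taken from the cluster log entry, compared with the node name as
# it appears under 'Nodes' in Failover Cluster Manager.
entry = "clus2k8node2 - Local Area Connection"
print(case_only_mismatch(node_prefix(entry), "CLUS2K8NODE2"))  # True
```

A result of False means the names are either identical or differ by more than case, in which case the rename procedure above may not be the right fix.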

Thanks to Steven Andress, Senior Support Escalation Engineer at Microsoft, for the scenario 2 steps.