IsAlive function on Exchange 200x clusters

I so wish I was able to go to Miami this year... Can you even imagine a better 4 day lineup than this: https://www.clubspace.com/ ?!

But back to the topic at hand. First, let me explain IsAlive a bit, then I'll touch on how Exchange does it.

In Windows “MSCS” clusters, one of the big things the cluster service does is monitor the resources you've defined within the cluster. Take Exchange completely out of the picture, and think of some non-specific service running on the server. Part of making this service cluster-enabled is providing a resource DLL to the cluster service that exposes some interfaces to the cluster (see this msdn page for more detailed info on how to do this). These interfaces are used to control the resource (start, stop, etc.) as well as to ensure that it is still functioning properly (IsAlive, LooksAlive). If the resource is not functioning when IsAlive is called, for instance, the typical response is to FAIL the resource, which will most likely cause the entire group to fail over to another node. A built-in example of the IsAlive behavior is the “generic service” resource type, provided in the box with MSCS clustering. It provides these interfaces, although they are quite simple: in the case of IsAlive, it simply queries the Service Control Manager (SCM) to see if SCM believes the service is in a “Started” state.
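To make the “generic service” idea concrete, here is a rough Python sketch of that contract. It is not the real resource DLL API (that is a C interface), and every name in it (FakeServiceControlManager, GenericServiceResource, poll_once) is made up for illustration. The SCM is simulated with an in-memory dictionary so the example runs anywhere.

```python
from enum import Enum


class ServiceState(Enum):
    STOPPED = "Stopped"
    STARTED = "Started"


class FakeServiceControlManager:
    """Hypothetical in-memory stand-in for the Windows SCM."""

    def __init__(self):
        self._states = {}

    def set_state(self, name, state):
        self._states[name] = state

    def query_state(self, name):
        # Services the fake SCM has never heard of are reported as stopped.
        return self._states.get(name, ServiceState.STOPPED)


class GenericServiceResource:
    """Sketch of a 'generic service' style resource (names are assumptions)."""

    def __init__(self, scm, service_name):
        self.scm = scm
        self.service_name = service_name
        self.failed = False

    def online(self):
        # Control interface: start the service.
        self.scm.set_state(self.service_name, ServiceState.STARTED)
        self.failed = False

    def offline(self):
        # Control interface: stop the service.
        self.scm.set_state(self.service_name, ServiceState.STOPPED)

    def is_alive(self):
        # The whole health check: does SCM believe the service is started?
        return self.scm.query_state(self.service_name) is ServiceState.STARTED


def poll_once(resource):
    """Roughly what a resource monitor does on each IsAlive interval."""
    if not resource.is_alive():
        # A real cluster would now FAIL the resource and consider failover.
        resource.failed = True
    return not resource.failed
```

Notice that is_alive says nothing about whether the service is actually doing useful work, only that SCM thinks it is running. That is the simplicity the generic service type trades on.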

How often the cluster calls IsAlive can be controlled through the Cluster Admin GUI. For Exchange 200x resources, however, this does not REALLY control how often the resources have their status checked. In Exchange 200x, when the various Exchange resources are brought online by the cluster service, part of their initialization includes spinning up some threads to do the actual monitoring work. These threads perform their monitoring checks on a hard-coded 10-second interval. For ease, let's call these the “IsAliveMonitor” checks. The result of the most recent IsAliveMonitor check is stored in a value. The IsAlive interface presented to the cluster resource monitor does little more than inspect this value to see what the most recent IsAliveMonitor result was. If the IsAliveMonitor thread detects a failure of the resource, it goes ahead and notifies the cluster resource monitor right away, rather than waiting for the cluster's IsAlive interval to come back around and check.
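The pattern described above (a background thread does the real checking, IsAlive just reads the cached answer, and a failure is reported immediately) can be sketched in a few lines of Python. This is only an illustration of the pattern, not Exchange's actual code: the class name MonitoredResource, the check and on_failure callables, and the configurable interval are all assumptions of mine. The 10-second default mirrors the hard-coded interval mentioned above.

```python
import threading


class MonitoredResource:
    """Hypothetical resource whose health is checked by its own thread."""

    def __init__(self, check, on_failure, interval=10.0):
        self._check = check            # callable returning True when healthy
        self._on_failure = on_failure  # called as soon as a check fails
        self._interval = interval      # seconds between monitor checks
        self._last_result = True       # cached result of the latest check
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def online(self):
        # Bringing the resource online also starts the monitor thread.
        self._stop.clear()
        self._thread = threading.Thread(target=self._monitor, daemon=True)
        self._thread.start()

    def offline(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _monitor(self):
        # wait() returns False on timeout and True once offline() is called.
        while not self._stop.wait(self._interval):
            healthy = self._check()
            with self._lock:
                self._last_result = healthy
            if not healthy:
                # Tell the cluster now instead of waiting for its next poll.
                self._on_failure()
                return

    def is_alive(self):
        # Cheap: just report what the monitor thread saw most recently.
        with self._lock:
            return self._last_result
```

With this shape, changing how often the cluster polls is_alive has no effect on how quickly a failure is noticed, because detection is driven entirely by the monitor thread's own interval.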

Here's a great in-depth article on the Exchange 2000 resource DLL and how the interfaces are implemented.