DSAccess issues

If you run Exchange, you no doubt know how important Active Directory is.  You may be less familiar, however, with the process that Exchange uses to discover Domain Controllers and Global Catalog servers.  This process is called DSAccess, and it is invoked by the Exchange System Attendant every 15 minutes by default.  It determines which Domain Controllers Exchange should send its queries to.  DSAccess sorts Domain Controllers into two categories:  In Site and Out of Site.  If you guessed that this is related to Active Directory Sites, you are spot-on.  It also underscores why configuring Active Directory Sites correctly is so important.  Once Exchange 2007 RTMs, this will be even more critical, because Exchange 2007 relies on Active Directory Sites for its routing topology.

I'd like to expand on the importance of something else here.  When you install Exchange, a series of permissions is set on different containers within Active Directory.  Specifically, the Exchange Enterprise Servers group (created by DomainPrep) is granted certain permissions on ALL mail-enabled objects within Active Directory, so that Exchange can read and modify those objects.  These permissions are actually granted at the root of the domain and are inherited down to all child objects within the domain.  What happens, then, if the Exchange Enterprise Servers group can't modify a mail-enabled object?  Let's take a look.

When Exchange tries to modify a mail-enabled object, it sends an LDAP request to a DC or GC, and it expects the Exchange Enterprise Servers group's permissions to allow the change.  It works from the list of DCs and GCs obtained by DSAccess (listed in event 2080 in the Application log) and tries the in-site servers first, as they are preferred.  If you have removed inheritance on one or more containers or OUs so that the Exchange Enterprise Servers group no longer appears on them, or if you have removed the group from the permissions list outright, the DC or GC answers the request with an Insufficient Access result (Access Denied).

The key bit here is that when Exchange receives this result, it marks that DC as unavailable and moves on to another DC in the site.  I was struggling to understand why it would do this, but I believe it is protecting itself in case the DC it is talking to is out of sync.  If every DC in the site returns the same result, Exchange marks all DCs in the site as unavailable, as referenced by Event ID 2084, and then moves on to the out-of-site DCs.  Also worth noting is that once all in-site DCs have been marked as down, the DSAccess refresh interval drops from 15 minutes to 5 minutes.

You can probably guess what happens when the out-of-site DCs return the same result:  they are marked as down too.  The end result is an Error in the Application log, Event 2102, which states that "All Domain Controllers in use are not responding".  Yikes!  That's not a good error to see.  These errors and other events are logged every 30 minutes.
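
To make that sequence easier to follow, here is a toy simulation of it in Python.  It is only a sketch of the behavior as I described it above, not Exchange's actual code:  the class and function names, and the grants_access flag that stands in for "the Exchange Enterprise Servers permissions are still in place", are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LdapResult(Enum):
    SUCCESS = "success"
    INSUFFICIENT_ACCESS = "insufficient access"


@dataclass
class DomainController:
    name: str
    in_site: bool
    # Hypothetical flag: True while the Exchange Enterprise Servers
    # permissions on the target object are intact as seen by this DC.
    grants_access: bool = True
    available: bool = True

    def modify(self) -> LdapResult:
        """Pretend to modify a mail-enabled object on this DC."""
        if self.grants_access:
            return LdapResult.SUCCESS
        return LdapResult.INSUFFICIENT_ACCESS


def attempt_modify(dcs):
    """Try in-site DCs first, then out-of-site DCs.

    Returns (name of the DC that accepted the change or None,
             event IDs that would be logged, refresh interval in minutes).
    """
    events = []
    refresh_minutes = 15  # default refresh interval

    # In-site servers are preferred, so they are tried first.
    for dc in (d for d in dcs if d.in_site):
        if dc.modify() is LdapResult.SUCCESS:
            return dc.name, events, refresh_minutes
        dc.available = False  # Insufficient Access: mark this DC as down

    # Every in-site DC has been marked down.
    events.append(2084)
    refresh_minutes = 5

    for dc in (d for d in dcs if not d.in_site):
        if dc.modify() is LdapResult.SUCCESS:
            return dc.name, events, refresh_minutes
        dc.available = False

    # Nothing left to try: "All Domain Controllers in use are not responding".
    events.append(2102)
    return None, events, refresh_minutes


if __name__ == "__main__":
    broken = [
        DomainController("DC1", in_site=True, grants_access=False),
        DomainController("DC2", in_site=True, grants_access=False),
        DomainController("DC3", in_site=False, grants_access=False),
    ]
    print(attempt_modify(broken))  # (None, [2084, 2102], 5)
```

Notice that in this model a single missing permission on one object is enough to take every DC out of rotation, which is exactly why the resulting error message is so misleading.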

Other things that could cause this include forcefully removing a DC without cleaning up its metadata in AD afterwards, or leaving the other DCs pointing to the old (removed) server as their DNS server.  Also, if the DNS server that Exchange points to does not contain the appropriate SRV records, Exchange will not be able to find the DCs and GCs, which will definitely cause problems such as stores not mounting, or even the System Attendant service not starting.
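
If you want to check the SRV records by hand, the small Python helper below builds the standard locator record names for a domain and prints the matching nslookup commands.  Treat it as a sketch:  example.com, the site name, and the DNS server address are placeholders, the function names are my own, and it only generates the names and commands rather than querying DNS itself.

```python
def srv_names(domain, forest_root=None, site=None):
    """Build the SRV record names used to locate DCs and GCs.

    domain      -- DNS name of the AD domain (placeholder: example.com)
    forest_root -- DNS name of the forest root; defaults to the domain
    site        -- optional AD site name for the site-specific records
    """
    forest_root = forest_root or domain
    names = [
        f"_ldap._tcp.dc._msdcs.{domain}",       # all DCs in the domain
        f"_ldap._tcp.gc._msdcs.{forest_root}",  # all GCs in the forest
    ]
    if site:
        names += [
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_ldap._tcp.{site}._sites.gc._msdcs.{forest_root}",
        ]
    return names


def nslookup_commands(names, dns_server=None):
    """Turn record names into nslookup commands, optionally aimed at one DNS server."""
    suffix = f" {dns_server}" if dns_server else ""
    return [f"nslookup -type=SRV {name}{suffix}" for name in names]


if __name__ == "__main__":
    # Placeholder values; substitute your own domain, site, and DNS server.
    names = srv_names("example.com", site="Default-First-Site-Name")
    for cmd in nslookup_commands(names, dns_server="192.0.2.10"):
        print(cmd)
```

Run the printed commands from the Exchange server against the DNS server it is configured to use.  If any of them comes back empty, that is a good place to start looking.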