Troubleshooting the Recipient Update Service (RUS) using Event Logs – Part 1


(This is the first part of a series on troubleshooting RUS issues using Diagnostics Logging. Stay tuned for more... :)


Many RUS problems can be identified through careful examination of the application event log. Event logging helps troubleshoot many different RUS behaviors, such as when the RUS is not stamping objects at all, appears to be taking a long time, or is stamping objects with the wrong proxy addresses. This article describes how to use the event log to identify these issues.


It is up to the domain RUS's for a given domain to stamp the mail-enabled objects in that domain naming context. Exchange allows you to create up to one domain RUS for every DC in the domain. If a domain has more than one domain RUS, the first step in the troubleshooting process is to choose one particular RUS to troubleshoot.


To begin logging the events of interest to the application log, increase logging on the following objects to Maximum on the Exchange server responsible for the domain RUS you've chosen:


        MSExchangeAL\LDAP Operations
        MSExchangeAL\Address List Synchronization
        MSExchangeSA\Proxy Generation (Exchange 2003 only)


Once you've chosen a domain RUS to look at and turned up logging on that Exchange server, the next step is to choose an object to test with, such as a user that has not been stamped yet in that domain. Once you know which RUS you're looking at, and which user you're expecting it to stamp, you can begin taking a closer look at what the RUS is doing.


Repeatedly choosing Rebuild on the RUS or Apply This Policy Now on a policy can complicate the troubleshooting process by causing the RUS to process large numbers of objects. This causes the application log to wrap quickly and makes it difficult to follow the sequence of events described in this article. When troubleshooting the RUS, it is best to avoid Rebuilding or Applying and instead focus on a single test user and use only Update Now to check for new and modified objects. After an Update Now, you can walk through the events described in this article to understand what the RUS is doing to a particular recipient.


Question 1 - Is the RUS querying for changes?


First you should determine if the RUS is even looking for recipients to process. Based on the schedule, or when Update Now is chosen, the RUS will query the domain for any new or modified objects. The first step in troubleshooting a RUS that's not stamping is to verify that it's checking for changes at all.


In ADSI Edit or LDP, connect to the DC that the RUS points to, and find your test user. Look at the USNChanged attribute and make a note of the value.


Next, open up the application event log on the Exchange server responsible for the RUS. Go to View->Find. For Event ID put 8011. For Description put "Base 'DC" (without the enclosing quotes). This will take you to the most recent search for changes against the domain naming context:


Event Type: Information
Event Source: MSExchangeAL
Event Category: LDAP Operations
Event ID: 8011
Description:
Searching directory bilongexch1.bilong.test at base 'DC=bilong,DC=test' using filter '(&(USNChanged>=273870)(uSNChanged<=298312)((objectclass=*)))' and requesting attributes distinguishedName; objectGUID; LegacyExchangeDN; msExchADCGlobalNames; ObjectSID; ObjectClass; objectCategory; displayName; msExchHideFromAddressLists; hideDLMembership; ntsecuritydescriptor; showInAdvancedViewOnly; msExchALObjectVersion; showInAddressBook; msExchPolicyEnabled; givenName; sn; cn; mailNickname; targetAddress; initials; proxyAddresses; mail; textEncodedORAddress; msExchHomeServerName; msExchExpansionServerName; msExchCustomProxyAddresses; msExchPoliciesIncluded; msExchPoliciesExcluded; replPropertyMetaData; replicatedObjectVersion; ReplicationSignature; WhenChanged; WhenCreated; USNchanged; USNcreated; ObjectVersion; isDeleted; homeMDB; homeMTA; msExchMailboxGuid; msExchMailboxSecurityDescriptor; msExchResourceGUID; UserAccountControl; msExchUserAccountControl.
  


Here you can see the RUS is searching for any objects with a USNChanged value between 273870 and 298312. You may notice that there are many other 8011 events in the application log that contain different searches. These are generated by many different operations. For the purpose of troubleshooting the stamping of users, though, the only 8011 events we are interested in are the ones where the base of the search is the domain in question. This is why it is useful to use Find to skip directly to an 8011 event that contains the text "Base 'DC": it takes you straight to a search against a domain and skips over the rest of the 8011 events. If you have RUS's for different domains running on the same Exchange server, you may want to include the entire name of the domain in the Description portion of the Find window, so you can skip over any 8011 events for other domain RUS's.
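If you have exported the event descriptions as text, the same check can be scripted. The following is only a minimal sketch in Python: the function names parse_8011 and is_domain_search are made up for illustration, and the pattern assumes the description follows the wording of the sample event shown above.

```python
import re

# Assumption: the description contains "at base '<DN>' using filter '<filter>'",
# as in the sample 8011 event. Attribute-name casing varies in the logged
# filter (USNChanged vs. uSNChanged), so matching is case-insensitive.
_SEARCH_RE = re.compile(
    r"at base '(?P<base>[^']*)' using filter '(?P<filter>[^']*)'",
    re.IGNORECASE,
)
_LOW_RE = re.compile(r"\(usnchanged>=(\d+)\)", re.IGNORECASE)
_HIGH_RE = re.compile(r"\(usnchanged<=(\d+)\)", re.IGNORECASE)


def parse_8011(description):
    """Return (base, low_usn, high_usn), or None if this is not a USN-range search."""
    match = _SEARCH_RE.search(description)
    if not match:
        return None
    low = _LOW_RE.search(match.group("filter"))
    high = _HIGH_RE.search(match.group("filter"))
    if not (low and high):
        return None
    return match.group("base"), int(low.group(1)), int(high.group(1))


def is_domain_search(description, domain_dn):
    """True if the search base is exactly the given domain naming context."""
    parsed = parse_8011(description)
    return parsed is not None and parsed[0].lower() == domain_dn.lower()


# Shortened copy of the sample event description (attribute list truncated).
sample = (
    "Searching directory bilongexch1.bilong.test at base 'DC=bilong,DC=test' "
    "using filter '(&(USNChanged>=273870)(uSNChanged<=298312)((objectclass=*)))' "
    "and requesting attributes distinguishedName; objectGUID."
)
print(parse_8011(sample))  # ('DC=bilong,DC=test', 273870, 298312)
```

Comparing the whole base DN, rather than just looking for "Base 'DC", is what lets you tell apart domain RUS's for different domains on the same Exchange server.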



If the user you've identified has a USNChanged value higher than the range of USNs in this event, then the RUS has not yet queried for that user. If the USNChanged on your user is much greater than the USNs currently being processed by the RUS, that's an indication that the RUS has fallen behind and is still catching up to the latest changes. One common reason for this is that a Rebuild was run. When you choose Rebuild, the RUS starts over from a USNChanged of 1 and queries for all objects in the domain. In a large domain, it can take hours or in some cases days for the RUS to process all the objects and catch up after running a Rebuild.


If the user you've identified has a USNChanged value lower than the range of USNs in this event, then this RUS has already passed that user. In that case, continue to search back through the application log until you find the 8011 that contains the range of USNs that includes the user in question. If you cannot find it, another option is to make another change to the user. Any change to the object, even just changing the description, will cause the USNChanged to be bumped up to the latest value on the DC. So if the RUS has already gone past the user in question and you cannot find the associated 8011, just make a change to the user and note the new USNChanged. Then watch the app log for the next 8011, which should include the USN of the user you just updated.
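The comparison described in the last two paragraphs can be written down as a short Python sketch. The function names are hypothetical, and apart from the range 273870 to 298312 taken from the sample event, the numbers below are invented for illustration.

```python
def classify_usn(user_usn, low, high):
    """Compare a user's USNChanged with the range from one 8011 event."""
    if user_usn > high:
        return "not yet queried"   # the RUS has not reached this user
    if user_usn < low:
        return "already passed"    # look at older 8011 events
    return "in range"              # this search included the user


def find_covering_search(user_usn, ranges_newest_first):
    """Walk back through (low, high) ranges; return the first that includes user_usn."""
    for low, high in ranges_newest_first:
        if classify_usn(user_usn, low, high) == "in range":
            return (low, high)
    return None  # wrapped out of the log; change the user and watch for the next 8011


# First range is from the sample event; the older one is a made-up example.
ranges = [(273870, 298312), (250001, 273869)]
print(classify_usn(260000, 273870, 298312))   # already passed
print(find_covering_search(260000, ranges))   # (250001, 273869)
print(classify_usn(300000, 273870, 298312))   # not yet queried
```

A result of None from find_covering_search corresponds to the case where the matching 8011 can no longer be found, which is when making a small change to the user is the practical next step.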


If there is no 8011 with "Base 'DC" in the description, then either the domain RUS has not kicked off since logging was turned up, or it has and the event has since been overwritten in the log. If a Rebuild is running it may be difficult to catch the 8011 against the domain root, since the application log fills up very quickly during a Rebuild. See the next section for instructions on how to determine if a Rebuild is running.


If there is no 8011 and it does not appear that a Rebuild is running, check the schedule on the RUS. You can try choosing Update Now to get the RUS to kick off immediately, but do not start a Rebuild or Apply a policy. If you still see no 8011 querying the root of the domain within the next few minutes, it is likely that the RUS is hanging waiting for a response to a search against the DC.
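As a rough illustration of that last check, the Python sketch below tests whether any 8011 against the domain root was logged shortly after Update Now was chosen. The function name is hypothetical, and the five-minute window is only an assumed stand-in for "the next few minutes"; adjust it to suit your environment.

```python
from datetime import datetime, timedelta


def domain_search_seen(update_now_at, domain_8011_times, window=timedelta(minutes=5)):
    """True if a domain-root 8011 was logged within `window` after Update Now.

    The default window is an assumption, not a documented RUS timing.
    """
    deadline = update_now_at + window
    return any(update_now_at <= logged <= deadline for logged in domain_8011_times)


# Made-up timestamps for illustration only.
clicked = datetime(2005, 1, 1, 12, 0, 0)
events = [datetime(2005, 1, 1, 11, 40, 0), datetime(2005, 1, 1, 12, 1, 30)]
print(domain_search_seen(clicked, events))  # True
```

A False result here does not prove a hang by itself; it only tells you that it is time to look at the LDAP query the RUS may be waiting on.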


When the RUS is hanging in an LDAP query, you can restart the System Attendant service to get it going again. However, the RUS may hang again unless the cause of the LDAP query hang is identified and corrected. This is usually caused by a network problem, and a Network Monitor capture of the query hanging will be needed to identify the cause.


If the 8011 does contain a range of USNChanged values that includes the USNChanged on the user, then you have answered the first question.


- Bill Long

Comments (8)
  1. Stu Fox says:

    Here’s a question. Why does the RUS fail in an environment where it can’t find a particular address extension, even though the policies it’s trying to apply don’t even refer to that address extension? I’ve seen this in a mixed mode org where one of the 5.5 sites has the Rightfax proxy generator, and the RUS won’t run until you copy the dll to the machine running the RUS. In this case I didn’t actually have access to the machine so I had to get a remote administrator to email me the file. Hassle.

  2. Nino Bilic says:

    Well,

    The trick here really is – how do you tell the RUS that it is not supposed to ever apply a specific policy?

    In current design, you can not really tell a specific RUS that it should apply a specific policy but never some other policy… if the address generator is installed on the 5.5 server, it will be added to 5.5 Site Addressing – and therefore will replicate into a policy on the Exchange 200x side. Seeing that the RUS can not know that it will not have to (at some point) apply a specific policy that is available, that is why there are problems if all address generators are not available. Hope that helps understand it?

  3. Stu Fox says:

    Fair enough, but why have it fail totally? Why not apply addresses it can do something about and log an event saying "I couldn’t apply this address type because…". That would certainly make it more useful and at least allow it to continue working to a certain extent.

  4. Bill Long says:

    Having the RUS continue and just not stamp the addresses corresponding to the missing DLL is an idea that’s been brought up, but since the problem is well documented, easy to identify, and easy to fix (unless you can’t get access to the machine with the DLL on it), it’s hard to justify making a hotfix to change the behavior, since changing the RUS to work this way runs the risk of destabilizing working code. They’re keeping this in mind for the future, though.

  5. Raveendran Chinnasamy says:

    We always have problems with RUS, often resolved with just a reboot of the DC/Exchange servers (native mode, single forest, single domain, 20000 user objects). We contacted PSS and they were not able to help with event ID 8011. To resolve the issue nowadays we just stick with rebooting.

    Raveendran@gmail.com

  6. Bill Long says:

    If a reboot is fixing the problem, the most likely scenario is that an intermittent network problem is causing the RUS to hang in an LDAP query. You can determine if this is the case by closely watching the logging and looking for the 8011 and the corresponding 8012’s (see Part 2, which was posted today).

    The way I typically troubleshoot this is I start a netmon capture with the filter set to Exchange <-> DC, and then cycle the System Attendant. Then I just let the netmon run. Periodically, we’ll create a test user and see if he gets stamped. As soon as we notice the RUS is no longer stamping, we stop the netmon capture. Then, I use the app log to see when it last queried against the domain (looking for the 8011 and 8012s), and zero in on that time in the netmon. In the netmon you can follow the TCP conversation and see why it hung.

    In one recent case I worked on, this was due to the switch that the DC and the Exchange server were plugged into. We would see a bizarre series of frames where Exchange was acknowledging a particular packet and the DC kept retransmitting an older packet, causing the TCP conversation to fall apart. We changed the switch from autodetect to 100 full duplex, our TCP problems were gone, and the RUS was happy.

    It should not be necessary to reboot regularly to keep your RUS working. I’m not familiar with your particular case, but it sounds like your issue can certainly be resolved. I would encourage you to reopen the case with PSS and ask for further analysis of the problem. Point them to this blog or tell ’em to shoot me an email. :-)

  7. Anonymous says:

    Recipient Update Service in Exchange 2000 and Exchange 2003

    Part II – Troubleshooting…
