This is part 2 of my lingering object blog series. The purpose of this blog is to help customers with Windows 2000 DCs make informed decisions on how to tackle this problem on a forest-wide scope. For the sake of brevity, please review my first blog on this topic for the "Alphabet soup," "What are Lingering Objects," and "Do you have Lingering Objects in your Forest?" questions.
REPADMIN /REMOVELINGERINGOBJECTS will not work in W2K environments
First, let's explain why repadmin /removelingeringobjects will not work if the source or target of that operation is running W2K Server. The /removelingeringobjects call leverages server-side code that actually performs the comparison and cleanup work. That code was added to W2K3 Server and does not exist on W2K Server. So, any W2K3 DCs in your forest? The strategy in my other blog can be leveraged for those systems. For W2K systems, one or more of the strategies below must be added to the overall plan of attack.
What about lingering in the WR partition?
How do we get consistency in the WR replica set (WR domain partition and configuration partition)? Recall from my previous blog that REPADMIN can make W2K3 WR DCs consistent with other W2K3 WR DCs for a given NC, but does not address lingering objects in the WR as compared to the RO. Well, with W2K DCs, none of the methods below address lingering objects in a WR NC when compared to other DCs hosting that NC as WR, or when compared to DCs hosting that NC as RO…except building a new forest and the ldifde/replfix method.
Note: lingering objects in the WR of an NC, when compared to other WR DCs for that NC, are uncommon, and rare relative to lingering objects found on DCs hosting a RO of the NC.
Options to clean a forest when W2K DCs exist
Build a new forest
Guaranteed success, as a new forest built using W2K3 servers is set to strict replication consistency right out of the gate.
TSL is 180 days, which makes the forest more tolerant of replication outages that result in lingering objects.
Impractical and expensive.
UnGC and re-GC
Can potentially clean all lingering objects from the RO environment if done methodically and systematically.
Risk of sourcing from a *dirty* partition containing lingering objects is high without a carefully thought out plan of attack.
Potentially huge network utilization hit depending on connectivity during exercise.
No GC available in site (assuming single GC site) during process.
Time consuming due to the NC tear down behavior on W2K DCs. Can be mitigated.
Does not address configuration or NDNCs.
This approach can take one of two forms and rests on a basic assumption that the writable replica set for each domain NC is consistent. This assumption is dangerous, as it is certainly possible (albeit rarer) for lingering objects to exist in the WR partition when compared to other WR DCs for the same partition. Let's go with the dangerous assumption for the moment. The two approaches for this strategy are:
1. I'm cringing as I type this...unGC all GCs in the forest such that there are no GCs left (all lingering objects in the RO environment are destroyed). Then systematically promote new GCs. Yes, the cure may be more painful than the disease with this approach. I mainly wanted to present it here to be thorough and don't realistically think any organization would choose this approach.
2. Systematically and methodically unGC a few GCs at a time. The actual strategy used will differ based on individual IT org needs. The following is an example of a systematic and methodical approach that minimizes risk to operations and risk of sourcing in lingering objects onto the newly promoted GCs.
a.) Create a logical AD maintenance site as a temporary site for use during the cleanup process. Create and configure site link connectivity to a representative hub site.
b.) Add a representative DC from each domain in the forest. Allow automatic connection objects to be created, or manually create them from another site.
c.) This site should have the intersite KCC disabled to remove the risk of the GC promotion creating connections from other GCs in the forest.
d.) Move a few GCs into the maintenance site (be sure to consider the authentication and LDAP needs of the site the GCs just left during the maintenance window).
e.) The moved GCs should be isolated so they are not being hit by LDAP consumers over 3268. Prevent the registration of generic siteless SRV records for the duration of the process.
f.) unGC the boxes: REPADMIN /OPTIONS <GC-FQDN> -IS_GC. Either wait for the process to complete, evidenced by DS event ID 1660 for each partition, or speed up the teardown process.
g.) re-GC the boxes: REPADMIN /OPTIONS <GC-FQDN> +IS_GC. This will cause each box to build inbound connections from the DCs in the maintenance site, therefore sourcing its data from writable DCs only for each domain NC in the forest.
h.) Move the GCs back to their production sites.
i.) Repeat d-h for all GCs in the forest.
j.) Retire the maintenance site.
This isolation strategy is important because, without it, the promotion process can build connections from RO source partners which may themselves have lingering objects. The key tenets to keep in mind when planning a systematic and methodical cleanup are:
· Maintain business continuity for functions and applications that depend on GC lookups.
· Strict control of which systems GC promotion sources NC data from.
There are certainly other ways to enforce strict control besides moving servers into dedicated maintenance sites, and IT orgs may elect to leverage a different strategy, or a combination of strategies, to meet their needs.
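The unGC (step f) and re-GC (step g) toggles above lend themselves to batch generation, as the same command pair is issued per GC. A minimal sketch, assuming hypothetical hostnames; it only prints the repadmin command pairs so you can review and run them one at a time, letting the teardown finish (DS event ID 1660 per partition) before issuing the re-GC:

```python
# Sketch: generate the repadmin command pairs for steps f and g above.
# Hostnames are hypothetical placeholders for your environment. Do not
# run the +IS_GC command until the -IS_GC teardown has completed.

def ungc_regc_commands(gc_fqdns):
    """Return (demote, repromote) repadmin command pairs, one per GC."""
    pairs = []
    for fqdn in gc_fqdns:
        pairs.append((f"repadmin /options {fqdn} -IS_GC",   # step f: unGC
                      f"repadmin /options {fqdn} +IS_GC"))  # step g: re-GC
    return pairs

for demote, repromote in ungc_regc_commands(["gc1.contoso.com"]):
    print(demote)
    print(repromote)
```

Printing the commands rather than executing them keeps a human in the loop for the wait-for-teardown step.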
Rehost all RO partitions on all GCs
Can be systematically performed to spread the bandwidth consumption out over time.
Only sources from WR NC so no risk of sourcing from *dirty* GCs.
Can target and clean one NC at a time.
Can clean application partitions.
Port 3268 remains responsive, which can produce irregular and unexpected query and authentication results during the rehost operation. This can be mitigated by putting these systems into maintenance mode (logically isolate them from production through the use of a maintenance AD site, and control SRV record and <1C> record registration).
Does not address configuration NC.
More labor intensive than unGC/re-GC.
This approach also rests on a basic assumption that the writable replica set for each domain NC and application NC is consistent: a dangerous assumption. This approach must be systematically and methodically planned and carried out to ensure business continuity during the exercise and strict control of which systems NC data is sourced from. The following is an example of a systematic and methodical approach that minimizes risk to operations and the risk of sourcing lingering objects onto the rehosted GCs.
a.) Prevent the GC from being used by consumers of GC services. There are many strategies here, like moving the GC to a maintenance site, preventing the GC from registering GC-specific SRV records, and having those records removed from DNS.
b.) Clean the GC by re-hosting all RO partitions and application NCs.
Example: a GC with 3 RO NCs (A, B, C) and 2 application NCs (D, E).
REPADMIN /REHOST <GCFQDN> <LDAPDN of NC A> <good source DC writable for NC A>
REPADMIN /REHOST <GCFQDN> <LDAPDN of NC B> <good source DC writable for NC B>
REPADMIN /REHOST <GCFQDN> <LDAPDN of NC C> <good source DC writable for NC C>
REPADMIN /REHOST <GCFQDN> <LDAPDN of NC D> <good source DC writable for NC D> /APPLICATION
REPADMIN /REHOST <GCFQDN> <LDAPDN of NC E> <good source DC writable for NC E> /APPLICATION
c.) return rehosted GC to production.
d.) repeat a-c for all GCs in the forest.
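On a GC hosting many partitions, typing each /rehost command by hand invites mistakes, so generating the list up front helps. A hedged sketch along the lines of the example above; the NC DNs and source DC names are placeholders, and each source must be a known-clean DC holding a writable copy of that NC:

```python
# Sketch: generate the repadmin /rehost commands for step b above.
# DNs and hostnames are hypothetical placeholders for your environment.

def rehost_commands(gc_fqdn, ro_ncs, app_ncs):
    """ro_ncs/app_ncs map each NC's LDAP DN to a clean writable source DC."""
    cmds = [f"repadmin /rehost {gc_fqdn} {dn} {src}"
            for dn, src in ro_ncs.items()]
    cmds += [f"repadmin /rehost {gc_fqdn} {dn} {src} /application"
             for dn, src in app_ncs.items()]
    return cmds

for cmd in rehost_commands(
        "gc1.contoso.com",
        ro_ncs={"DC=childA,DC=contoso,DC=com": "dcA.contoso.com"},
        app_ncs={"DC=appD,DC=contoso,DC=com": "dcD.contoso.com"}):
    print(cmd)
```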
Ldifde dumps, replfix.exe compares, and ldifde imports that call the removeLingeringObject operational attribute to selectively clean all lingering objects found
I am purposely leaving out the gory details of a systematic and thorough approach in this blog since working with MS support is required for this method. Strategically, it will be similar to the repadmin /rlo strategy in my first blog.
Targeted cleanup of lingering objects only where they exist.
Reports on lingering objects in writable partitions.
Bandwidth consumed copying LDIFDE dumps across the network is less than with the other options (but can still be significant).
No GC downtime.
Labor intensive: a large number of LDIFDE dumps and comparisons of every partition from every DC, using the same strategy outlined in my first blog.
Extensive batch-processing scripting is needed to automate the process as much as possible.
Not really scalable, as the volume of data to manage quickly becomes unwieldy as the size of the forest to clean increases.
Must work with MS support.
So what if you have a mix of W2K3 and W2K?
Keep the following things in mind as you review a plan of attack in a mixed environment.
· Consider the business continuity risk and cost of the existence of lingering objects while W2K DCs exist in the forest.
§ Did you answer yes to the "Do you have Lingering Objects in your Forest?" question in my first blog on this topic?
§ Have you ever experienced any of the common symptoms associated with lingering objects?
§ How soon will all W2K DCs be retired and does it make more sense to postpone a forest wide cleanup until all DCs are running W2K3?
· Use strategies that minimize business impact.
§ Use repadmin /removelingeringobjects for all W2K3 DCs/GCs deployed.
§ Leverage Microsoft PSS support to assist with the planning and execution.
§ Review the pros and cons above to isolate which method makes the most sense. Perhaps more than one method makes sense.
· Use strategies that minimize cost.
§ Hopefully you have gathered that a full scale forest wide lingering object cleanup exercise is no trivial matter.
§ The more complex the plan of attack, the longer and more costly it will be to execute on.
· A phased approach risks the just-cleaned GCs being re-contaminated by lingering object reanimation occurring in the environment.
§ This can be tackled by monitoring just-cleaned systems for 1388 events in the Directory Service event log. If 1388s are logged after a box is cleaned and before the forest is completely cleaned, then a second pass against those boxes is in order.
§ This can be avoided by setting each box to strict replication consistency as soon as it is cleaned. This must be thought through carefully because of the OS quarantine behavior of halting replication of the partition if an inbound replication request for a lingering object is discovered.
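If you do flip each cleaned box to strict mode, that toggle can be queued up the same way as the other repadmin work. A sketch assuming the repadmin /regkey switch is available (later repadmin builds support it; on older builds, set the "Strict Replication Consistency" registry value under the NTDS Parameters key directly):

```python
# Sketch: queue the strict replication consistency toggle for each
# just-cleaned DC. Assumes a repadmin build supporting /regkey; on
# older builds, set the "Strict Replication Consistency" registry
# value to 1 on the DC instead. Hostnames are placeholders.

def strict_consistency_commands(dc_fqdns):
    """Return one repadmin /regkey command per cleaned DC."""
    return [f"repadmin /regkey {fqdn} +strict" for fqdn in dc_fqdns]

for cmd in strict_consistency_commands(["dc1.contoso.com"]):
    print(cmd)
```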
These postings are provided "AS IS" with no warranties, and confer no rights. The content of this site is personal opinion and does not represent the Microsoft Corporation's views in any way. In addition, thoughts and opinions often change. Because a weblog is intended to provide a semi-permanent, point-in-time snapshot, you should not consider out-of-date posts to reflect current thoughts and opinions.