I struck a problem at a customer site, and the impact, while it seemed minor on the surface, was actually a big deal for their migration project. In fact, the large team they had assembled to migrate users from one forest to a new forest had stopped work while this issue was investigated.
It relates to SID History and the way Windows queries for and caches Name-to-SID and SID-to-Name lookups from AD. This cache was causing SharePoint to think that a user who wanted to log on was actually a user from the wrong domain, and SharePoint would then create a new identity for that person.
The scenario is actually very close to this one:
But the workaround that we found to resolve the problem while they were migrating was pretty cool, so I thought I’d save it for all eternity here as a blog post.
It boils down to this:
The LsaCache stores the previously looked-up domain user names and their SIDs. When we ask a DC whose users carry both the new SID and the migrated SID at the same time, the DC always links the migrated SID to the new user name, not the old user name. If we can artificially fill the LsaCache on our servers with mappings for OLD USERNAME = OLD SID, then we can act as though no resources have migrated yet.
Here’s the scenario, where users were migrated with SID History from child1.domainA.com to domainB.com:
- CHILD1\bob logs onto a workstation in CHILD1 and opens the SPS site in DOMAINB (intranet.domainB.com)
- SPS asks IIS, which asks Windows, which in turn asks a local DC to resolve a remote SID: S-1-5-21-[SID_for_CHILD1]-1010
- The local DC finds the SID assigned to the migrated user in the global catalog
- The local DC returns the account name of the migrated user, DOMAINB\bob
- The SPS server adds the result to its LsaCache as a mapping for this SID to the DOMAINB account
So we can see from the picture above that the LsaCache (the table in the bottom right of the drawing) has a mapping for NEW USERNAME = OLD SID, but we want OLD USERNAME = OLD SID.
So, let’s warm up the LsaCache so it looks the way we’d like it to:
- SPS constantly runs a script to query for the name CHILD1\bob
- The local DC queries its Global Catalog and does NOT have a record for this username
- The local DC must do its own LSA query to a DC in the domain CHILD1 for this name
- The remote DC in CHILD1 finds the user and replies with the SID: S-1-5-21-[SID_for_CHILD1]-1010
- The CHILD1 DC returns this to the DOMAINB DC (the DOMAINB DC caches this result in its own LsaCache)
- The local DC returns this result to the SPS server
- The SPS server adds this entry to its LsaCache
Ah ha! Now our cache looks the way we’d like it, where OLD USERNAME = OLD SID. This way when a query for OLD SID is made, the result from cache will return OLD USERNAME.
- CHILD1\bob logs onto a workstation in CHILD1 and opens the SPS site in DOMAINB (intranet.domainB.com)
- SPS does NOT ask the local DC for the remote SID, it uses its LsaCache
- The LsaCache on SPS replies that the username for the SID S-1-5-21-[SID_for_CHILD1]-1010 is CHILD1\bob
The important step here is the red X where there IS NO STEP. What I mean is that the SharePoint server never talked to the DC to get the OLD SID lookup to return a result, meaning that we relied totally on the warmed up cache on the SPS alone.
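To make the difference between the cold and the warmed-up cache concrete, here is a toy model in Python. It is only a sketch: FakeDC and ToyLsaCache are made-up names, the oldest-first eviction is an assumption on my part, and the real LSA lookup cache is certainly more involved than a single table keyed by SID.

```python
from collections import OrderedDict

OLD_SID = "S-1-5-21-[SID_for_CHILD1]-1010"


class FakeDC:
    """Stand-in for the local DOMAINB DC (hypothetical, for illustration only)."""

    def sid_to_name(self, sid):
        # SID History ties the old SID to the migrated DOMAINB account.
        return {OLD_SID: "DOMAINB\\bob"}.get(sid)

    def name_to_sid(self, name):
        # The name is unknown locally, so a CHILD1 DC answers with the old SID.
        return {"CHILD1\\bob": OLD_SID}.get(name)


class ToyLsaCache:
    """Tiny SID -> name table with a size limit (assumed oldest-first eviction)."""

    def __init__(self, dc, max_size=128):
        self.dc = dc
        self.max_size = max_size
        self.entries = OrderedDict()

    def _store(self, sid, name):
        self.entries[sid] = name
        self.entries.move_to_end(sid)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # drop the oldest entry

    def lookup_name(self, name):
        """Name-to-SID query: this is the warm-up call."""
        sid = self.dc.name_to_sid(name)
        if sid is not None:
            self._store(sid, name)
        return sid

    def lookup_sid(self, sid):
        """SID-to-name query: answered from cache when possible."""
        if sid in self.entries:
            return self.entries[sid]
        name = self.dc.sid_to_name(sid)
        if name is not None:
            self._store(sid, name)
        return name


if __name__ == "__main__":
    cold = ToyLsaCache(FakeDC())
    print("cold:", cold.lookup_sid(OLD_SID))

    warm = ToyLsaCache(FakeDC())
    warm.lookup_name("CHILD1\\bob")  # warm-up query
    print("warm:", warm.lookup_sid(OLD_SID))
```

In this model the cold cache answers with the DOMAINB name, while the warmed cache answers with the CHILD1 name and never goes to the DC for the SID. Setting max_size to a small number and storing a few unrelated entries pushes the warmed entry out again, which is why the cache size matters so much in what follows.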
This relies on the LsaCache on the SPS server ALWAYS having the entry for the SID from the CHILD1 domain matching the CHILD1 username, and never matching the DOMAINB username. The only way to ensure this is:
- Constantly query from the SPS server for the name CHILD1\username for every user in DOMAINB which has been migrated from CHILD1 and has its SIDHistory migrated with it. Use a tool which invokes LookupAccountName() to locate the SID for the username: CHILD1\username. LookupAccountName is explained here: http://msdn.microsoft.com/en-us/library/aa379159(v=vs.85). I had access to a private tool which would do these queries for us. I suspect that PsGetSid from Sysinternals would be able to help out here too, but we never tried it.
- The LsaCache on SPS must be large enough to ensure that the entries which are queried are never overwritten by entries from DOMAINB. Set the registry value HKLM\System\CurrentControlSet\Control\Lsa\LsaLookupCacheMaxSize (DWORD) = 0x2000 (8192 decimal). If this value does not exist, the system uses a default cache size of 128 entries, which is overwritten too quickly on the busy SPS servers. 8192 entries on a pair of load-balanced servers should be able to hold all SIDs for all users accessing the SPS site in the 2 forests (if your forest has more users, you’ll need to increase this).
- This is a workaround. The real fix is for the users who are migrated from child1.domainA.com to domainB.com with SIDHistory to use their migrated accounts immediately. After the migration, their CHILD1 accounts should be disabled/deleted and SIDHistory should be removed from the DOMAINB accounts. This is operationally very difficult to do, as it does not allow for an easy testing path or roll-back path.
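The private tool we used isn't something I can share, but the warm-up loop itself is simple. Below is a rough Python sketch of the idea, not the tool itself: it calls LookupAccountNameW and ConvertSidToStringSidW from advapi32 through ctypes. The function names lookup_sid, warm_cache and run_forever, the buffer sizes and the 60-second interval are all my own assumptions, and I have not run this against the customer environment.

```python
import ctypes
import time


def lookup_sid(account):
    """Resolve DOMAIN\\user to a SID string via LookupAccountNameW (Windows only)."""
    advapi32 = ctypes.WinDLL("advapi32", use_last_error=True)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    sid = ctypes.create_string_buffer(256)        # assumed to be big enough for any SID
    sid_len = ctypes.c_uint32(ctypes.sizeof(sid))
    domain = ctypes.create_unicode_buffer(256)    # receives the referenced domain name
    domain_len = ctypes.c_uint32(256)
    name_use = ctypes.c_int(0)                    # SID_NAME_USE, unused here

    ok = advapi32.LookupAccountNameW(
        None, account, sid, ctypes.byref(sid_len),
        domain, ctypes.byref(domain_len), ctypes.byref(name_use))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())

    text = ctypes.c_wchar_p()
    if not advapi32.ConvertSidToStringSidW(sid, ctypes.byref(text)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        return text.value
    finally:
        kernel32.LocalFree(text)                  # string was allocated by the API


def warm_cache(accounts, lookup=lookup_sid):
    """Query every account once; return {account: SID string, or None on failure}."""
    results = {}
    for account in accounts:
        try:
            results[account] = lookup(account)
        except OSError:
            results[account] = None               # keep going if one name fails
    return results


def run_forever(accounts, interval_seconds=60):
    """Repeat the warm-up so the entries stay fresh in the cache."""
    while True:
        warm_cache(accounts)
        time.sleep(interval_seconds)
```

You would call run_forever(["CHILD1\\bob", ...]) on each SPS server, with the list limited to the users who have been migrated with SIDHistory. The lookup parameter is there so the loop can be tried out with a stand-in function on a machine that isn't joined to the domain.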
To view the actions as they are performed by LSA Lookups, add these 2 DWORDs to the registry under HKLM\System\CurrentControlSet\Control\Lsa\:
- LspDbgTraceOptions = 0x1 (1 means “log to a file”, the file is C:\Windows\Debug\Lsp.log)
- LspDbgInfoLevel = 0x88888888 (all 8s in hex means “log as verbose as possible”)
These keys are explained here:
So, all in all, a little complicated, but the workaround of increasing the value of LsaLookupCacheMaxSize and constantly running a script on the SPS server to query for the SID for usernames in CHILD1 (with a filter to target only users which had been migrated to domainB) worked well for the customer.
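For anyone who wants to apply the registry values from this post to several servers, one option is to generate a .reg file. The Python sketch below just builds the text of such a file from the three values mentioned above. The names LSA_KEY, LSA_VALUES and build_reg_file are mine, and you should review the output before importing it, and leave out the two debug values on servers where you don't want the logging.

```python
# Registry key and DWORD values taken from the post.
LSA_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa"

LSA_VALUES = {
    "LsaLookupCacheMaxSize": 0x2000,   # 8192 entries instead of the default 128
    "LspDbgTraceOptions": 0x1,         # log to a file
    "LspDbgInfoLevel": 0x88888888,     # log as verbosely as possible
}


def build_reg_file(key, values):
    """Return the text of a .reg file that sets each value as a DWORD under key."""
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    for name, value in values.items():
        # DWORDs are written as eight lowercase hex digits in .reg files.
        lines.append('"{}"=dword:{:08x}'.format(name, value))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(LSA_KEY, LSA_VALUES), end="")
```

The output can be saved to a file with a .reg extension and imported with regedit on each SPS server.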