Managing RID Pool Depletion

Hiya folks, Ned here again. When interviewing a potential support engineer at Microsoft, we usually start with a softball question like “what are the five FSMO roles?” Everyone nails that. Then we ask what each role does. Their face scrunches a bit and they get less assured. “The RID Master… hands out RIDs.” Ok, what are RIDs? “Ehh…”

That’s trouble, and not just for the interview. Poor understanding of the RID Master prevents you from adding new users, computers, and groups, which can disrupt your business. Uncontrolled RID creation forces you to abandon your domain, which will cost you serious money.

Today, I discuss how to protect your company from uncontrolled RID pool depletion and keep your domain bustling for decades to come.

Background

Relative Identifiers (RID) are the incremental portion of a domain Security Identifier (SID). For instance:

S-1-5-21-1004336348-1177238915-682003330-2100

==>

S-1-5-Domain Identifier-Relative Identifier

A SID represents a unique trustee, also known as a "security principal" – typically users, groups, and computers – that Windows uses for access control. Without a matching SID in an access control list, you cannot access a resource or prove your identity. It’s the lynchpin.

Every domain has a RID Master: a domain controller that hands each DC a pool of 500 RIDs at a time. A domain contains a single RID pool which generates roughly one billion SIDs (because of a 30-bit length, it’s 230 or 1,073,741,823 RIDs). Once issued, RIDs are never reused. You can’t reclaim RIDs after you delete security principals either, as that would lead to unintended access to resources that contained previously issued SIDs.

Anytime you create a writable DC, it gets 500 new RIDs from the RID Master. Meaning, if you promote 10 domain controllers, you’ve issued 5000 new RIDs. If 8 of those DCs are demoted, then promoted back up, you have now issued 9000 RIDs. If you restore a system state backup onto one of those DCs, you’ve issued 9500 RIDs. The balance of any existing RIDs issued to a DC is never saved – once issued they’re gone forever, even if they aren’t used to create any users. A DC requests more RIDs when it gets low, not just when it is out, so when it grabs another 500 that becomes part of its "standby" pool. When the current pool is empty, the DC switches to the standby pool. Repeat until doomsday.

Adding more trustees means issuing more blocks of RIDs. When you’ve issued the one billion RIDs, that’s it – your domain cannot create users, groups, computers, or trusts. Your RID Master logs event 16644The maximum domain account identifier value has been reached. ” Time for a support case.

You’re now saying something like, “One billion RIDs? Pffft. I only have a thousand users and we only add fifty a year. My domain is safe.” Maybe. Consider all the normal ways you “issue” RIDs:

  • Creating users, computers, and groups (both Security and email Distribution) as part of normal business operations.
  • The promotion of new DCs.
  • DCs gracefully demoted costs the remaining RID pool.
  • System state restore on a DC invalidates the local RID pool.
  • Active Directory domains upgraded from NT 4.0 inherit all the RIDs from that old environment.
  • Seizing the RID Master FSMO role to another server

Now study the abnormal ways RIDs are wasted:

  • Provisioning systems or admin scripts that accidentally bulk create users, groups, and computers.
  • Attempting to create enabled users that do not meet password requirements
  • DCs turned off longer than tombstone lifetime.
  • DC metadata cleaned.
  • Forest recovery.
  • The InvalidateRidPool operation.
  • Increasing the RID Block Size registry value.

The normal operations are out of your control and unlikely to cause problems even in the biggest environments. For example, even though Microsoft’s Redmond AD dates to 1999 and holds the vast majority of our resources, it has only consumed ~8 million RIDs - that's 0.7%. In contrast, some of the abnormal operations can lead to squandered RIDs or even deplete the pool altogether, forcing you to migrate to a new domain or recover your forest. We’ll talk more about them later; regardless of how you are using RIDs, the key to avoiding a problem is observation.

Monitoring

You now have a new job, IT professional: monitoring your RID usage and ensuring it stays within expected patterns. KB305475 describes the attributes for both the RID Master and the individual DCs. I recommend giving it a read, as the data storage requires conversion for human consumption.

Monitoring the RID Master in each domain is adequate and we offer a simple command-line tool I’ve discussed beforeDCDIAG.EXE. Part of Windows Server 2008+ or a free download for 2003, it has a simple test that shows the translated number of allocated RIDs called rIDAvailablePool:

Dcdiag.exe /test:ridmanager /v

For example, my RID Master has issued 3100 RIDs to my DCs and itself:

clip_image001 image

If you just want the good bit, perhaps for batching:

Dcdiag.exe /TEST:RidManager /v | find /i "Available RID Pool for the Domain"

For PowerShell, here is a slightly modified version of Brad Rutkowski's original sample function. It converts the high and low parts of riDAvailablePool into readable values:

function Get-RIDsRemaining

{

param ($domainDN)

$de = [ADSI]"LDAP://CN=RID Manager$,CN=System,$domainDN"

$return = new-object system.DirectoryServices.DirectorySearcher($de)

$property= ($return.FindOne()).properties.ridavailablepool

[int32]$totalSIDS = $($property) / ([math]::Pow(2,32))

[int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))

[int32]$currentRIDPoolCount = $($property) - $temp64val

$ridsremaining = $totalSIDS - $currentRIDPoolCount

Write-Host "RIDs issued: $currentRIDPoolCount"

Write-Host "RIDs remaining: $ridsremaining"

}

image

Another sample, if you want to use the Active Directory PowerShell module and target the RID Master directly:

function Get-RIDsremainingAdPsh

{

param ($domainDN)

$property = get-adobject "cn=rid manager$,cn=system,$domainDN" -property ridavailablepool -server ((Get-ADDomain $domaindn).RidMaster)

$rid = $property.ridavailablepool

[int32]$totalSIDS = $($rid) / ([math]::Pow(2,32))

[int64]$temp64val = $totalSIDS * ([math]::Pow(2,32))

[int32]$currentRIDPoolCount = $($rid) - $temp64val

$ridsremaining = $totalSIDS - $currentRIDPoolCount

Write-Host "RIDs issued: $currentRIDPoolCount"

Write-Host "RIDs remaining: $ridsremaining"

}

image

Turn one of those PowerShell samples into a script that runs as a scheduled task that updates a log every morning and alerts you to review it. You can also use LDP.EXE to convert the RID pool values manually every day, if you are an insane person.

You should also consider monitoring the RID Block Size, as any increase exhausts your global RID pool faster. Object Access Auditing can help here. There are legitimate reasons to increase this value on certain DCs. For example, if you are the US Marine Corps and your DCs are in a warzone where they may not be able to talk to the RID Master for weeks. Be smart about picking values - you are unlikely to need five million RIDs before talking to the master again; when the DC comes home, lower the value back to default.

The critical review points are:

  1. You don’t see an unexpected rise in RID issuance.
  2. You aren’t close to running out of RIDs.

Let’s explore what might be consuming RIDs unexpectedly.

Diagnosis

If you see a large increase in RID allocation, the first step is finding what was created and when. As always, my examples are PowerShell. You can find plenty of others using VBS, free tools, and whatnot on the Internet.

You need to return all users, computers, and groups in the domain – even if deleted. You need the SAM account name, creation date, SID, and USN of each trustee. There are going to be a lot of these, so filter the returned properties to save time and export to a CSV file for sorting and filtering in Excel. Here’s a sample (it’s one wrapped line):

Get-ADObject -Filter 'objectclass -eq "user" -or objectclass -eq "computer" -or objectclass -eq "group"' -properties objectclass,samaccountname,whencreated,objectsid,uSNCreated -includeDeletedObjects | select-object objectclass,samaccountname,whencreated,objectsid,uSNCreated | Export-CSV riduse.csv -NoTypeInformation -Encoding UTF8

Here I ran the command, then opened in Excel and sorted by newest to oldest:

image
Errrp, looks like another episode of “scripts gone wild”…

Now it’s noodle time:

  • Does the user count match actual + previous user counts (or at least in the ballpark)?
  • Are there sudden, massive blocks of object creation?
  • Is someone creating and deleting objects constantly – or was it just once and you need to examine your audit logs to see who isn’t admitting it?
  • Has your user provisioning system gone berserk (or run by someone who needs… coaching)?
  • Have you changed your password policy and are now trying to create enabled users that do not meet password requirements (this uses up a RID during each failed creation attempt).
  • Do you use a VDI system that constantly creates and deletes computer accounts when provisioning virtual machines - we’ve seen those too: in one case, a third party solution was burning 4 million computer RIDs a month.

If the RID allocations are growing massively, but you don’t see a subsequent increase in new trustees, it’s likely someone increased RID Block Size inappropriately. Perhaps they set hexadecimal rather than decimal values – instead of the intended 15,000 RIDs per allocation, for example, you’d end up with 86,016!

It may also be useful to know where the updates are coming from. Examine each DC’s RidAllocationPool for increases to see if something is running on - or pointed at – a specific domain controller.

Recovery

You know there’s a problem. The next step is to stop things getting worse (as you have no way to undo the damage without recovering the entire forest).

If you identified the cause of the RID exhaustion, stop it immediately; your domain’s health is more important. If that system continues in high enough volume, it’s going to force you to abandon your domain.

If you can’t find the cause and you are anywhere near the end of your one billion RIDs, get a system state backup on the RID Master immediately. Then transfer the RID Master role to a non-essential DC that you shut down to prevent further issuance. The allocated RID pools on your DCs will run out, but that stops further damage. This gives you breathing space to find the bad guy. The downside is that legitimate trustee creation stops also. If you don’t already have a Forest Recovery process in place, you had better get one going . If you cannot figure out what's happening, open a support case with us immediately.

No matter what, you cannot let the RID pool run out. If you see:

  • SAM Event 16644
  • riDAvailablePool is “4611686015206162431
  • DCDIAG “Available RID Pool for the Domain is 1073741823 of 1073741823

... it is too late. Like having a smoke detector that only goes off when the house has burned down. Now you cannot create a trust for a migration to another domain . If you reach that stage, open a support case with us immediately. This is one of those “your job depends on it” issues, so don’t try to be a lone gunfighter.

Many thanks to Arren “cowboy killer” Connor for his tireless efforts and excellent internal docs around this scenario.

Finally, a tip: know all the FSMO roles before you interview with us. If you really want to impress, know that the PDC Emulator does more than just “emulate a PDC”. Oy vey.

 

UPDATE 11/14/2011:

Our seeds to improve the RID Master have begun growing and here's the first ripe fruit - https://support.microsoft.com/kb/2618669

 

 

Until next time.

Ned “you can’t get RID of me that easily” Pyle