The Version Store Called, and They’re All Out of Buckets

Hello, Ryan Ries back at it again with another exciting installment of esoteric Active Directory and ESE database details!

I think we need to have another little chat about something called the version store.

The version store is an inherent mechanism of the Extensible Storage Engine and a commonly seen concept among databases in general. (ESE is sometimes referred to as Jet Blue. Sometimes old codenames are so catchy that they just won’t die.) Therefore, the following information should be relevant to any application or service that uses an ESE database (such as Exchange,) but today I’m specifically focused on its usage as it pertains to Active Directory.

The version store is one of those details that the majority of customers will never need to think about. The stock configuration of the version store for Active Directory will be sufficient to handle any situation encountered by 99% of AD administrators. But for that 1% out there with exceptionally large and/or busy Active Directory deployments, (or for those who make “interesting” administrative choices,) the monitoring and tuning of the version store can become a very important topic. And quite suddenly too, as replication throughout your environment grinds to a halt because of version store exhaustion and you scramble to figure out why.

The purpose of this blog post is to provide up-to-date (as of the year 2016) information and guidance on the version store, and to do it in a format that may be more palatable to many readers than sifting through reams of old MSDN and TechNet documentation that may or may not be accurate or up to date. I can also offer more practical examples than you would probably get from straight technical documentation. There has been quite an uptick lately in the number of cases we’re seeing here in Support that center around version store exhaustion. While the job security for us is nice, knowing this stuff ahead of time can save you from having to call us and spend lots of costly support hours.

Version Store: What is it?

As mentioned earlier, the version store is an integral part of the ESE database engine. It’s an area of temporary storage in memory that holds copies of objects that are in the process of being modified, for the sake of providing atomic transactions. This allows the database to roll back transactions in case it can’t commit them, and it allows other threads to read from a copy of the data while it’s in the process of being modified. All applications and services that utilize an ESE database use version store to some extent. The article “How the Data Store Works” describes it well:

“ESE provides transactional views of the database. The cost of providing these views is that any object that is modified in a transaction has to be temporarily copied so that two views of the object can be provided: one to the thread inside that transaction and one to threads in other transactions. This copy must remain as long as any two transactions in the process have different views of the object. The repository that holds these temporary copies is called the version store. Because the version store requires contiguous virtual address space, it has a size limit. If a transaction is open for a long time while changes are being made (either in that transaction or in others), eventually the version store can be exhausted. At this point, no further database updates are possible.”

When Active Directory was first introduced, it was deployed on machines with a single x86 processor with less than 4 GB of RAM supporting NTDS.DIT files that ranged between 2MB and a few hundred MB. Most of the documentation you’ll find on the internet regarding the version store still has its roots in that era and was written with the aforementioned hardware in mind. Today, things like hardware refreshes, OS version upgrades, cloud adoption and an improved understanding of AD architecture are driving massive consolidation in the number of forests, domains and domain controllers in them, DIT sizes are getting bigger… all while still relying on default configuration values from the Windows 2000 era.

The number-one killer of version store is long-running transactions. Transactions that tend to be long-running include, but are not limited to:

– Deleting a group with 100,000 members
– Deleting any object, not just a group, with 100,000 or more forward/back links to clean
– Modifying ACLs in Active Directory on a parent container that propagate down to many thousands of inheriting child objects
– Creating new database indices
– Having underpowered or overtaxed domain controllers, causing transactions to take longer in general
– Anything that requires boat-loads of database modification
– Large SDProp and garbage collection tasks
– Any combination thereof

I will show some examples of the errors that you would see in your event logs when you experience version store exhaustion in the next section.

Monitoring Version Store Usage

To monitor version store usage, leverage the Performance Monitor (perfmon) counter:

‘\\dc01\Database ==> Instances(lsass/NTDSA)\Version buckets allocated’

image
(Figure 1: The ‘Version buckets allocated’ perfmon counter.)

The version store divides the amount of memory that it has been given into “buckets,” or “pages.” Version store pages need not (and in AD, they do not) equal the size of database pages elsewhere in the database. We’ll get into the exact size of these buckets in a minute.

During typical operation, when the database is not busy, this counter will be low. It may even be zero if the database really just isn’t doing anything. But when you perform one of those actions that I mentioned above that qualify as “long-running transactions,” you will trigger a spike in the version store usage. Here is an example of me deleting a group that contains 200,000 members, on a DC running 2012 R2 with 1 64bit CPU:

image(Figure 2: Deleting a group containing 200k members on a 2012 R2 DC with 1 64bit CPU.)

The version store spikes to 5332 buckets allocated here, seconds after I deleted the group, but as long as the DC recovers and falls back down to nominal levels, you’ll be alright. If it stays high or even maxed out for extended periods of time, then no more database transactions for you. This includes no more replication. This is just an example using the common member/memberOf relationship, but any linked-value attribute relationship can cause this behavior. (I’ve talked a little about linked value attributes before here.) There are plenty of other types of objects that may invoke this same kind of behavior, such as deleting an RODC computer object, and then its msDs-RevealedUsers links must be processed, etc..

I’m not saying that deleting a group with fewer than 200K members couldn’t also trigger version store exhaustion if there are other transactions taking place on your domain controller simultaneously or other extenuating circumstances. I’ve seen transactions involving as few as 70K linked values cause major problems.

After you delete an object in AD, and the domain controller turns it into a tombstone, each domain controller has to process the linked-value attributes of that object to maintain the referential integrity of the database. It does this in “batches,” usually 1000 or 10,000 depending on Windows version and configuration. This was only very recently documented here. Since each “batch” of 1000 or 10,000 is considered a single transaction, a smaller batch size will tend to complete faster and thus require less version store usage. (But the overall job will take longer.)

An interesting curveball here is that having the AD Recycle Bin enabled will defer this action by an msDs-DeletedObjectLifetime number of days after an object is deleted, since that’s the appeal behind the AD Recycle Bin – it allows you to easily restore deleted objects with all their links intact. (More detail on the AD Recycle Bin here.)

When you run out of version storage, no other database transactions can be committed until the transaction or transactions that are causing the version store exhaustion are completed or rolled back. At this point, most people start rebooting their domain controllers, and this may or may not resolve the immediate issue for them depending on exactly what’s going on. Another thing that may alleviate this issue is offline defragmentation of the database. (Or reducing the links batch size, or increasing the version store size – more on that later.) Again, we’re usually looking at 100+ gigabyte DITs when we see this kind of issue, so we’re essentially talking about pushing the limits of AD. And we’re also talking about hours of downtime for a domain controller while we do that offline defrag and semantic database analysis.

Here, Active Directory is completely tapping out the version store. Notice the plateau once it has reached its max:

image(Figure 3: Version store being maxed out at 13078 buckets on a 2012 R2 DC with 1 64bit CPU.)

So it has maxed out at 13,078 buckets.

When you hit this wall, you will see events such as these in your event logs:

Log Name: Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 5/16/2016 5:54:52 PM
Event ID: 1519
Task Category: Internal Processing
Level: Error
Keywords: Classic
User: S-1-5-21-4276753195-2149800008-4148487879-500
Computer: DC01.contoso.com
Description:
Internal Error: Active Directory Domain Services could not perform an operation because the database has run out of version storage.

And also:

Log Name: Directory Service
Source: NTDS ISAM
Date: 5/16/2016 5:54:52 PM
Event ID: 623
Task Category: (14)
Level: Error
Keywords: Classic
User: N/A
Computer: DC01.contoso.com
Description:
NTDS (480) NTDSA: The version store for this instance (0) has reached its maximum size of 408Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.

The peculiar “408Mb” figure that comes along with that last event leads us into the next section…

How big is the Version Store by default?

The “How the Data Store Works” article that I linked to earlier says:

“The version store has a size limit that is the lesser of the following: one-fourth of total random access memory (RAM) or 100 MB. Because most domain controllers have more than 400 MB of RAM, the most common version store size is the maximum size of 100 MB.”

Incorrect.

And then you have other articles that have even gone to print, such as this one, that say:

“Typically, the version store is 25 percent of the physical RAM.”

Extremely incorrect.

What about my earlier question about the bucket size? Well if you consulted this KB article you would read:

The value for the setting is the number of 16KB memory chunks that will be reserved.”

Nope, that’s wrong.

Or if I go to the MSDN documentation for ESE:

“JET_paramMaxVerPages
This parameter reserves the requested number of version store pages for use by an instance.

Each version store page as configured by this parameter is 16KB in size.”

Not true.

The pages are not 16KB anymore on 64bit DCs. And the only time that the “100MB” figure was ever even close to accurate was when domain controllers were 32bit and had 1 CPU. But today, domain controllers are 64bit and have lots of CPUs. Both version store bucket size and number of version store buckets allocated by default both double based on whether your domain controller is 32bit or 64bit. And the figure also scales a little bit based on how many CPUs are in your domain controller.

So without further ado, here is how to calculate the actual number of buckets that Active Directory will allocate by default:

(2 * (3 * (15 + 4 + 4 * #CPUs)) + 6400) * PointerSize / 4

Pointer size is 4 if you’re using a 32bit processor, and 8 if you’re using a 64bit processor.

And secondly, version store pages are 16KB if you’re on a 32bit processor, and 32KB if you’re on a 64bit processor. So using a 64bit processor effectively quadruples the default size of your AD version store. To convert number of buckets allocated into bytes for a 32bit processor:

(((2 * (3 * (15 + 4 + 4 * 1)) + 6400) * 4 / 4) * 16KB) / 1MB

And for a 64bit processor:

(((2 * (3 * (15 + 4 + 4 * 1)) + 6400) * 8 / 4) * 32KB) / 1MB

So using the above formulae, the version store size for a single-core, 64bit DC would be ~408MB, which matches that event ID 623 we got from ESE earlier. It also conveniently matches 13078 * 32KB buckets, which is where we plateaued with our perfmon counter earlier.

If you had a 4-core, 64bit domain controller, the formula would come out to ~412MB, and you will see this line up with the event log event ID 623 on that machine. When a 4-core, Windows 2008 R2 domain controller with default configuration runs out of version store:

Log Name:      Directory Service
Source:        NTDS ISAM
Date:          5/15/2016 1:18:25 PM
Event ID:      623
Task Category: (14)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      dc02.fabrikam.com
Description:
NTDS (476) NTDSA: The version store for this instance (0) has reached its maximum size of 412Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.

The version store size for a single-core, 32bit DC is ~102MB. This must be where the original “100MB” adage came from. But as you can see now, that information is woefully outdated.

The 6400 number in the equation comes from the fact that 6400 is the absolute, hard-coded minimum number of version store pages/buckets that AD will give you. Turns out that’s about 100MB, if you assumed 16KB pages, or 200MB if you assume 32KB pages. The interesting side-effect from this is that the documented “EDB max ver pages (increment over the minimum)” registry entry, which is the supported way of increasing your version store size, doesn’t actually have any effect unless you set it to some value greater than 6400 decimal. If you set that registry key to something less than 6400, then it will just get overridden to 6400 when AD starts. But if you set that registry entry to, say, 9600 decimal, then your version store size calculation will be:

(((2 *(3 * (15 + 4 + 4 * 1)) + 9600) * 8 / 4) * 32KB) / 1MB = 608.6MB

For a 64bit, 1-core domain controller.

So let’s set those values on a DC, then run up the version store, and let’s get empirical up in here:

image(Figure 4: Version store exhaustion at 19478 buckets on a 2012 R2 DC with 1 64bit CPU.)

(19478 * 32KB) / 1MB = 608.7MB

And wouldn’t you know it, the event log now reads:

image(Figure 5: The event log from the previous version store exhaustion, showing the effect of setting the “EDB max ver pages (increment over the minimum)” registry value to 9600.)

Here’s a table that shows version store sizes based on the “EDB max ver pages (increment over the minimum)” value and common CPU counts:

Buckets

1 CPU

2 CPUs

4 CPUs

8 CPUs

16 CPUs

6400

(The default)

x64: 410 MB

x86: 103 MB

x64: 412 MB

x86: 103 MB

x64: 415 MB

x86: 104 MB

x64: 421 MB

x86: 105 MB

x64: 433 MB

x86: 108 MB

9600

x64: 608 MB

x86: 152 MB

x64: 610 MB

x86: 153 MB

x64: 613 MB

x86: 153 MB

x64: 619MB

x86: 155 MB

x64: 631 MB

x86: 158 MB

12800

x64: 808 MB

x86: 202 MB

x64: 810 MB

x86: 203 MB

x64: 813 MB

x86: 203 MB

x64: 819 MB

x86: 205 MB

x64: 831 MB

x86: 208 MB

16000

x64: 1008 MB

x86: 252 MB

x64: 1010 MB

x86: 253 MB

x64: 1013 MB

x86: 253 MB

x64: 1019 MB

x86: 255 MB

x64: 1031 MB

x86: 258 MB

19200

x64: 1208 MB

x86: 302 MB

x64: 1210 MB

x86: 303 MB

x64: 1213 MB

x86: 303 MB

x64: 1219 MB

x86: 305 MB

x64: 1231 MB

x86: 308 MB

Sorry for the slight rounding errors – I just didn’t want to deal with decimals. As you can see, the number of CPUs in your domain controller only has a slight effect on the version store size. The processor architecture, however, makes all the difference. Good thing absolutely no one uses x86 DCs anymore, right?

Now I want to add a final word of caution.

I want to make it clear that we recommend changing the “EDB max ver pages (increment over the minimum)” only when necessary; when the event ID 623s start appearing. (If it ain’t broke, don’t fix it.) I also want to reiterate the warnings that appear on the support KB, that you must not set this value arbitrarily high, you should increment this setting in small (50MB or 100MB increments,) and that if setting the value to 19200 buckets still does not resolve your issue, then you should contact Microsoft Support. If you are going to change this value, it is advisable to change it consistently across all domain controllers, but you must also carefully consider the processor architecture and available memory on each DC before you change this setting. The version store requires a contiguous allocation of memory – precious real-estate – and raising the value too high can prevent lsass from being able to perform other work. Once the problem has subsided, you should then return this setting back to its default value.

In my next post on this topic, I plan on going into more detail on how one might actually troubleshoot the issue and track down the reason behind why the version store exhaustion is happening.

Conclusions

There is a lot of old documentation out there that has misled many an AD administrator on this topic. It was essentially accurate at the time it was written, but AD has evolved since then. I hope that with this post I was able to shed more light on the topic than you probably ever thought was necessary. It’s an undeniable truth that more and more of our customers continue to push the limits of AD beyond that which was originally conceived. I also want to remind the reader that the majority of the information in this article is AD-specific. If you’re thinking about Exchange or Certificate Services or Windows Update or DFSR or anything else that uses an ESE database, then you need to go figure out your own application-specific details, because we don’t use the same page sizes or algorithms as those guys.

I hope this will be valuable to those who find themselves asking questions about the ESE version store in Active Directory.

With love,

Ryan “Buckets of Fun” Ries