Hey y’all, Mark back again. I’d like to say in my best TV show announcer voice, we have a real treat for you today. Have you ever wondered what clients were sending expensive or inefficient LDAP queries to your domain controllers? Are long running LDAP queries possibly leading to poor server application performance or even failures of these applications? What about which clients are sending an excessive amount of LDAP queries to domain controllers? Are these queries leading to high CPU utilization on your DCs? Are these queries even completing or are they timing out in some cases?
Have you suspected all of the above might be happening but had no easy way to identify such queries or the IP addresses sending them? Today with some help from PowerShell we will finally have that easy way you’ve been looking for.
First we need to ensure our DCs are capturing the enhanced 1644 event metadata. To enable this, you need to do the following:
- Have a Server 2012 R2 DC, or have KB 2800945 installed on Server 2012, Server 2008 R2 or Server 2008 domain controllers.
- Set the Field Engineering diagnostics registry value to 5 (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Diagnostics\15 Field Engineering).
Note: Field Engineering diagnostic logging is not enabled by default and should only be enabled when actively troubleshooting. Logging level 5 will cause numerous events other than the 1644 event to be captured in your Directory Services event log. You’ll want to turn this setting on when actively troubleshooting LDAP queries and then set the logging level back to 0 when you are done. NO reboot is required to turn this setting on or off, so really you have no excuse.
Next, configure the values for the registry-based filters for expensive, inefficient and long-running searches. If the following registry entries exist, change the values to the desired thresholds. The search time threshold is in milliseconds; the two search results thresholds are counts of entries rather than times. If the registry entries do not exist, create a new entry with that name, and then set its value to the desired threshold.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Expensive Search Results Threshold
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Inefficient Search Results Threshold
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\Search Time Threshold (msecs)
Our current thinking is to set the values for “expensive” and “inefficient” both to 0, then start with a Search Time Threshold setting of 100 milliseconds (100 decimal / 64 hex).
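If you have several DCs to configure, it can help to generate the commands once and paste them where needed. The following is only a sketch in Python: it builds `reg add` command lines for the values suggested above and prints them, without touching the registry itself. The helper name `reg_add_command` and the `SETTINGS` list are made up for illustration, and you should swap in your own thresholds.

```python
# Sketch: build `reg add` command lines for the settings described above.
# Nothing is written to the registry; the commands are only printed.

NTDS = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS"

# (subkey, value name, DWORD data) using the starting values from this post.
SETTINGS = [
    ("Diagnostics", "15 Field Engineering", 5),
    ("Parameters", "Expensive Search Results Threshold", 0),
    ("Parameters", "Inefficient Search Results Threshold", 0),
    ("Parameters", "Search Time Threshold (msecs)", 100),
]


def reg_add_command(subkey, name, data):
    """Return one `reg add` command line that sets a REG_DWORD value."""
    return f'reg add "{NTDS}\\{subkey}" /v "{name}" /t REG_DWORD /d {data} /f'


commands = [reg_add_command(*setting) for setting in SETTINGS]

if __name__ == "__main__":
    for line in commands:
        print(line)
```

Run the printed commands from an elevated prompt on each DC you want to analyze. When you are done troubleshooting, remember to set "15 Field Engineering" back to 0.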
After the registry values are set on the DCs you want to analyze, you should start seeing 1644 events logged in the Directory Services log. After you’ve collected enough data (say 30 minutes’ worth during peak hours, when queries are slow to execute or when the CPU is running hot), go ahead and export the Directory Services log. Then you’ll want to grab the 1644 Reader PowerShell script from the TechNet scripting library and copy it to a PowerShell-capable computer that also has Microsoft Excel on it.
Right-click on the PS script in Explorer and choose “Run with PowerShell”.
The script will allow you to input a path containing the .evt / .evtx files, or process all .evt[x] files in the same directory as the script by just hitting Enter. You can also add Directory Services event logs from more than one domain controller, and the script will report on LDAP query metrics that clients submitted across those DCs.
The Event1644-reader stores its initial output as a .csv file before generating the Excel XLS file. If you want to keep appending newer logs to this same report, we have a time-saving tip for you: remove the old evtx files but keep the .csv file. The script will enumerate all evtx files first, then any csv files to import.
Open the XLS file created above.
Click “Enable Content”.
Now that we have our data, I’m going to do a quick walk-through of the data displayed in the 8 tabs of the Excel workbook (Excel 2010 and later) built by the Event 1644 reader PS script.
Tab 1 is the data straight from the 1644 events log, separated by column. The name of the DC that serviced each LDAP query is captured in column A, labeled “LDAP server”. LDAP queries are captured in column F, labeled “Filter”. The data filter allows you to isolate specific queries, like those from client X to DC Y issuing query Z.
Tab 2- TopIP_StartingNode
The TopIP_StartingNode tab shows you which directory partition is being queried the most, based on Search Count and Average Search Time. The %GrandTotal shows you the overall percentage of queries that targeted each partition. In the example above, 99% of LDAP queries in this report targeted the configuration NC, making it a good first candidate for investigation. Many cells in the XLS built by the 1644 reader PS script feature a “click-through” capability, where clicking on a given cell displays the underlying “metadata” for the value being displayed. For example, clicking on the partition name in the starting node grouping displays the top filters (queries) targeting that partition. Clicking through again displays the IP addresses that issued that query. Clicking again shows the date and time those queries were issued, allowing you to answer the “what”, “who” and “when” of a specific workload.
Also note in the above screenshot that the LDAPServer filter is (All). You can use this drop-down to select a specific DC if more than one log was in the directory.
Focusing based on LDAP query count
Tab 3- TopIP
This shows you the IP addresses that generated the most queries, in descending order. The %GrandTotal is the share of all queries that came from this specific client. In the example above, 10.92% of all queries came from client #1 (row 4). %RunningTotal is the cumulative percentage as you add up the clients from the top of the list. In the example above, the top 9 callers generated 61+% of all LDAP queries by volume.
You are also able to expand each client IP to see what query they were sending.
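If you want to reproduce the %GrandTotal and %RunningTotal columns outside of Excel, the arithmetic is simple. Here is a minimal Python sketch of the idea, not the script’s actual implementation. The sample IP addresses and the function name `top_ip_report` are invented for illustration.

```python
from collections import Counter

# Hypothetical sample: one client IP per logged 1644 event.
client_ips = ["10.0.0.5"] * 5 + ["10.0.0.9"] * 3 + ["10.0.0.7"] * 2


def top_ip_report(ips):
    """Return rows of (ip, count, %GrandTotal, %RunningTotal), busiest first."""
    counts = Counter(ips)
    total = sum(counts.values())
    rows, running = [], 0
    for ip, count in counts.most_common():
        running += count  # cumulative count including all busier clients
        rows.append((
            ip,
            count,
            round(100 * count / total, 2),    # this client's share
            round(100 * running / total, 2),  # share of this and all rows above
        ))
    return rows


report = top_ip_report(client_ips)

if __name__ == "__main__":
    for row in report:
        print(row)
```

With this made-up sample, the busiest client accounts for 50% of the queries and the top two together for 80%, which is the same reading you would take from the %RunningTotal column.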
Tab 4- TopIP-Filters
This tab is really the inverse of tab 3. It shows which queries were sent, sorted by Search Count in descending order. The %GrandTotal and %RunningTotal work the same way as in tab 3.
We can also expand a query to see which IPs are making that exact query.
Focusing based on LDAP query time
Tab 5- TopTime-IP
Tabs 3 and 4 focused more on the total number of queries. Tabs 5 and 6 focus on the actual time these queries take. Tab 5 shows lots of really interesting info. First, it shows which IP is consuming the most search time. But it also shows the average search time. We’ll come back to this point later. %GrandTotal and %RunningTotal work the same as in tabs 3 and 4.
You are also able to expand an IP to see what queries that host is sending, which will be helpful later on.
Tab 6 is once again the inverse of the previous sheet. Instead of sorting by IP for total search time, we are sorting by the actual query.
As before, we can expand the query to see which clients sent it.
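To make the difference between total and average search time concrete, here is a small Python sketch of the per-IP aggregation described for tab 5. It is an illustration under assumptions, not the script’s code: the sample rows and the name `top_time_by_ip` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (client IP, search time in ms).
events = [
    ("10.0.0.5", 120), ("10.0.0.5", 80),
    ("10.0.0.9", 44336),
    ("10.0.0.7", 150), ("10.0.0.7", 250), ("10.0.0.7", 200),
]


def top_time_by_ip(rows):
    """Return (ip, total ms, query count, average ms), largest total first."""
    totals = defaultdict(lambda: [0, 0])  # ip -> [total ms, query count]
    for ip, ms in rows:
        totals[ip][0] += ms
        totals[ip][1] += 1
    report = [
        (ip, total, count, total / count)
        for ip, (total, count) in totals.items()
    ]
    report.sort(key=lambda row: row[1], reverse=True)
    return report


time_report = top_time_by_ip(events)

if __name__ == "__main__":
    for row in time_report:
        print(row)
```

In this sample, a client that sent a single very slow query lands at the top of the list even though it has the lowest query count, which is exactly the kind of caller that a count-based view hides.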
Buckets and Sandbox
The TimeRanks tab breaks the query volume into 50 MS time-based buckets to give a picture of the overall query performance. For example, 2,125 or 14.82% of the LDAP queries in this sample completed in 50-99 MS. If we move a few rows down to the 250-299 MS bucket, we can see that 77.85% of all queries sent were completed in 299 MS or less, since our %RunningTotal includes all the previous buckets. While some queries took 300 MS or more, given that there were only 3 of them, it may be more interesting to optimize queries in the shorter time buckets.
Expanding a bucket will show you all queries that fell into it as well as the total count.
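The bucketing itself is just integer division by the bucket width. The Python sketch below shows one way to build 50 MS buckets with a running percentage. It is an assumption about how such a report can be computed, not a copy of what the script does, and the sample timings and the name `time_rank_report` are made up.

```python
from collections import Counter

BUCKET_MS = 50  # bucket width in milliseconds


def time_rank_report(search_times_ms):
    """Return rows of (bucket label, count, %RunningTotal), fastest bucket first."""
    counts = Counter(ms // BUCKET_MS for ms in search_times_ms)
    total = len(search_times_ms)
    rows, running = [], 0
    for bucket in sorted(counts):
        running += counts[bucket]  # includes all faster buckets
        low = bucket * BUCKET_MS
        label = f"{low}-{low + BUCKET_MS - 1} MS"
        rows.append((label, counts[bucket], round(100 * running / total, 2)))
    return rows


# Hypothetical search times in ms.
sample = [12, 40, 55, 70, 99, 120, 260, 310]
buckets = time_rank_report(sample)

if __name__ == "__main__":
    for row in buckets:
        print(row)
```

Empty buckets are simply skipped in this sketch, so the labels jump from 100-149 MS to 250-299 MS for the sample data.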
Tab 8 is just a sandbox tab where you are able to create your own pivot tables from the data. If you find an interesting data pivot let us know in the comments.
As they say in Spiderman, with great data comes great responsibility, probably. Now that you have this information the first question to ask yourself is, what am I going to do with all this? Let me run through a few things to help get you started.
What Queries Are Taking The Longest?
Your gut probably told you to look at the total number of queries, and we’ll get there, but I want to focus on something else first, on tab 5.
Excessive query volume can be a problem. But if the queries are being serviced very quickly, they might not be a problem at all. By focusing on tab 5, however, we are going to see which clients are causing the most search time. Work with these to figure out what application is making these long-running LDAP calls. ProcMon and WPR are good for this. Don’t forget to check the AvgSearchTime as well. Here we can see in row 11 that someone made 1 call and it took 44,336 MS to return, since they basically just asked for everything. Clearly we could make some adjustments there in terms of efficiency. The next client, in row 14, sent a pretty complicated query and it took 30,061 MS to return. Again, optimizations can hopefully be made.
Who is sending all the queries?
Tab 3 is a great place to start. Focus on those clients that have a high search count and a high average search time. Again, you’ll need to trace this back on the client side to the application that is making the calls. ProcMon and WPR should get you started.
Are These Clients In The Right AD Site?
We’ve previously discussed finding clients talking to DCs in the wrong AD site. If you see any clients that shouldn’t be talking with the DCs you have logs for, you may need to confirm your subnets are defined properly in AD Sites and Services.
As you dig through this data and work with your developers or application owners, you might come to find that some of the slowness is due to the queried attributes not being indexed. Thankfully we’ve already covered this topic at great length here. Remember that while the index is building it may delay AD replication, and it will also increase your database size.
Working With Developers?
This information is also really useful to your development teams. Take this as an opportunity to partner with them and go after the queries that can be improved. Things they may be able to do include client-side caching of data, throttling the queries or even rewriting the queries to make them more efficient. They might not even realize what the query is doing or how long it is taking, since nobody has called to complain. Now is a good time to get ahead of the problem. Work together and everyone wins.
Do I have the latest LDAP query optimizer installed?
Improvements are still being made in how domain controllers can further optimize LDAP queries. There are a few updates you may want to avoid, and some others you should download and test.
Windows Server 2012 R2 RTM introduced a new query optimizer and the most detailed LDAP logging capabilities.
KB 2862304 backported subsets of the LDAP query optimizer to Windows Server 2012, 2008 R2 and 2008 DCs. That code introduced a defect where queries that referenced undefined attributes were slower to execute. That performance problem is resolved for Windows Server 2012 R2 DCs by KB 3042816. An equivalent of KB 3042816 is still being worked on for 2012, 2008 R2 and 2008.
Hopefully this is enough to get you started and really start reducing excessive LDAP queries and optimizing performance. Want to again send a thank you to Arren and Ming Chen for his wonderful script.
Mark “fully LDAP optimized” Morowczynski