Troubleshooting ConfigMgr 2012: Using Extended Events to Find Blocks in SQL Server

~ Scott Rachui | Support Escalation Engineer

ToolsRecently I worked with a customer who was using System Center 2012 Configuration Manager (ConfigMgr 2012) and having a significant problem with queries that were running slowly and having a noticeable impact the performance of his SQL Server.  While working the issue, a Microsoft SQL Server support engineer was engaged to help identify what was causing the impact. During the course of that case I learned a lot about SQL Server locks and blocks so I thought I would take a minute and share some of that knowledge in case you find yourself in a similar situation.

When we first engaged the SQL engineer, his first action was to check for “locks and blocks” to see if this might be the source of the customer’s problem.  Because I was not entirely familiar with what this meant at the time, I did some research into the topic to be prepared when I might need this information in the future. One of the resources I came across, which inspired me to write the article below with an emphasis on Extended Events, is Don Jones’ excellent book Learn SQL Server Administration in a Month of Lunches.  There are an awful lot of technical books on the market but I’ve only encountered a few that are easy to understand and also quite enjoyable.  For those of you who have read any technical documentation in your career, you know what a rare gift it is to be able to teach technical topics in a way that’s understandable and interesting at the same time.  At the outset, I wanted to mention Don’s book and recommend it highly to anyone who has even a passing interest in working with Microsoft SQL Server technologies.  In my opinion, you won’t find a better place to start.

Overview

As a ConfigMgr admin there are times when it may be necessary to determine why Configuration Manager is responding more slowly than normal.  One possibility when conducting this analysis is a slowdown on SQL Server itself, and in this article I’ll talk briefly about one reason SQL might become non-responsive and how to identify what’s going on.

SQL Server is designed so that multiple processes can read data from a database table simultaneously in most cases, but when it comes to updating those same tables with new data (e.g. inserting new information, deleting rows, etc.), SQL needs to take precautions so that only one process at a time can do this.  For example, if I have a database table called Employees, it is acceptable for multiple applications to read that table simultaneously to find out what someone’s address or phone number might be.  But if one of those applications needs to change the phone number of a person in that table, it’s important that no other applications are trying to change that same data at the same time.  To ensure this exclusive access when writing changes to the table (or any other portion of the database), SQL Server “locks” the portion of the database being changed until that change has been completed.

Typically, changes written to the database occur so fast that it’s not noticeable by users, however sometimes if a SQL statement needs to access a large number of tables or is highly complex, there may be a delay for other processes trying to access those same tables.  For example, a poorly written query that requires exclusive access to a number of tables in a database can noticeably bog down a SQL server and prevent other applications from completing their tasks in a timely manner.  Given the degree to which ConfigMgr depends on SQL, it is important for the ConfigMgr admin to have at least a basic understanding of locks and how to identify them.

Transactions Defined

TechNet defines a transaction as “a sequence of operations performed as a single logical unit of work”.  When a process accesses a SQL database, it does so in logical units of work, and this often involves interacting with multiple tables. For example, updating the status of a package delivered through ConfigMgr would need to potentially update multiple tables all within the same transaction.  Depending on the complexity of a transaction, different portions of the SQL database, up to and including the entire database, may be involved.

One of the reasons that things happen in the context of a transaction is consistency of the database.  If a process is updating multiple tables with new information but that update fails, it’s important to keep the update from being placed in some of the tables while it fails in others.  For this reason, the transaction will ensure that the data is either updated in all of the tables or in none of them.  In other words, if one of the update operations fails, all of them fail.  This is known as an explicit transaction.

The important thing to understand about transactions is that these represent single units of work.  Depending on the extent and complexity of the transaction, the portion of the database impacted by the transaction may be quite large and may encompass the entire database in some circumstances.  To ensure no other changes are being made to the portion of the database the process is updating, SQL allows processes to lock that portion of the database.  This grants the process exclusive access to make its changes.  Depending on the type of lock, that may mean that other processes can’t even read data from the “locked” tables until the transaction is finished.

Understanding SQL Locks and Deadlocks

There are different types of locks used by applications when modifying data in a SQL database.  These lock types describe the varying levels of isolation used during database modifications, which determines how exclusive is the access to the selected portion of the database. These lock types are enumerated in a good article found here.  Briefly, the types of locks include:

    • Shared – These locks do not change data.  When a shared lock exists on a resource, no other transactions can modify this data.  Shared locks allow for multiple transactions access to the resource.  SELECT statements use shared locks.
    • Update – This type of lock is used to prevent deadlock conditions (explained below).  If a transaction needs to update a resource, it requires exclusive access to that resource to ensure no competing process is making changes at the same time.  Update locks are used to prevent problems when two resources are attempting to make updates to the same resource
    • Exclusive – This type of lock prevents access to a resource by other processes.  An exclusive lock is used when a process is updating information in a SQL database, and during that update it prevents other processes from making changes to the locked portion of the database.  Depending on what the transaction is doing, this may be a row in a table, an entire table, or even an entire database

Deadlocks

Deadlocks are a special type of lock, and they can occur when two processes are attempting to make updates to resources that the other has locked.  For example, if Process1 has locked ResourceA and needs to make changes to ResourceB, it will attempt to lock the portion of the SQL database contained in ResourceB.  But what if Process2 has already locked ResourceB and is waiting to update ResourceA at the same time?  In this case, Process1 won’t release ResourceA until it completes updating ResourceB, and Process2 won’t release ResourceB until it finishes updating ResourceA.  This is known as a deadlock, and the use of Update locks are an attempt to work around this problem.

Identifying Locks

When dealing with SQL performance issues, taking time to ensure there are no locks preventing other processes from completing is a good idea.  While it’s possible to get some basic information on locks by reviewing Perfmon (specifically the SQL Server: Locks object), for more robust data we want to use Extended Events

Extended Events

Extended Events is the logical heir to SQL Profiler which we’re told will eventually be deprecated (no time soon, but better to get a head start!).  Part of the reason for this is the heavy overhead that SQL Profiler brings, making it useful only for real time troubleshooting.  By contrast, Extended Events has a lower overhead and can be run persistently, generating event data that can be written to memory or even reports for later viewing.

To start with, Extended Events comes with a couple pre-configured sessions.  In order to see these, open SQL Server Management Studio and navigate to the Management\Extended Events folder and expand Sessions.  When you do this, you should see the AlwaysOn_health and system_health sessions as shown in the screenshot below:

clip_image002

The sub-containers under each of these sessions is explained below:

    • Event_File – When you see event_file this indicates the session is configured to send its results to a file that can be reviewed later as needed.  The default location for these files (though you can specify your own location for sessions you create) on my lab computer is C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log.
    • Ring_Buffer – This option holds events in memory on the SQL Server itself.  It is intended to maintain a recent history of events that can be accessed from memory rather than opening and reviewing an event file.

To view events stored in a ring bugger, click on the appropriate entry under the desired session (in my case package0.ring_buffer).  This will open a result window as shown In part) below:

clip_image003

To look at the actual data resident in memory for this session, click the link and a new query result window will open up showing in XML format.  Below is an example of data collected by the system_health session in my lab:

clip_image004

AlwaysOn_health and system_health sessions are useful, but since we’re interested in finding out about the behavior of locks on our SQL Server, let’s turn our attention to what Extended Events offers in that area.

Investigating Locks with Extended Events

Because there is no default session for SQL locks, we need to create a new one before we can gather data.  Right-clicking on the Sessions container reveals two options for us: We can either choose to create a new session manually or we can invoke the New Session Wizard.  While either option is possible, we’ll choose the New Session Wizard.

In the first screen of the wizard we can see that we need to select a name for the session.  I have chosen to call this SQL Locks but the name is ultimately up to you.  I would just encourage you to use a name that’s easy to identify later on, especially if you do a lot of work with Extended Events and end up with many different sessions.  You can also choose whether to start this particular session when the server starts up.  While this might be useful for some sessions like system_health, it probably isn’t needed for the present session so we’ll leave it unchecked.

clip_image005

In the second window of the wizard, we can choose whether to use a template or not.  As you can see, the default template listed is ‘Count Query Locks’.  For our purposes since we want to have full control over which events we’ll add, we will choose ‘Do not use a template’ and select ‘Next’.

clip_image006

In the ‘Select Events to Capture’ screen, we initially see a list of every event that is part of Extended Events.  For our present purposes we care about events related to SQL locks, so we’ll type the word ‘locks’ into the Event library window.  This will filter the events to return only those related to SQL locks as shown below:

clip_image007

By clicking on each specific event, a description of the purpose of the event will appear at the bottom of the screen so you can be certain you are selecting the specific events you care about.  There are a number of interesting events related to locks that we might choose from.  A sampling of these are described below:

    • Lock_acquired – This occurs when a lock has been acquired against a particular object (or objects) in the database.
    • Lock_cancel – This occurs when a request for a lock is canceled.
    • lock_escalation – This occurs when a lock is escalated (for example, from locking a single row in a table to locking the entire table).
    • lock_released – This occurs when a lock is released by a process.

Depending on what you’re doing, different events may provide the data you need.  In our case, we care about which processes might be blocking others, and thus slowing SQL down, so for this purpose we want to look at the blocked_process_report.

We add this by selecting the event and moving it to the Selected events window as shown below:

clip_image008

Note that when adding an event, certain event properties known as fields are included automatically as shown below.  Other fields can be added on the following page to supplement this list.

clip_image009

Once we have selected our events, choose Next.  At this point we can select the additional fields of the event that we want to capture.  As can be seen below, there is a very broad list to choose from:

clip_image010

Depending on the specific information of interest, it may not be necessary to add any additional fields.  The additional fields that might make sense are database_name and sql_text, which I’ve selected above.  Once any additional fields have been chosen, select ‘Next’.

On the next screen is the opportunity to filter events, if needed, based on clauses.  This limits the extent of the data and can help to screen out some of the background noise that might otherwise result.  For the purposes of this example, no filters will be added.

clip_image011

On the next screen is the opportunity to specify where the data collected by the session will be stored, as shown by the screen below:

clip_image012

The options available are:

    • Save data to a file for later analysis – Using this option allows you to set the file location for where the events will be stored (the file will be stored in a .xel format).  You can specify both the maximum size of a single .xel file as well as how many of these files will be created, so a max size of 1 MB x 10 files = 10 MB of space must be available.
    • Work with only the most recent data – This option stores events in memory on the SQL server.  The amount of space allocated to store these events by selecting the maximum buffer memory size.  It’s also possible to specify the maximum number of events to store. As the screenshot says, 0 means unlimited.

If both options are selected, events will be retained in memory (ring_buffer) and will also be written to a .xel file (event_file).  Once these options have been configured, select Next.  At this point, the wizard will complete and you are given the option of starting the session if it was not configured to start automatically.

You can choose to watch live events by right-clicking the session and choosing ‘Watch Live Data’ as shown below.

clip_image013

Once there is some information to review, you can also navigate to the location of the .xel file(s) and double-click them to review what has been saved to file.

clip_image014

Testing the Extended Events Session

Creating the session to show SQL locks is one thing, but how do you know if it’s working as designed?  In a production environment with a lot going on this may not be difficult, however in a test lab (and you should always test things in a lab before deploying them to production), there might not be enough traffic to generate the necessary data to confirm you’ve set things up correctly.

Thankfully, it’s easy to set up a lock condition so that the Extended Events session can be tested.  To do this, I used the following procedure in my lab:

1. Configure the desired threshold for reporting on blocked processes.

2. I opened two separate sessions of SQL Server Mgmt. Studio on two separate lab machines, both pointing to my test database.

3. On one machine, I configured a SQL statement that would take an extended period of time and would lock the table with which it was interacting.

4. On the second machine, I configured a simple query against that same table.

5. I ran the first SQL statement from the first machine which locked the table.  While the first statement was still running, I then ran the simple query on my second machine

To illustrate this a bit better, here are the steps I used in a bit more detail.  These steps assume the Extended Events session to detect locks has been configured and is running.

Configuring the Blocked Processes Threshold

To test the Extended Events session in a lab, it’s usually necessary to configure a very brief threshold before a process is considered to be blocked.  This is known as the ‘blocked process threshold’ and it’s configured by modifying the sp_configure stored procedure.

In my lab, I set a 1 second timeout as follows:

 
sp_configure 'show advanced options', 1 ;
GO
RECONFIGURE ;
GO
sp_configure 'blocked process threshold', 1 ;
GO
RECONFIGURE ;
GO

This script can be found on the MSDN website here.

Machine 1: Establishing the Lock

On Machine 1, I opened SQL Server Mgmt. Studio, pointed to the BigCompany database (which I created for this test) and selected ‘New Query’.  To create the locked condition, I created the following SQL script in the query window:

 
DECLARE @num int
SET @num = 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    While @num < 200000 BEGIN;
        Update  Employees set LastName = 'User' where FirstName = 'Joe';
        SET @num = @num + 1;
    End;
    GO
ROLLBACK TRANSACTION;

To run this, I had previously created a test database called BigCompany and populated it with a table called Employees.  The script above does the following:

    • Declares a variable called @num and configures it to accept integer values.
    • Sets the initial value of @num to 1.
    • Establishes an isolation level of ‘serializable’ which prevents other queries from touching the Employees table while this script ran.
    • Created a While/End loop to update a value with the same information, essentially replacing it with a value that’s already present so the data in the table is never modified, 200,000 times.  This causes the script to run long enough for me to run the query from Machine 2 and see that the table is locked.
    • Incremented the value of @num by one each time through the While/End loop. Otherwise I’d end up in an infinite loop.
    • Finally, I rolled back the transaction so no change is ultimately made to the table.

Machine 2: Querying the Employees table while it is locked

The second step is fairly simple.  I set up SQL Server Management Studio on Machine 2, opened a New Query and ensured I was pointing to the BigCompany database.  While the SQL script above was running, I ran the following query from Machine 2:

 
Select * from Employees

The result was to see this query delayed while the script above ran, illustrating that the Employees table was, indeed, locked.  Once the script was finished, the query completed successfully.

When the query finishes, navigate to the location where the report is stored (in my lab, this was C:\Extended Events – SQL) and look at the .xel file that was created.  By double-clicking it you will get a view similar to the following:

clip_image015

To review details about the report, select one of the entries and double-click the ‘blocked_process’ value.  This results in an XML report being displayed as follows:

clip_image016

By reviewing this report, you can see very quickly which is the blocked and which is the blocking process.  First, here is the blocked process:

clip_image017

This is the query we ran from Machine 2, as we’d expect.  Further down the same report we also see which process is doing the blocking:

clip_image018

This validates that what we have configured in Extended Events does return the data we want.  Obviously, when you deploy this in production you’ll want to set a different threshold for reporting on your blocked processes (1 second should really only be used in your test lab).  But when you deploy this, you should have a nice report that allows you to capture this important information.  And the best part is that you can run it automatically without the overhead required by SQL Profiler.

Summary

There are many ways to troubleshoot blocked processes in SQL.  This article has been written to show how this can be done using Extended Events.  We are told that Extended Events will be replacing SQL Profiler in the future, so it’s a good idea to become familiar with it.

Hopefully you won’t have to deal with blocked processes on a regular basis, but if you do this tool should give you a good way to identify what’s going on so the issue can be remediated.

Scott Rachui | Support Escalation Engineer | Microsoft GBS Management and Security Division

Get the latest System Center news on Facebook and Twitter:

clip_image001 clip_image002

System Center All Up: http://blogs.technet.com/b/systemcenter/

Configuration Manager Support Team blog: http://blogs.technet.com/configurationmgr/ 
Data Protection Manager Team blog: http://blogs.technet.com/dpm/ 
Orchestrator Support Team blog: http://blogs.technet.com/b/orchestrator/ 
Operations Manager Team blog: http://blogs.technet.com/momteam/ 
Service Manager Team blog: http://blogs.technet.com/b/servicemanager 
Virtual Machine Manager Team blog: http://blogs.technet.com/scvmm

Microsoft Intune: http://blogs.technet.com/b/microsoftintune/
WSUS Support Team blog: http://blogs.technet.com/sus/
The RMS blog: http://blogs.technet.com/b/rms/
App-V Team blog: http://blogs.technet.com/appv/
MED-V Team blog: http://blogs.technet.com/medv/
Server App-V Team blog: http://blogs.technet.com/b/serverappv
The Surface Team blog: http://blogs.technet.com/b/surface/
The Application Proxy blog: http://blogs.technet.com/b/applicationproxyblog/

The Forefront Endpoint Protection blog : http://blogs.technet.com/b/clientsecurity/
The Forefront Identity Manager blog : http://blogs.msdn.com/b/ms-identity-support/
The Forefront TMG blog: http://blogs.technet.com/b/isablog/
The Forefront UAG blog: http://blogs.technet.com/b/edgeaccessblog/