SQL Server Performance Analysis Part IV

PART IV: "How To" get down to whatever is bringing SQL to its knees...

At this point, one should know already whether or not they have memory pressure / cpu bottleneck / disk issues (previous blog posts). With that information, we now filter down and look for the cause.

What is truly causing SQL Server's performance to degrade:

SQL Blocker script:

https://support.microsoft.com/kb/298475

https://support.microsoft.com/kb/271509/

Suport WebCast: Microsoft SQL Server: Rapid Blocker Script Analysis

https://support.microsoft.com/kb/324885

This is a great script that provides us with a lot more information than we give it credit for.

Analyzing this script can be tedious - depending on it size - so we recommend using SQL Nexus (explanation of this tool on another blog post)

However should you analyze it using a editor, well then this is what we can get from it:

1. Blocking Information

If SQL Server is suffering from blocking - this will be the data that will help you identify what spids / what commands are involved. Please note that profiler should be used in conjunction with the blocker script in order to get the complete statement / command being executed.

The blocker script is takes a snapshot of sysprocesses every (default) 5 seconds and identifies blocking the following manner:

SPIDs at the head of blocking chains

spid  

------

     85

     92

    175

One can then look at see what the following spids where executing and why was this causing blocking. The two most obvious (and common) reasons for this is

1. Un-optimized queries - one can then look into profiler trace (performance events) or even execution plans of the commands

2. Spids "waiting for command" - spids that are waiting for user input - this could be from a commit to rollback to any other command. This spid is basically sitting on locks and waiting for a command to either let go or continue. The open_transactions will equal to 1 in this case.

2. Execution Information:

This script also gives us information in regards to CPU, duration, reads, writes, and other data relevant to the execution of the statement and work that spid has been doing (should he be executing a batch of statements). These counters are also a good way to see if a spid (not involved in locks but possible head blocker) is still doing any work - if the counters are still increasing then the spid is still alive.

Other useful information is the hostname (pinpoint source), dbid (where is all the action happening) and loginname.

NOTE:

DBID is quite handy as you many have a situation in which you don't have blocking but your SQL's performance is slow. Through this script you can see what activity is going on your server and on which database. If you see lots of spids on tempdb, you may have a tempdb contention. This is one example of how useful the information given by this script is.

https://support.microsoft.com/kb/328551 --> section MORE INFORMATION

"From the sysprocesses system table output, the waitresource may show up as "2:1:1" explain dbid, fileid , page num (PFS Page) or "2:1:3" (SGAM Page). Depending on the degree of contention, this may also lead to SQL Server appearing unresponsive for short periods."

Note:

Waitresource gives information in regards to the object involved:

example:

waitresource=KEY: 9:72057594065256448 (ef008dce9cbe)

waitresource=KEY: 6:72057594057457664
dbid=6 objectname=AdventureWorks.dbo.T2
associatedObjectId=72057594057457664
keylock hobt_id=72057594057457664

From the Hobt_id we get all information:

SELECT o.name, i.name
FROM sys.partitions p
JOIN sys.objects o ON p.object_id = o.object_id
JOIN sys.indexes i ON p.object_id = i.object_id
AND p.index_id = i.index_id
WHERE p.hobt_id = 72057594065256448

example 2:

Resource      Format      Example
Table      DatabaseID:ObjectID      TAB: 5:261575970
In this case, database ID 5 is the pubs sample database and object ID 261575970 is the titles table.
Page      DatabaseID:FileID:PageID      PAG: 5:1:104
In this case, database ID 5 is pubs, file ID 1 is the primary data file, and page 104 is a page belonging to the titles table.
Key      DatabaseID:ObjectID:IndexID (Hash value for index key)      KEY: 5:261575970:1 (5d0164fb1eac)
In this case, database ID 5 is pubs, object ID 261575970 is the titles table, index ID 1 is the clustered index, and the hash value indicates the index key value for the particular row.

https://msdn.microsoft.com/en-us/library/ms178104.aspx

https://support.microsoft.com/kb/224453

https://sqldev.net/articles/WaitTypes.htm

This leads to the third most interesting aspect of this script result:

3. Waittypes | WaitResource | Waittime

The information provided here can tell us quite a lot of what is going on.

Waittypes:

The value in the waittype field indicates the type of resource that the connection is waiting on.

WaitResources:

The waitresource column provides more detailed information about the specific resource that the SPID is waiting on.

Waittime:

The waittime column provides the number of milliseconds that the SPID has been waiting with the current waittype.

SQL Server 2000 Wait Types

https://www.sqldev.net/misc/waittypes.htm

Description of the waittype and lastwaittype columns in the master.dbo.sysprocesses table in SQL Server 2000 and SQL Server 2005

https://support.microsoft.com/kb/822101

Best way to explain how useful this information is, is by providing an example(s):

1. Tempdb Contention

Spid

Status

Blocked

Open_Transactions

WaitResource

Waittype

142

Suspended

152

2

2:1:6508

0x0034                   

144

Suspended

152

2

2:1:6508

0x0034

152

Runnable

 

2

2:1:6508

0x0000

Most of the SPIDS are blocked with waitresource = '2:1:X' (example) and waittype as "PAGELATCH_XX".

This seems to contribute to majority of the blocking. The wait resource the '2:1:X' indicates blocking on SGAM page in the tempdb database.

2. Running out of Threads

SYSPROC FIRST PASS

Spid

Ecid

Waittype

17

0

0x0046

52

0

0x0046

90

0

0x020B

165

0

0x0046

190

0

0x0208

...

227

24

0x0208

227

25

0x0046

227

26

0x0208

232

55

0x0046

232

56

0x0046

235

0

0x0800

...

244

40

0x0208

247

0

0x0046

248

0

0x0046

268

0

0x0046

270

0

0x0208

...

270

22

0x0208

270

23

0x0046

270

24

0x0208

UMSTHREAD 0x46 This waittype indicates that a batch has been received from a client application but that there are no worker threads that are available to service the request. If you consistently see 0x0046 waittypes for multiple SPIDs, there is a significant bottleneck elsewhere in the system that is using all the available worker threads. Note that the waittime column is always 0 for the UMSTHREAD waittype, and the lastwaittype column may erroneously show the name of a different waittype instead of UMSTHREAD."

Note:

Execution context identifier (ECID). The execution context ID of a given thread associated with a specific SPID.

ECID = {0,1,2,3, ...n}, where 0 always represents the main or parent thread, and {1,2,3, ...n} represent the subthreads.

In this case, we would need to analyze why we are running out of threads - or better yet - why aren't the other spids letting go of theirs.

3. CXPacket

Spid

Kpid

Waittype

Waittime

lastwaittype 

<SPID>

4224

0x00BA

546

CXPACKET 

<SPID>

148

0x00BA

546

CXPACKET 

<SPID>

5764

0x00BA

1234

CXPACKET 

<SPID>

6008

0x00BA

718

CXPACKET 

<SPID>

3164

0x00BA

5093

CXPACKET 

<SPID>

268

0x00BA

5125

CXPACKET 

<SPID>

1760

0x00BA

5125

CXPACKET 

<SPID>

5704

0x00BA

5078

CXPACKET 

Note: Many times the CXPacket spid will be in the "suspended" state.

A cxpacket wait basically means that part of the job is done but is waiting on another part of the job to finish.  Queries that are involved in these wait types have a cost threshold longer than the (default) 5 second hence they need to be optimized to run quicker (indexes / tuning) and maxdop needs to configured correctly (see previous posts). 

Please note that this script should not be used solemnly if you suffer from blocking issues. This script provides us with a look at what is going on with SQL for a duration of time. It tells us who (spid / hostname/ login name / application name), what (commands / reads / writes / duration / cpu), where (dbid), and when (start time / end time) is happening in SQL.

Next Blog Post: Part V – PerfStats ….