SQL Server Performance Analysis Part IV


PART IV: “How To” get down to whatever is bringing SQL to its knees…



At this point, one should know already whether or not they have memory pressure / cpu bottleneck / disk issues (previous blog posts). With that information, we now filter down and look for the cause.

What is truly causing SQL Server’s performance to degrade:



SQL Blocker script:


http://support.microsoft.com/kb/298475


http://support.microsoft.com/kb/271509/


Suport WebCast: Microsoft SQL Server: Rapid Blocker Script Analysis


http://support.microsoft.com/kb/324885


This is a great script that provides us with a lot more information than we give it credit for.


Analyzing this script can be tedious – depending on it size – so we recommend using SQL Nexus (explanation of this tool on another blog post)


However should you analyze it using a editor, well then this is what we can get from it:


1. Blocking Information


If SQL Server is suffering from blocking – this will be the data that will help you identify what spids / what commands are involved. Please note that profiler should be used in conjunction with the blocker script in order to get the complete statement / command being executed.


The blocker script is takes a snapshot of sysprocesses every (default) 5 seconds and identifies blocking the following manner:


SPIDs at the head of blocking chains


spid  


——


     85


     92


    175


One can then look at see what the following spids where executing and why was this causing blocking. The two most obvious (and common) reasons for this is


1. Un-optimized queries – one can then look into profiler trace (performance events) or even execution plans of the commands


2. Spids “waiting for command” – spids that are waiting for user input – this could be from a commit to rollback to any other command. This spid is basically sitting on locks and waiting for a command to either let go or continue. The open_transactions will equal to 1 in this case.


2. Execution Information:


This script also gives us information in regards to CPU, duration, reads, writes, and other data relevant to the execution of the statement and work that spid has been doing (should he be executing a batch of statements). These counters are also a good way to see if a spid (not involved in locks but possible head blocker) is still doing any work – if the counters are still increasing then the spid is still alive.


Other useful information is the hostname (pinpoint source), dbid (where is all the action happening) and loginname.


NOTE:



DBID is quite handy as you many have a situation in which you don’t have blocking but your SQL’s performance is slow. Through this script you can see what activity is going on your server and on which database. If you see lots of spids on tempdb, you may have a tempdb contention. This is one example of how useful the information given by this script is.


http://support.microsoft.com/kb/328551 –> section MORE INFORMATION


“From the sysprocesses system table output, the waitresource may show up as “2:1:1”  explain dbid, fileid , page num (PFS Page) or “2:1:3″ (SGAM Page). Depending on the degree of contention, this may also lead to SQL Server appearing unresponsive for short periods.”


Note:



Waitresource gives information in regards to the object involved:


example:


waitresource=KEY: 9:72057594065256448 (ef008dce9cbe)


waitresource=KEY: 6:72057594057457664
dbid=6 objectname=AdventureWorks.dbo.T2
associatedObjectId=72057594057457664
keylock hobt_id=72057594057457664


From the Hobt_id we get all information:


SELECT o.name, i.name
FROM sys.partitions p
JOIN sys.objects o ON p.object_id = o.object_id
JOIN sys.indexes i ON p.object_id = i.object_id
AND p.index_id = i.index_id
WHERE p.hobt_id = 72057594065256448


example 2:


Resource      Format      Example
Table      DatabaseID:ObjectID      TAB: 5:261575970
In this case, database ID 5 is the pubs sample database and object ID 261575970 is the titles table.
Page      DatabaseID:FileID:PageID      PAG: 5:1:104
In this case, database ID 5 is pubs, file ID 1 is the primary data file, and page 104 is a page belonging to the titles table.
Key      DatabaseID:ObjectID:IndexID (Hash value for index key)      KEY: 5:261575970:1 (5d0164fb1eac)
In this case, database ID 5 is pubs, object ID 261575970 is the titles table, index ID 1 is the clustered index, and the hash value indicates the index key value for the particular row.


http://msdn.microsoft.com/en-us/library/ms178104.aspx


http://support.microsoft.com/kb/224453


http://sqldev.net/articles/WaitTypes.htm


This leads to the third most interesting aspect of this script result:


3. Waittypes | WaitResource | Waittime


The information provided here can tell us quite a lot of what is going on.






Waittypes:


The value in the waittype field indicates the type of resource that the connection is waiting on.







WaitResources:


The waitresource column provides more detailed information about the specific resource that the SPID is waiting on.







Waittime:


The waittime column provides the number of milliseconds that the SPID has been waiting with the current waittype.


SQL Server 2000 Wait Types


http://www.sqldev.net/misc/waittypes.htm


Description of the waittype and lastwaittype columns in the master.dbo.sysprocesses table in SQL Server 2000 and SQL Server 2005


http://support.microsoft.com/kb/822101


Best way to explain how useful this information is, is by providing an example(s):


1. Tempdb Contention

































Spid


Status


Blocked


Open_Transactions


WaitResource


Waittype


142


Suspended


152


2


2:1:6508


0x0034                   


144


Suspended


152


2


2:1:6508


0x0034


152


Runnable

 

2


2:1:6508


0x0000


Most of the SPIDS are blocked with waitresource = ‘2:1:X’ (example) and waittype as “PAGELATCH_XX”.


This seems to contribute to majority of the blocking. The wait resource the ‘2:1:X’ indicates blocking on SGAM page in the tempdb database.


2. Running out of Threads


SYSPROC FIRST PASS



























Spid


Ecid


Waittype


17


0


0x0046


52


0


0x0046


90


0


0x020B


165


0


0x0046


190


0


0x0208





























227


24


0x0208


227


25


0x0046


227


26


0x0208


232


55


0x0046


232


56


0x0046


235


0


0x0800

























244


40


0x0208


247


0


0x0046


248


0


0x0046


268


0


0x0046


270


0


0x0208

















270


22


0x0208


270


23


0x0046


270


24


0x0208


UMSTHREAD 0x46 This waittype indicates that a batch has been received from a client application but that there are no worker threads that are available to service the request. If you consistently see 0x0046 waittypes for multiple SPIDs, there is a significant bottleneck elsewhere in the system that is using all the available worker threads. Note that the waittime column is always 0 for the UMSTHREAD waittype, and the lastwaittype column may erroneously show the name of a different waittype instead of UMSTHREAD.”


Note:



Execution context identifier (ECID). The execution context ID of a given thread associated with a specific SPID.


ECID = {0,1,2,3, …n}, where 0 always represents the main or parent thread, and {1,2,3, …n} represent the subthreads.


In this case, we would need to analyze why we are running out of threads – or better yet – why aren’t the other spids letting go of theirs.


3. CXPacket



























































Spid


Kpid


Waittype


Waittime


lastwaittype 


<SPID>


4224


0x00BA


546


CXPACKET 


<SPID>


148


0x00BA


546


CXPACKET 


<SPID>


5764


0x00BA


1234


CXPACKET 


<SPID>


6008


0x00BA


718


CXPACKET 


<SPID>


3164


0x00BA


5093


CXPACKET 


<SPID>


268


0x00BA


5125


CXPACKET 


<SPID>


1760


0x00BA


5125


CXPACKET 


<SPID>


5704


0x00BA


5078


CXPACKET 


Note: Many times the CXPacket spid will be in the “suspended” state.


A cxpacket wait basically means that part of the job is done but is waiting on another part of the job to finish.  Queries that are involved in these wait types have a cost threshold longer than the (default) 5 second hence they need to be optimized to run quicker (indexes / tuning) and maxdop needs to configured correctly (see previous posts). 


Please note that this script should not be used solemnly if you suffer from blocking issues. This script provides us with a look at what is going on with SQL for a duration of time. It tells us who (spid / hostname/ login name / application name), what (commands / reads / writes / duration / cpu), where (dbid), and when (start time / end time) is happening in SQL.


Next Blog Post: Part V – PerfStats ….