Debugging slow response times in SQL Server 2008


Introduction


The other day I came across a nice article written by Mario Broodbakker called “SQL Server Wait Events: Taking the Guesswork out of Performance Profiling” (see references) on how to troubleshoot slow response times in SQL Server 2000 and 2005 using the YAPP Method. According to the author, one of the biggest challenges to apply this method in SQL Server is that fact that the wait statistics are collected and exposed at the server level (server wide):


… “This is simply not enough, unless you have the luxury of testing your application and SQL statements on an isolated server” …” But, of course, it’s often hard or impossible to isolate and replicate production problems on an isolated test server.” …


I decided to write this because in SQL Server 2008 the wait statistics are available at virtually any level, and (at the time of writing) I couldn’t find any articles on the new sqlos.wait_info and sqlos.wait_info_external extended events.


YAPP Method


The YAPP Method: “Yet Another Performance Profiling Method” is based on a paper written by Anjo Kolk (see references). The basic idea of the method can be expressed with the following formula:


(1)   Response time = service time + wait time


This method for performance troubleshooting has been around for some time and it is a fairly standard practice when debugging slow response times.


You can get the response time and the service time of a query by looking at sys.dm_exec_sessions, columns total_elapsed_time and cpu_time. Alternatively, you can use SQL trace or the more lightweight extended events in SQL Server 2008. Knowing response time and service time and applying formula (1), gives you wait time. So, we know how much we waited. However, in order to get the individual wait types we need the new extended events.


Per session wait statistics in SQL Server 2008


In SQL Server 2008 you can get wait statistics for a given session or statement by leveraging the new extended events. Optionally, you can limit the collection to a small set of wait types or collect only waits exceeding certain duration. If your event session is written carefully, you can use this technique on production systems with very little or no performance impact. In fact, SQL Server 2008 ships with a built-in event session, which is collecting wait statistics, and is running by default out of the box.


Here is how to create an event session to collect wait types for session 51. The events will go to the specified trace file:


create event session session_waits on server


add event sqlos.wait_info


(WHERE sqlserver.session_id=51 and duration>0)


, add event sqlos.wait_info_external


(WHERE sqlserver.session_id=51 and duration>0)


add target package0.asynchronous_file_target


      (SET filename=N’c:\wait_stats.xel’, metadatafile=N’c:\wait_stats.xem’);


 



Here is how to start the event session, run your workload, and then stop the event session:


alter event session session_waits on server state = start;


exec sp_workload;


alter event session session_waits on server state = stop;


 


Here is how to read the trace file:


select * from from sys.fn_xe_file_target_read_file


      (‘c:\wait_stats*.xel’, ‘c:\wait_stats*.xem’, null, null)


 


The result is a list of wait events, representing individual waits, having wait type and duration column. In order to make better sense out of this we need to group the events by wait type and calculate the total duration per wait type. The following code can do this:


create view dbo.read_xe_file as


select object_name as event, CONVERT(xml, event_data) as data


from sys.fn_xe_file_target_read_file


(‘c:\wait_stats*.xel’, ‘c:\wait_stats*.xem’, null, null)


go


 


create view dbo.xe_file_table as


select


      event


      , data.value(‘(/event/data/text)[1]’,‘nvarchar(50)’) as ‘wait_type’


      , data.value(‘(/event/data/value)[3]’,‘int’) as ‘duration’


      , data.value(‘(/event/data/value)[6]’,‘int’) as ‘signal_duration’


from dbo.read_xe_file


go



 


select


      wait_type


      , sum(duration) as ‘total_duration’


      , sum(signal_duration) as ‘total_signal_duration’


from dbo.xe_file_table


group by wait_type


order by sum(duration) desc


go


 


Example


Hopefully this example will demonstrate the power of sqlos.wait_info and sqlos.wait_info_external events. I did some measurements by executing the following workload from two connections:


create proc #sp_workload


      as


begin


      SET nocount ON


      CREATE TABLE #t1000k (c2 char(1020));


      BEGIN TRANSACTION


      DECLARE @rows INT = 50000;


      DECLARE @row INT = 0;


      DECLARE @count INT = 0;


 


      WHILE @row < @rows


      BEGIN


            INSERT INTO #t1000k (c2) VALUES (REPLICATE(‘a’,1000));


 


            SELECT @row = @row + 1


 


            IF @count > 10


            BEGIN


                  COMMIT


                  BEGIN TRANSACTION


                  SELECT @count=0


            END


           


            SELECT @count=@count+1


      END


     


      COMMIT


 


END


go


While running the workload I had IOMeter (see references) so that I can add some disk pressure.


Results for session 51: Apparently, most of the time was spent waiting for space in the log buffer to store log records – wait type LOGBUFFER. For complete list of all wait types with their descriptions search for sys.dm_os_wait_stats in MSDN or Books Online (see references).








































































Response time Service time Wait time
32619 1828 30791
wait_type total_duration total_signal_duration
LOGBUFFER 9662 66
PREEMPTIVE_OS_FILEOPS 6442 276
PREEMPTIVE_OS_FLUSHFILEBUFFERS 3833 150
PREEMPTIVE_OS_WRITEFILEGATHER 3015 66
SOS_SCHEDULER_YIELD 2481 2480
WRITE_COMPLETION 1971 0
IO_COMPLETION 1183 0
PAGEIOLATCH_UP 551 0
WRITELOG 417 0
PAGELATCH_UP 30 27
PAGELATCH_SH 10 10
PREEMPTIVE_XE_CALLBACKEXECUTE 4 200


 


Results for session 52: The result for this session is very similar. We can tell that most of the time was spent waiting. After accumulating the durations from all waits we get 30726, which is pretty close to the total wait time calculated using sys.dm_exec_sessions (31484).
















































Response time Service time Wait time
33890 2406 31484
wait_type total_duration total_signal_duration
LATCH_EX 16555 32
LOGBUFFER 10796 28
SOS_SCHEDULER_YIELD 2739 2729
WRITELOG 605 0
PAGELATCH_UP 26 25
PAGELATCH_SH 5 5


 


For the sake of completeness I also collected server-wide wait statistics, which are also available in previous version of SQL Server. The server-wide statistics clearly shows that we have IO bottleneck however you don’t know exactly which session or query was affected and by how much:


 






































































































wait_type wait_time_ms signal_wait_time_ms
LOGBUFFER 21666 127
WRITELOG 17118 4
LATCH_EX 16561 34
PAGEIOLATCH_UP 7891 0
SOS_SCHEDULER_YIELD 7058 7038
PREEMPTIVE_OS_FILEOPS 6443 0
WRITE_COMPLETION 5896 4
SLEEP_BPOOL_FLUSH 3916 278
PREEMPTIVE_OS_FLUSHFILEBUFFERS 3833 0
PREEMPTIVE_OS_WRITEFILEGATHER 3016 0
IO_COMPLETION 1680 0
PREEMPTIVE_OS_REPORTEVENT 141 0
PAGELATCH_UP 139 128
PREEMPTIVE_OS_PIPEOPS 128 0
PREEMPTIVE_XE_TARGETINIT 85 0
XE_BUFFERMGR_ALLPROCESSED_EVENT 70 0
PAGELATCH_SH 23 23
PREEMPTIVE_OS_QUERYREGISTRY 16 0
PREEMPTIVE_XE_CALLBACKEXECUTE 8 0
PAGELATCH_EX 7 7
PREEMPTIVE_XE_SESSIONCOMMIT 5 0
LATCH_SH 1 0
PREEMPTIVE_OS_GETDISKFREESPACE 1 0


 


Complete test code


Here is the complete source code of the test:


use tempdb


go


 



— Enable xp_cmdshell to delete trace files



exec sp_configure ‘show advanced options’, 1


reconfigure


exec sp_configure ‘xp_cmdshell’, 1


reconfigure


go


 


BEGIN TRY


      drop proc #sp_workload


END TRY BEGIN CATCH END CATCH


go


 


create proc #sp_workload


      as


begin


      SET nocount ON


      CREATE TABLE #t1000k (c2 char(1020));


      BEGIN TRANSACTION


      DECLARE @rows INT = 50000;


      DECLARE @row INT = 0;


      DECLARE @count INT = 0;


 


      WHILE @row < @rows


      BEGIN


            INSERT INTO #t1000k (c2) VALUES (REPLICATE(‘a’,1000));


 


            SELECT @row = @row + 1


 


            IF @count > 10


            BEGIN


                  COMMIT


                  BEGIN TRANSACTION


                  SELECT @count=0


            END


           


            SELECT @count=@count+1


      END


     


      COMMIT


 


END


go


 


BEGIN TRY


      drop event session session_waits on server


END TRY BEGIN CATCH END CATCH


go


 


exec xp_cmdshell ‘del c:\wait_stats*.xel’, no_output;


exec xp_cmdshell ‘del c:\wait_stats*.xem’, no_output;


go


 


declare @cmd nvarchar(max);


set @cmd = N’create event session session_waits on server ‘ +


      N’add event sqlos.wait_info (WHERE sqlserver.session_id=’ + convert(nvarchar(10), @@spid) + N’ and duration>0) ‘ +


      N’, add event sqlos.wait_info_external (WHERE sqlserver.session_id=’ + convert(nvarchar(10), @@spid) + N’ and duration>0) ‘ +


      N’add target package0.asynchronous_file_target (SET filename=N”c:\wait_stats.xel”, metadatafile=N”c:\wait_stats.xem”)’;


exec(@cmd);


go


 


BEGIN TRY


      drop table #snapshot


END TRY BEGIN CATCH END CATCH


go


 


select total_elapsed_time, cpu_time into #snapshot from sys.dm_exec_sessions where session_id=@@SPID;


go


 


alter event session session_waits on server state = start;


exec #sp_workload;


alter event session session_waits on server state = stop;


go


 


select


      (es.total_elapsed_time ss.total_elapsed_time) as ‘Response time’,


      (es.cpu_time ss.cpu_time) as ‘Service time’,


      ((es.total_elapsed_time ss.total_elapsed_time) (es.cpu_time ss.cpu_time)) as ‘Wait time’


      from sys.dm_exec_sessions es, #snapshot ss where session_id=@@SPID;


go


 


drop event session session_waits on server


go


 


BEGIN TRY


      drop view dbo.read_xe_file


END TRY BEGIN CATCH END CATCH


go


create view dbo.read_xe_file as


select object_name as event, CONVERT(xml, event_data) as data


from sys.fn_xe_file_target_read_file(‘c:\wait_stats*.xel’, ‘c:\wait_stats*.xem’, null, null)


go


 


BEGIN TRY


      drop view dbo.xe_file_table


END TRY BEGIN CATCH END CATCH


go


create view dbo.xe_file_table as


select


      event


      , data.value(‘(/event/data/text)[1]’,‘nvarchar(50)’) as ‘wait_type’


      , data.value(‘(/event/data/value)[3]’,‘int’) as ‘duration’


      , data.value(‘(/event/data/value)[6]’,‘int’) as ‘signal_duration’


from dbo.read_xe_file


go


 


select


      wait_type


      , sum(duration) as ‘total_duration’


      , sum(signal_duration) as ‘total_signal_duration’


from dbo.xe_file_table


group by wait_type


order by sum(duration) desc


go


 



— CLEANUP



exec xp_cmdshell ‘del c:\wait_stats*.xel’, no_output;


exec xp_cmdshell ‘del c:\wait_stats*.xem’, no_output;


drop view dbo.xe_file_table;


drop view dbo.read_xe_file;


drop proc #sp_workload;


 


References


Sys.dm_os_wait_stats


The following MSDN article shows all wait types with their description: http://msdn.microsoft.com/en-us/library/ms179984.aspx


IOMeter


IOMeter is an IO stress and measurement tool developed by Intel. It is now an Open Source tool, and freely downloadable from:


http://www.iometer.org


SQL Server Wait Events: Taking the Guesswork out of Performance Profiling


A very nice article by Mario Broodbakker on how to troubleshoot slow response times in SQL Server 2000 and SQL Server 2005:


http://www.simple-talk.com/sql/performance/sql-server-wait-events-taking-the-guesswork-out-of-performance-profiling/


Troubleshooting Performance Problems in SQL Server 2005


Here also has a very good presentation on the subject and describes a lot of SQL Server 2005 wait events here:


http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx


Comments (7)

  1. Anonymous says:

    Check out this cool article from the SQLOS team on the technet blogs. Read it here .

  2. Anonymous says:

    Pingback from Indications From Obscenely High Operator Costs In Query Plan? | Click & Find Answer !

  3. Anonymous says:

    Pingback from Indications From Obscenely High Operator Costs In Query Plan? | Click & Find Answer !

  4. Anonymous says:

    Pingback from Indications From Obscenely High Operator Costs In Query Plan? | Click & Find Answer !

  5. Anonymous says:

    Pingback from Indications From Obscenely High Operator Costs In Query Plan? | Click & Find Answer !

  6. Anonymous says:

    Pingback from What Is The Best Practice For Storing Image Varbinary(max) Data In A Database Table? | Click & Find Answer !