Deconstructing Extended Event XML

In a recent engagement, I had need of an Extended Event to gather range scans on my target instance. The overall premise was to determine which queries were driving the large number of range_scans being reported by a recent health check. The target instance was a SQL 2008 R2 SP3 build.

So… lets create an extended event, and include the scan_started event, and as an extra ACTION, the tsql_text and the tsql_stack. Why both ACTIONs? I discovered that on this particular instance the tsql_text ACTION wasn’t always populated, so I grabbed the tsql_stack as well too - just to be sure. The only issue here is that the tsql_stack is quite deeply buried within the resulting XML, so a bit of digging is needed to make it available – hence this BLOG post. Now... a word of caution about the tsql_stack - this is a pretty heavyweight ACTION to include within the XEvent, so use with caution :-)

Ok, so, here is the event I defined:-

[sql]
CREATE EVENT SESSION scan_started ON SERVER
ADD EVENT sqlserver.scan_started(
ACTION(sqlserver.tsql_stack, sqlserver.sql_text)
WHERE (database_id = 8)) -- Insert your database_id here.
ADD TARGET package0.ring_buffer
WITH (
MAX_MEMORY=4096 KB,
EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=1 SECONDS,
MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,
STARTUP_STATE=OFF)
GO
[/sql]

 

Then we start it:

[sql]
ALTER EVENT SESSION scan_started ON SERVER
STATE = START;
GO
[/sql]

 

Now, all I need is a workload to run which is generating range_scans and I should be able to see them…. And indeed I can. The resulting XML for my workload looks like this :-

 

image

 

Now, in order to determine which queries were most responsible for the range_scans I needed to group the resulting tsql statements by execution count. To achieve this, I used the following t-sql to break apart the XML:

[sql]
;WITH XEventCTE (timestamp, DatabaseName, ObjectName, index_id, hobt_id, sql_text,
level, handle, line, offset_start, offset_end)
AS
(
SELECT
timestamp,
DB_NAME(database_id) AS [DatabaseName],
   OBJECT_NAME(object_id, database_id) AS [ObjectName],
   index_id,
   hobt_id,
   sql_text,
   stackframe.value('(@level)[1]', 'int') AS [level],
   stackframe.value('xs:hexBinary(substring((@handle)[1], 3))', 'varbinary(max)') AS [handle],
   stackframe.value('(@line)[1]', 'int') AS [line],
   stackframe.value('(@offsetStart)[1]', 'int') AS [offset_start],
   stackframe.value('(@offsetEnd)[1]', 'int') AS [offset_end]
FROM
(SELECT
   XEvent.value('(@timestamp)[1]', 'datetime') AS timestamp,
   XEvent.value('(data[@name="database_id"]/value)[1]', 'int') as database_id,
   XEvent.value('(data[@name="object_id"]/value)[1]', 'bigint') as object_id,
   XEvent.value('(data[@name="index_id"]/value)[1]', 'int') AS index_id,
   XEvent.value('(data[@name="hobt_id"]/value)[1]', 'bigint') as hobt_id,
   XEvent.query('(action[@name="tsql_stack"]/value/frames/frame)[1]') AS [tsql_stack],
   XEvent.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
FROM
   (
      SELECT CAST(target_data AS XML) AS [target_data]
      FROM sys.dm_xe_session_targets st
      INNER JOIN sys.dm_xe_sessions s
      ON s.address = st.event_session_address
      WHERE name = N'scan_started'
      AND target_name = N'ring_buffer') as XEventsRingBuffer
   CROSS APPLY target_Data.nodes('/RingBufferTarget/event') AS XEventData (XEvent)
    ) AS XEventResults (timestamp, database_id, object_id, index_id, hobt_id, tsql_stack, sql_text)
CROSS APPLY tsql_stack.nodes ('(frame)') AS stack(stackframe)
)
SELECT
   XE.DatabaseName,
   XE.sql_text AS [SQL Text],
   dest.text AS [Frame Text],
   COUNT(*) AS [Execution Count]
FROM
   XEventCTE AS XE
CROSS APPLY sys.dm_exec_sql_text(XE.handle) dest
GROUP BY
   XE.DatabaseName,
   XE.sql_text,
   dest.text
ORDER BY
   COUNT(*) DESC
[/sql]

 

Which in all honesty is a bit of a monster!! So, the idea of this blog post is to show how to create this starting from scratch, as once the basics are grasped, other extended events can follow a very similar construction.

So, let’s begin with the first statement:-

[sql]
SELECT CAST(target_data AS XML) AS [target_data]
FROM sys.dm_xe_session_targets st
INNER JOIN sys.dm_xe_sessions s
ON s.address = st.event_session_address
WHERE name = N'scan_started'
AND target_name = N'ring_buffer'
[/sql]

 

If you look back at the original Extended Event statement, I use a target for capturing the data of:

[sql]
ADD TARGET package0.ring_buffer
[/sql]

 

which is created with the name of the XEvent:-

[sql]
CREATE EVENT SESSION scan_started ON SERVER
[/sql]

 

So my XML data for the range_scan events is held in a ring_buffer with a name of ‘scan_started’…

And we cast the target_data within the ring_buffer as an XML data type. This makes the data available, and within SSMS, I can even click on the resulting XML to see the results above. Ok, so that’s great, but I find it much easier to digest if the information is returned in a tabular format, so I want to shred the XML down into table form. Now, this is where the fun (confusion?) begins. How to I extract the values I would like to see from the XML representation…

To get at the actual data values within the XML, I need to identify which XML element I am interested in, and then specify the value to be returned and what data type it is. So, as an example, lets assume I want to pull the object_id from the following XML construct:-

image

 

We can see that the XML root element is called ‘RingBufferTarget’, and the first element within it is called ‘event’. Since EVERY element and value falls below this structure, it is often useful to move the entry point of examining the XML to the ‘/RingBufferTarget/Event’ position and start to access the ‘data’ nodes from there. As such, the XML can start to be deconstructed:

[sql]

FROM
(
SELECT CAST(target_data AS XML) AS [target_data]
FROM sys.dm_xe_session_targets st
INNER JOIN sys.dm_xe_sessions s
ON s.address = st.event_session_address
WHERE name = N'scan_started'
AND target_name = N'ring_buffer'
) as XEventsRingBuffer
CROSS APPLY target_Data.nodes('/RingBufferTarget/event') AS XEventData (XEvent)
[/sql]

 

Here the CROSS APPLY is used to extract a new XML fragment from the original, containing only the data nodes. This new XML fragment is contained within the ‘XEvent’ variable. Now, to get at the actual values within the XEvent variable, I need to specify the following:-

[sql]
XEvent.value('(data[@name="object_id"]/value)[1]', 'bigint') as object_id,
[/sql]

 

This looks within the XEvent XML fragment for a value (XEvent.value) which is contained within the data nodes, having a @name of “object_id” and returned the value from the underlying “value” field. This value is expected to have a datatype of ‘bigint’ and is pulled out from the XML and returned back in table format with a column name alias of “object_id”. The exact same procedure is used for all of the other values specified. This then builds up to the following:-

[sql]
SELECT
XEvent.value('(@timestamp)[1]', 'datetime') AS timestamp,
XEvent.value('(data[@name="database_id"]/value)[1]', 'int') as database_id,
XEvent.value('(data[@name="object_id"]/value)[1]', 'bigint') as object_id,
XEvent.value('(data[@name="index_id"]/value)[1]', 'int') AS index_id,
XEvent.value('(data[@name="hobt_id"]/value)[1]', 'bigint') as hobt_id,
XEvent.query('(action[@name="tsql_stack"]/value/frames/frame)[1]') AS [tsql_stack],
XEvent.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
FROM
(
   SELECT CAST(target_data AS XML) AS [target_data]
   FROM sys.dm_xe_session_targets st
   INNER JOIN sys.dm_xe_sessions s
   ON s.address = st.event_session_address
   WHERE name = N'scan_started'
   AND target_name = N'ring_buffer'
) as XEventsRingBuffer
CROSS APPLY target_Data.nodes('/RingBufferTarget/event') AS XEventData (XEvent))
[/sql]

 

The tsql_stack is the only column which is handled slightly differently, and this contains a XEvent.query rather than an XEvent.value. In this situation, we are returning a new XML fragment (using the XEvent.query) such that the resulting data contains 5 values, then the XML fragment for the tsql_stack, and finally the sql_text column. Now we can expose these results as a new table using this :-

[sql]
AS XEventResults (timestamp, database_id, object_id, index_id, hobt_id, tsql_stack, sql_text)
[/sql]

 

Now, we can use the same process on the tsql_stack XML fragment:-

[sql]
SELECT
   timestamp,
   DB_NAME(database_id) AS [DatabaseName],
   OBJECT_NAME(object_id, database_id) AS [ObjectName],
   index_id,
   hobt_id,
   sql_text,
   stackframe.value('(@level)[1]', 'int') AS [level],
   stackframe.value('xs:hexBinary(substring((@handle)[1], 3))', 'varbinary(max)') AS [handle],
   stackframe.value('(@line)[1]', 'int') AS [line],
   stackframe.value('(@offsetStart)[1]', 'int') AS [offset_start],
   stackframe.value('(@offsetEnd)[1]', 'int') AS [offset_end]
FROM
(
    SELECT
      XEvent.value('(@timestamp)[1]', 'datetime') AS timestamp,
      XEvent.value('(data[@name="database_id"]/value)[1]', 'int') as database_id,
      XEvent.value('(data[@name="object_id"]/value)[1]', 'bigint') as object_id,
      XEvent.value('(data[@name="index_id"]/value)[1]', 'int') AS index_id,
      XEvent.value('(data[@name="hobt_id"]/value)[1]', 'bigint') as hobt_id,
      XEvent.query('(action[@name="tsql_stack"]/value/frames/frame)[1]') AS [tsql_stack],
      XEvent.value('(action[@name="sql_text"]/value)[1]', 'varchar(max)') AS [sql_text]
   FROM
   (
       SELECT CAST(target_data AS XML) AS [target_data]
      FROM sys.dm_xe_session_targets st
      INNER JOIN sys.dm_xe_sessions s
      ON s.address = st.event_session_address
      WHERE name = N'scan_started'
      AND target_name = N'ring_buffer'
   ) as XEventsRingBuffer
   CROSS APPLY target_Data.nodes('/RingBufferTarget/event') AS XEventData (XEvent)
) AS XEventResults (timestamp, database_id, object_id, index_id, hobt_id, tsql_stack, sql_text)
CROSS APPLY tsql_stack.nodes ('(frame)') AS stack(stackframe)
[/sql]

 

Here we CROSS APPLY on the tsql_stack.nodes, starting at the ‘frame’ position, and return that new XML fragment as a variable called “stackframe”. Then I can use the stackframe.value construct to pull out all of the values (as before) and return them as a new CTE called:-

[sql]
XEventCTE (timestamp, DatabaseName, ObjectName, index_id, hobt_id, sql_text, level, handle, line, offset_start, offset_end)
[/sql]

 

I could just as easily used another derived table instead of a CTE – I’ll leave that choice completely up to the reader and your personal preference.

At this point, my target_data is now fully shredded and available in tabular form. I can now use regular t-sql to group and order the results.

[sql]
SELECT
   XE.DatabaseName,
   XE.sql_text AS [SQL Text],
   dest.text AS [Frame Text],
   COUNT(*) AS [Execution Count]
FROM
   XEventCTE AS XE
CROSS APPLY sys.dm_exec_sql_text(XE.handle) dest
GROUP BY
   XE.DatabaseName,
   XE.sql_text,
   dest.text
ORDER BY
   COUNT(*) DESC
[/sql]

So as a final step, I pull the database, sql_text, t-sql handle from the frame and execution count from the CTE (my shredded XML), CROSS APPLY'd them with the dm_exec_sql_text DMF, and finally group them up. This should then show me the t-sql of the statements which are doing range scans, grouped by how frequently they are present on the system (this is also sometimes referred to as “death by a thousand cuts”). There are of course queries which execute only a small number of times, but perform huge range_scans which may have more impact on the system. However, that is not the purpose of this blog post – more a walk-through of how to construct (or rather de-construct) XML results being returned from an Extended Event.