SQL Azure troubleshooting, some ideas and scenarios.

Article
04/19/2017

While I was working in a customer request about SQL Azure troubleshooting, I noticed sometimes when we compare the flexibility of SQL on premises tools and SQL Azure, we may feel we are kind of limited.

The purpose of this post it is demonstrate something about SQL Azure troubleshooting, at least considering what is available at this moment. I mentioned “something” because the tendency is an increase of available tools for troubleshooting in SQL Azure. So, for now, a few ideas and few scenarios.

So:

1) Case 1: How troubleshooting SQL Azure without SQL Profiler to identify production performance issues over a period.

The answer is Extended Events.

But, there are a few differences between extended event on premises and in the cloud.

First let’s describe the options that you could use to store the data, afterwards you must define a target for your extended event and “watch alive” is not an option available for extended event using SQL Azure. Once you create your extended event you then define a target:

Ring Buffer target - Briefly holds event data in memory.

Event Counter target - Counts all events that occur during an extended events session.

Event File target - Writes complete buffers to an Azure Storage container.

For demo purpose in this post, I will use as a Ring Buffer Target. In terms of Extended Event creation, you could code (T-SQL) or use extended event wizard. We will code!

Ref: /en-us/azure/sql-database/sql-database-xevent-db-diff-from-svr

Demo T-SQL script Extended Event:

[sql]

DROP EVENT SESSION [template_query_alive] ON Database

CREATE EVENT SESSION [template_query_alive] ON Database --use on database instead of on server.

ADD EVENT sqlserver.query_post_execution_showplan(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.rpc_starting(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.sp_statement_completed(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.sp_statement_starting(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.sql_batch_completed(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username)),
ADD EVENT sqlserver.sql_batch_starting(
ACTION(sqlserver.client_app_name,sqlserver.sql_text,sqlserver.username))
ADD TARGET
package0.ring_buffer
(SET
max_memory = 500 -- Units of KB.
);

[/sql]

[sql]

ALTER EVENT SESSION template_query_alive
ON DATABASE
STATE = START;

[/sql]

As you can see in the script, I highlight one difference between on premises and SQL Azure. SQL Azure demands that you create a session using on Database.

Ok. Fine! Extended event successful implemented, but as the “watch alive” option is not really an option. You could be possibly thinking: “How may I monitor the event?” One obvious solution is, query.

Demo T-SQL Script to monitor the extended event:

[sql]
SELECT
CAST(st.target_data AS XML)AS [target_data]
FROM sys.dm_xe_database_session_event_actions AS ac
INNER JOIN sys.dm_xe_database_session_events AS ev
ON ev.event_name = ac.event_name
AND CAST(ev.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))

INNER JOIN sys.dm_xe_database_session_object_columns AS oc
ON CAST(oc.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))

INNER JOIN sys.dm_xe_database_session_targets AS st
ON CAST(st.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))

INNER JOIN sys.dm_xe_database_sessions AS se
ON CAST(ac.event_session_address AS BINARY(8)) = CAST(se.address AS BINARY(8))
WHERE
se.name = 'template_query_alive'
AND ac.action_name = 'sql_text'
[/sql]

Ref: https://www.sqlskills.com/blogs/jonathan/understanding-the-sql_text-action-in-extended-events/

and

/en-us/azure/sql-database/sql-database-xevent-code-ring-buffer

2) Case 2: How may I found the worst queries of my SQL Azure environment?

The answer is DMV or Query store.

You could use sys.dm_exec_query_stats, sys.dm_exec_sql_text to monitor your environment. These DMVs and others are available on premises and in the cloud.

Demo T-SQL Script queries:

[sql]

SELECT TOP 5 query_stats.query_hash AS "Query Hash",
SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
MIN(query_stats.statement_text) AS "Statement Text"
FROM
(

SELECT QS.*,
SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
( (

CASE statement_end_offset
WHEN -1 THEN DATALENGTH(ST.text)
ELSE QS.statement_end_offset
END
- QS.statement_start_offset)/2) + 1
) AS statement_text

FROM sys.dm_exec_query_stats AS QS
CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST
) AS query_stats

GROUP BY query_stats.query_hash
ORDER BY 2 DESC;

[/sql]

Another interesting approach for monitoring SQL Azure is CPU. DTU consumption is CPU, memory and IO. So, if you want to monitor CPU consumption per database, you could use the new DMV, sys.resource_stats.

Demo T-SQL Script CPU:

[sql]

DECLARE @s datetime;
DECLARE @e datetime;

SET @s= DateAdd(d,-7,GetUTCDate());
SET @e= GETUTCDATE();

SELECT database_name
,dtu_limit
, AVG(avg_cpu_percent) AS Average_Compute_Utilization
FROM sys.resource_stats
WHERE start_time BETWEEN @s AND @e
GROUP BY database_name , dtu_limit

[/sql]

You could also be interested in monitoring connection that were established in a specific Azure SQL Database. For this you can use sys.dm_exec_connections. In addition, the sys.dm_exec_sessions view is helpful when retrieving information about all active user connections and internal tasks. The following query retrieves information on the current connection:

Demo T-SQL Script monitoring Connection:

[sql]

SELECT
c.session_id
, c.net_transport
, c.encrypt_option
, c.auth_scheme
, s.host_name
, s.program_name
, s.client_interface_name
, s.login_name
, s.nt_domain
, s.nt_user_name
, s.original_login_name
, c.connect_time
, s.login_time
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
ON c.session_id = s.session_id

[/sql]

Ref: https://azure.microsoft.com/en-us/documentation/articles/sql-database-monitoring-with-dmvs/

For query Store, you could use the catalog views and look for the queries over the last hour, for example:

[sql]

SELECT TOP 10 rs.avg_duration
,rs.avg_cpu_time
,rs.avg_query_max_used_memory
, qt.query_sql_text
, q.query_id
, qt.query_text_id
, p.plan_id
, GETUTCDATE() AS CurrentUTCTime
, rs.last_execution_time
FROM sys.query_store_query_text AS qt JOIN
sys.query_store_query AS q ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id
WHERE rs.last_execution_time > DATEADD(hour, -1, GETUTCDATE())
ORDER BY rs.avg_duration DESC;

[/sql]

3) Case 3: Is it possible to track historical locking and deadlocks issues. If so, how?

The answer is, yes and we have some options for that.

Option 1: telemetry

In Azure SQL Database, we already capture deadlocks without additional action from the customer with sys.fn_xe_telemetry_blob_target_read_file.

Demo T-SQL Script telemetry blob for deadlocks

[sql]
WITH CTE AS (

SELECT CAST(event_data AS XML) AS [target_data_XML]

FROM sys.fn_xe_telemetry_blob_target_read_file('dl', null, null, null)

)

SELECT target_data_XML.value('(/event/@timestamp)[1]', 'DateTime2') AS Timestamp,

target_data_XML.query('/event/data[@name="xml_report"]/value/deadlock') AS deadlock_xml,

target_data_XML.query('/event/data[@name="database_name"]/value').value('(/value)[1]', 'nvarchar(100)') AS db_name

FROM CTE
[/sql]

ref: https://blogs.msdn.microsoft.com/azuresqlemea/2017/01/21/lesson-learned-19-how-to-obtain-the-deadlocks-of-your-azure-sql-database/

Option 2: Extended Event

Demo T-SQL Script extended event for deadlocks

[sql]

CREATE EVENT SESSION [deadlock] ON DATABASE

ADD EVENT sqlserver.database_xml_deadlock_report(

ACTION(sqlserver.client_app_name,sqlserver.database_name))

ADD TARGET package0.ring_buffer

WITH (STARTUP_STATE=ON)

[/sql]

Demo T-SQL Script to monitor deadlock extended event

[sql]

SELECT
xed.value('@timestamp', 'datetime') as Creation_Date,
xed.query('.') AS Extend_Event
FROM
(
SELECT
CAST(st.target_data AS XML) AS [target_data]
FROM sys.dm_xe_database_session_event_actions AS ac
INNER JOIN sys.dm_xe_database_session_events AS ev ON ev.event_name = ac.event_name
AND CAST(ev.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))
INNER JOIN sys.dm_xe_database_session_object_columns AS oc
ON CAST(oc.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))
INNER JOIN sys.dm_xe_database_session_targets AS st
ON CAST(st.event_session_address AS BINARY(8)) = CAST(ac.event_session_address AS BINARY(8))
INNER JOIN sys.dm_xe_database_sessions AS se
ON CAST(ac.event_session_address AS BINARY(8)) = CAST(se.address AS BINARY(8 ))

WHERE
se.name = 'deadlock'
) AS XML_Data

CROSS APPLY Target_Data.nodes('RingBufferTarget/event[@name="database_xml_deadlock_report"]') AS XEventData(xed)
ORDER BY Creation_Date DESC

[/sql]

Ref: https://msdn.microsoft.com/en-us/library/dn270018.aspx?f=255&MSPPError=-2147217396

4) Case 4: how can I monitor blocking scenarios?

The answer is sys . dm_tran_locks and sys.dm_os_waiting_tasks

Demo T-SQL Script to monitor blocking scenarios

[sql]

SELECT

t1.resource_type,

t1.resource_database_id,

t1.resource_associated_entity_id,

t1.request_mode,

t1.request_session_id,

t2.blocking_session_id

FROM sys.dm_tran_locks as t1

INNER JOIN sys.dm_os_waiting_tasks as t2

ON t1.lock_owner_address = t2.resource_address;

[/sql]

And that is it, some ideas how to troubleshoot in SQL Azure.

Liliam Leme

UK Data Platform PFE

SQL Azure troubleshooting, some ideas and scenarios.

Additional resources