Tuning Enovia SmarTeam -- Statistics

[Prior Post in Series] [Next Post in Series]

Adding statistics helps SQL Server to determine which tables to scan first when there are joins. To understand statistics better, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005. Statistics are typically added after indexes because if there is an index on the column then there is usually not a need for a statistic.

I have run a sample trace log through Database Engine Tuning Advisor (DTA) and received recommendations which I have saved as XML.

My approach for this demonstration tuning is simple:

  • Identify all column names recommended per table.
  • Put statistics on all tables with these column names.

This produces a good solution given the light-weight trace log and does not require much analysis time.

As cited in the introduction, I use the XML results file to speed analysis. I opened one of my saved XML files and then pasted it below after doing some simple modifications.

  • Delete:
    • <?xml version="1.0" encoding="utf-16"?>
  • Change:
    • <DTAXML xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xmlns="https://schemas.microsoft.com/sqlserver/2004/07/dta">
    • To
    • <DTAXML>

As cited before, this avoids issues with XML namespaces from my TSQL below.

 DECLARE @Xml Xml
Set @Xml='{paste here}'
Select ColName,Count(1) FROM
(SELECT node.value('(.)[1]','sysname') as ColName FROM @Xml.nodes('//Statistics/Column') as Ref(node)) DTA
GROUP BY ColName
ORDER BY COUNT(1) DESC

Which produced results similar to those shown below.

clip_image002

The first column is the column name and the second one is the number of tables it occurs on.

The stored procedure below creates and then executes TSQL to add statistics across all tables that match the patterns identified in the DTA recommendations. These patterns are modified as follows:

  • Only one statistic is added per table
  • The columns in the statistics are added in the order that they appear in the table

This is a quick reasonable solution. If you have the time, a more detailed analysis is best.

This approach reduces the recommendations into just 16 patterns, shown below

clip_image004

The Code Solution

The TSQL below does some transformations on the XML and then creates statistics on all tables matching the pattern. The Statistics name is generated from the table name and the column's ordinal number(aka position in the table) in the statistics so the procedure may be executed as needed without duplicating statistics.

 CREATE PROC dta_Statistics @Xml Xml
AS
SET NOCOUNT ON
DECLARE @CMD nvarchar(max),@ColName nvarchar(max), @Table nvarchar(max),@LastTable nvarchar(max),@StatID nvarchar(max)
DECLARE @ColCnt int,@ColIdx int 

CREATE Table #temp(TName varchar(128),CName varchar(128), ColNo int)
INSERT INTO #temp(TName,CName)
SELECT DISTINCT 
TableName,
node2.value('.','sysname') as ColName FROM (
SELECT 
node.value('(parent::*/parent::*/parent::*/Name)[1]','sysname') as TableName, 
node.query('.') as sNode 
FROM @Xml.nodes('//Create/Statistics') as Ref(node)
) DTA 
CROSS APPLY snode.nodes('(Statistics/Column/Name)') Ref(node2)

UPDATE #Temp
    SET ColNo=Ordinal_Position
    FROM #Temp JOIN INFORMATION_SCHEMA.Columns
    ON TName=Table_Name    
    And Replace(Replace(CName,'[',''),']','')=Column_Name
    

CREATE Table #temp2(InCol varchar(max), ColCnt int, StatId varchar(max))
SET @Cmd=''
SET @StatId=''
SET @LastTable=''
SET @ColCnt=0

DECLARE PK_Cursor CURSOR FOR
Select TName,CName,ColNo FROM #temp ORDER BY TName,ColNo
OPEN PK_Cursor;
FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @LastTable <> @TABLE
    BEGIN
        IF Len(@CMD) > 0
            INSERT INTO #temp2 (InCol,ColCnt,StatID) Values(@CMD,@ColCnt,@StatID)
        SET @LastTable = @TABLE
        SET @CMD=''
        SET @StatId=''
        SET @ColCnt=0
    END
    IF Len(@CMD) > 0
    BEGIN
        SET @CMD=@CMD+','
        SET @StatId=@StatId+'_'
    END
    SET @CMD=@CMD+@ColName
    SET @StatID=@StatID+Cast(@ColIdx as varchar(11))
    SET @ColCnt=@ColCnt+1
    FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx
END
IF Len(@CMD) > 0
  INSERT INTO #temp2 (InCol,ColCnt,StatId) Values(@CMD,@ColCnt,@StatId)    
CLOSE PK_Cursor;
DEALLOCATE PK_Cursor;

CREATE Table #temp3(TName varchar(max), InCol varchar(max), StatId varchar(max) )
SET @CMD=''
DECLARE P2_Cursor CURSOR FOR
Select InCol,ColCnt,StatId FROM #temp2 ORDER BY InCol,ColCnt
OPEN P2_Cursor;
FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId
WHILE @@FETCH_STATUS = 0
BEGIN
    IF Len(@CMD) > 0    
        SET @CMD=@CMD+'
        UNION
        '
    SET @CMD=@CMD+'Select Table_Name,'''+@ColName +''',''STATS_''+Table_Name+''_''+'''+@StatID+'''
    FROM Information_Schema.Columns JOIN Sys.Objects ON NAME=Table_Name AND Objects.Type=''U''
    WHERE COLUMN_Name in ('+REPLACE(REPLACE(@ColName,'[',''''),']','''')+') GROUP BY TABLE_NAME HAVING Count(1)='
    +Cast(@ColCnt as varchar(11))
    
FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId
END
CLOSE P2_Cursor;
DEALLOCATE P2_Cursor;
CREATE TABLE #Temp4 (TName sysname,Cols nvarchar(max), StatId sysname)
INSERT INTO #Temp4 EXEC (@Cmd)

DECLARE P3_Cursor CURSOR FOR
Select TName,Cols,StatId FROM #temp4 
OPEN P3_Cursor;
FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId
WHILE @@FETCH_STATUS = 0
BEGIN
    IF NOT EXISTS(SELECT 1 FROM Sys.Stats where Name=@STATID)
    BEGIN
        SET @CMD='CREATE STATISTICS ['+@STATID+'] ON ['+@Table
                +'] ('+@ColName+')'
        EXEC(@CMD)
        PRINT @CMD
    END
    FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId
END
CLOSE P3_Cursor;
DEALLOCATE P3_Cursor;

I did a PRINT @CMD above so you can see the creation of the statistics code, for example:

 CREATE STATISTICS [STATS_USER_GROUP_8_9_10_11] 
       ON [USER_GROUP] ([USR_OBJECT_ID],[GRP_OBJECT_ID],[CNTX_OBJECT_ID],[CNTX_CLASS_ID])
CREATE STATISTICS [STATS_USERS_1_2_3_10] 
       ON [USERS] ([OBJECT_ID],[CLASS_ID],[LOGIN],[FIRST_NAME])

For this database, a total of 291 statistics were added.

Summary

The above solution is a good solution but is likely not perfect. Some of the statistics are candidates for further optimization. For example, consider this set:

 CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16] 
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_50]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_51]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_54]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_57]
CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_58]

Analysis and testing would reveal if dropping these 6 indexes and replacing them with one of the following would improve performance better:

  • [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10]
  • [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16_50_51_54_57_58]

There is a balance between the time spent improving performance and the performance gain, and common sense should be used.

A second item that I did not do was checking for indexes using the columns specified. The TSQL code above was already pushing the limit for complexity in a blog post.