Today we have a guest post from one of our excellent Premier Field Engineers from the United States – Susan Van Eyck.
In this post Susan explains the pitfalls of comparing different data types in a where clause and how it can hurt performance.
This post is about implicit conversions, or more accurately, it’s about which ones are hurting your query performance.
For SQL Server to compare two values (e.g., a column and literal value as in WHERE b.Status = N'A') both values need to be of the same data type. If they differ, the one having the lower data type precedence will be (implicitly) converted to match that of the other value. Practically speaking, the rules of data type precedence are about not losing information.
Imagine a comparison between an Integer and Decimal value . If the Decimal was converted to an Integer, data could be lost (123.45 >> 123). The same holds true when comparing an NVARCHAR (Unicode data types can store 64,000 different characters) and VARCHAR (only 256 distinct characters) strings as in the code snippet above. There are many NVARCHAR values that simply can’t be represented as VARCHAR values, so the implicit conversion must go from VARCHAR to NVARCHAR.
Where we get into trouble is when the leading key value in one of our carefully crafted indexes gets implicitly converted to another data type. Again referring to the code snippet above, let’s say that b.Status is a CHAR(1) column and that we’ve created an index to support the query (CREATE INDEX ix_Status ON dbo.SomeTable (Status);). The value we're searching for is an NVARCHAR (as indicated by its "N" prefix) which has a higher data precedence than CHAR so the values in Status must be converted to NVARCHAR before SQL Server can compare them with N'A'. The net result of this is that we can no longer use our lovely index.
Let's compare the execution plans for this pair of queries where [Status] is a CHAR(1) and is indexed:
SELECT COUNT(*) FROM dbo.SomeTable WHERE [Status] = 'A';
SELECT COUNT(*) FROM dbo.SomeTable WHERE [Status] = N'A';
The plan for the first query uses the index (Index Seek) and is about 5X less costly to execute than the second query. This is because the second query requires that [Status] be converted to an NVARCHAR which means we can't use our index so have to scan the table instead. The good news is that SQL Server tells us something is wrong. Note the warning symbol on the SELECT operator in the second plan. Looking at the operator's properties you'll see a warning that there was an implicit conversion which may impact our ability to perform an Index Seek
But – not all implicit conversions interfere with Index Seeks, and I wanted to know which ones were problematic and which ones weren’t so I went through an exercise to find out, the final result set of which looks something like this:
The row headers represent our indexed columns' data types and the column headers represent the various data types we might try comparing them against. The outcomes are color coded – Green (and NULL) where SQL can perform and index seek, Yellow where we fall back to Scans (Nonclustered Index Scans if we’re lucky, Clustered Index/Table Scans if we're not) and Red for illegal comparisons where an implicit conversion can’t be made.
Note that some “legal” conversions may also fail. For example, we can convert “10” to an integer, but not “XYZ”!