Database Programming: The Oldest Performance Trick In The Book

Another conversation among colleagues leads to another blog post. May the circle remain unbroken, indeed.

Consider the following schema:

CREATE TABLE dbo.Info (
    [Id] char(20),
    [Key] varchar(10),
    [Value] nvarchar(20)
)

CREATE UNIQUE CLUSTERED INDEX InfoIndex1 ON dbo.Info([Id] ASC, [Key] ASC)

A performance problem was noted with this query:

SELECT [Value] FROM [dbo].[Info] WHERE [Id] = N'xxxxxxxx' AND [Key] = N'xxxxxxxx'

For a table with about 100,000 rows, where each unique Id has 1000 keys, we see about 1200 reads and it takes about 50 ms. The query plan shows an Index Scan. Statistics correctly predict the number of returned rows.
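If you want to reproduce the numbers, here is one way to populate a table like that; the Id and Key value formats are invented for illustration, since only the cardinalities are given (100 Ids x 1000 Keys = 100,000 rows):

;WITH Ids AS (
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
),
Keys AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.Info ([Id], [Key], [Value])
SELECT 'Id' + RIGHT('000000' + CAST(i.n AS varchar(6)), 6),  -- e.g. 'Id000001'
       'K' + CAST(k.n AS varchar(9)),                        -- e.g. 'K1'..'K1000'
       N'some value'
FROM Ids i CROSS JOIN Keys k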

The issue here is what's known as an "implicit conversion." The Id and Key columns are defined as single-byte character columns, but the character literals in the SELECT statement are UNICODE (double-byte) literals, courtesy of the N prefix. Because nvarchar sits above varchar in SQL Server's data type precedence, the conversion falls on the column side of each comparison, and a predicate with a converted column is not sargable: SQL Server cannot seek into the index for an exact match. It therefore scans the smallest available structure that covers the query (here the clustered index, which is the table itself), implicitly converting each single-byte value to its double-byte UNICODE equivalent before comparing.
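To make that concrete, the predicate the scan effectively evaluates on every row looks something like this (illustrative only; in the actual plan you'll see a CONVERT_IMPLICIT on the column rather than this literal rewrite):

SELECT [Value]
FROM [dbo].[Info]
WHERE CONVERT(nvarchar(20), [Id]) = N'xxxxxxxx'   -- conversion wraps the column,
  AND CONVERT(nvarchar(10), [Key]) = N'xxxxxxxx'  -- so the index order is useless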

If the datatypes of the column and the constant are the same -- if the column is converted to UNICODE, or if the character literals in the SELECT statement are converted to single-byte strings by removing the N designator -- then we'll get an INDEX SEEK in the query plan, with a concomitant increase in performance.
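Concretely, either change gets us the seek. Dropping the N designator is a one-character fix per literal:

SELECT [Value] FROM [dbo].[Info] WHERE [Id] = 'xxxxxxxx' AND [Key] = 'xxxxxxxx'

Widening the columns to UNICODE also works, though since [Id] and [Key] participate in the clustered index, the index has to be dropped and rebuilt around the change (a sketch; test it before running against real data):

DROP INDEX InfoIndex1 ON dbo.Info
ALTER TABLE dbo.Info ALTER COLUMN [Id] nchar(20)
ALTER TABLE dbo.Info ALTER COLUMN [Key] nvarchar(10)
CREATE UNIQUE CLUSTERED INDEX InfoIndex1 ON dbo.Info([Id] ASC, [Key] ASC)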

So, while implicit conversion looks like a convenience on the surface, it's actually a bane of performance. The good news is that this sort of issue is among the easiest to remedy in your code: just make sure that the datatypes on both sides of every search argument are precisely the same.
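The same rule applies to parameters, not just literals. Here's a sketch using sp_executesql (the parameter names and values are mine); declaring the parameters with exactly the column types keeps the predicate seekable:

DECLARE @Id char(20) = 'xxxxxxxx',
        @Key varchar(10) = 'xxxxxxxx'

-- Parameter types match the column types exactly, so no conversion is
-- applied to the columns and the optimizer can seek on InfoIndex1.
EXEC sp_executesql
    N'SELECT [Value] FROM dbo.Info WHERE [Id] = @Id AND [Key] = @Key',
    N'@Id char(20), @Key varchar(10)',
    @Id = @Id, @Key = @Key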

-wp