Update/Clarification: I Might Start To, Like, Like LIKE..

Article
10/31/2007

.. but it's got to have a COLLATE with it.

A couple of updates are necessary to my recent post on LIKE, PATINDEX, and COLLATE. A bit of this is covered in the comments to the original post, but I'd like to tie it all together and make sure that it's elevated above the mudpit. Thanks as always to Adam Machanic for his insights.

The first point which was obscured in my original post was that the 2.5x performance improvement I observed was not a result of using PATINDEX instead of LIKE. Rather, the performance improvement stems from use of the binary collation. LIKE and PATINDEX offer essentially identical performance, with or without the binary collation. They both just perform better with it.

PATINDEX does offer one benefit over LIKE, which I neglected to call out: it returns the location in the target string where the search string was found, as opposed to the LIKE statement's simple binary return. This feature has been very useful to me in the work I've been doing which fostered the post; this capability is why I favor PATINDEX over LIKE. I should've made that point clearer in the original post.

The second point which needs to be made is that use of the binary collation will make your comparison case-sensitive regardless of the case sensitivity of the default collation on the instance on which the code is running. Therefore, all comparisons using this technique must be coded as case-sensitive.

There are two basic options for doing this; the method you employ is simply a style point. Either..

SELECT data

FROM mytable

WHERE patindex('%value%',lower(column COLLATE Latin1_General_BIN)) > 0

..or..

SELECT data

FROM mytable

WHERE patindex('%VALUE%',upper(column COLLATE Latin1_General_BIN)) > 0

In these queries, we are explicitly running a lower-case and upper-case case-sensitive query, respectively. The results of each query will be identical. Note that if we don't ensure that both arguments of the PATINDEX function are similarly cased (either upper or lower, makes no difference), we'll get false failures in our comparison and data will erroneously fall out of our result set.

You have my apologies for the ambiguities in the original post. With these clarifications, I hope that the power, utility, and limitations of this method are clear.

-wp

Update/Clarification: I Might Start To, Like, Like LIKE..

Additional resources