Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Using Full Text Searching SQL Server 2008

I have now got SQL Server 2008 RC0 running in my Hyper-V 64-bit virtual machine, complete with the sample adventure works database from connect. Now it’s time to have a look at querying full text indexes. One of the common ways of doing this is to use the contains function as part of a where clause and combine this with a filter on the structured e.g

SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description, ‘Performance’)
    AND (P.ProductNumber LIKE N’FR%’)

One thing to note about this query is that it will be much faster in SQL Server 2008 as it will yield a single execution plan:

image

with a table valued function for the full text match. BTW the issues I was having getting Full text to work in CTP6 have gone away and all of this just worked after install.

The contains function can support all sorts of searching operators for proximity matching, applying different weight to different search terms as well as the ability to search from a list of columns in the same table, provided they are in the same language i.e. they have been indexed with the same language.  Check Books On line for more information here

The same syntax is also good if the column you’re searching is actually a document in Varbinary(MAX), and the new Filestream data type.  So if I index the documents table (as I did on Thursday ) then I can run a query like this…

SELECT    
FileName,
[Document]
FROM        
    Production.[Document]
WHERE
    CONTAINS(Production.[Document].[Document], ‘red’)

and get back this

image

of course you will want to write some code to stream that document back to the user in a more readable form, but the point is that the search has got inside the document and realised it contains the word ‘red’.

If you want to search across multiple tables for a term then your query will look more  like this (BTW I setup the product table for full text searching before running this):

DECLARE @SearchTerm Nvarchar(50) = N’road’
SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description , @SearchTerm )
OR
    CONTAINS((P.Name, P.ProductLine) , @SearchTerm )

I have highlighted the multiple column clause here in orange , as  I included both of these columns when I setup full text indexing ion the Product table.

The query plan for this statement looks like this:

image

This is not such a good performance story and books on line recommends here that you don’t have multiple contains statements because of this.  In this example we are stuffed as we are full text searching across multiple tables.  If you need to do alot of full text searching on a large index , then design your database with full text searching in mind i.e. try and keep the number of tables that you will want to combine in a text search to a minimum. If you think about this it’s no different from using T-SQL to search lots of columns for the same thing, and so this limitation is not as much of a constraint as it initially appears.

So contains is no substitute for good database design it’s just another tool, to quote Abraham Maslow…

“If the only tool you have is a hammer, then you tend to see every problem as a nail”.