Beyond Relational– FullText indexing with SQL Server FileTable

In my part 3 of my series on FileTable I wanted to search the documents I had saved in parts 1 & 2. This is pretty straightforward, as Full text indexing has been in several versions of SQL Server and is SQL Server 2008 R2 the FullText indexes are part of the databases they refer to. 

What caught me out was that I sort of assumed that SQL Server Denali would be aware of the Office 2010 document types and it isn’t without doing some extra steps.  To explain:  If you store documents in SQL Server be that in the varbinary(max) column of a table or using the new FileStream or really new FileTable, SQL Server needs to understand what the document is so it can open it to index it.  This is done using ifilters and this works by having another column in the table that contains your varbinary(max)  data which refers to the relevant ifilter for each document, and note: you also need to have a single column, non computed, unique, clustered index on the table in order to create a FullText index.

There is a standard management view to see what ifilters are in operation:

 select * from sys.fulltext_document_types

You can add more ifilters to the database server from Microsoft and other vendors e.g. from Firefox or Adobe for pdf.  I wanted the Office 2010 ifilters as I am using PowerPoint 2010 in my demos, but when I tried to install them they were already there, however they weren’t referenced by SQL Server.  So after installing an ifilter you need to go into SQL Server and run these commands:

 EXEC sp_fulltext_service @action='load_os_resources', @value=1;
EXEC sp_fulltext_service 'restart_all_fdhosts';

to reference the new ifilters and restart the FulllText service. you’ll also then need to restart SQL Server itself.

Anyway back to my FileTable.  FileTables have a fixed schema but do meet the requirements for FullText indexing:

  • file_stream is the varbinary(max) column that references the document itself.
  • file_type contains the file extension of the file which will correspond to the ifilter.
  • there is a unique index that meets the FullText criteria for a unique clustered  index.

So you can right click on a FileTable and run the FullText wizard as I have done in this short screen cast…

..or you can use T-SQL:

 CREATE FULLTEXT CATALOG DeepFat_CATALOG AS DEFAULT;

CREATE FULLTEXT INDEX ON dbo.MyDecks
    (name LANGUAGE 2057,
    file_stream TYPE COLUMN file_type LANGUAGE 2057)
    KEY INDEX PK_MyDecks_5A5B77D56E0F5D8F
    ON DeepFat_Catalog
    WITH
        CHANGE_TRACKING AUTO,
        STOPLIST = SYSTEM;

Note: the language code 2057 is British English,  and  the index name will be different as it’s a random guid created by SQL Server for each FileTable

The FullText index will be populated automatically unless you specify otherwise, and I didn’t so I can use it immediately like this to test that it worked:

 SELECT name
FROM dbo.MyDecks
WHERE
FREETEXT (file_stream, 'nist')
and creation_time > '2011/01/01';

and this returns the decks I created this year that have the term nist in (the National Institute for Science & Technology) . 

Next time I am going to hook up the new Semantic Search, capability in SQL Server Denali to my FileTable. In the meantime if you want to try this yourself then you’ll need SQL Server Denali ctp 3