Semantic Search new for SQL Server Denali adds extra intelligence to the FullText search capability that has existed in SQL Server for several releases, by looking at the relationships between words and in and across each document. For example you can find which documents are similar to a specified document, and this could help with matching candidates’ CVs to job adverts, or you can find the key words and phrases in a given set of documents.
This statistical analysis information is stored in a special database (which is supplied but not installed in SQL Server Denali) and is refreshed as part of the process that update the FullText indexes it depends on.
Once it is setup you have access to three tabular functions:
- SEMANTICKEYPHRASETABLE is used to find the key phrases in a table or the documents which contain a key phrase.
- SEMANTICSIMILARITYDETAILSTABLE find documents that are similar to a document you specify
- SEMANTICSIMILARITYDETAILSTABLE shows the key phrase that two specified documents have in common.
Note that in CTP3 only words not phrases work in these functions and that currently doesn’t understand word stems e.g. that cloudy and clouds are derivations of the term cloud and treats each as separate words. This could be fixed as Denali gets nearer to release but there is no guarantee of this.
There are examples of this in the adventure works database which you can try, but I wanted to see what it would do with my FileTable of PowerPoint presentations I created in my last three blog posts, and have a short screen cast on this ….
and the 4 FileTable screencasts in this series are now linked (hyperlinks at the end of each screencast) so you can hop through the series.
Here are some notes if you want to install semantic search:
- You’ll need to have installed SQL Server Denali ctp3 with the FullText option.
- Having done that you’ll need to install the SemanticLanguageDatabase.msi in the SQL install media (there are x86 and x64 versions).
- This database then needs to be attached to SQL Server
- It is then registered with this statement:
EXEC sp_fulltext_semantic_language_statistics_db @dbname = N'Semanticsdb'
- You’ll need to add FullText indexing on the FileTables (or any other table) the extra semantic_statistics clause:
CREATE FULLTEXT INDEX ON dbo.MyDecks (name LANGUAGE 2057 STATISTICAL_SEMANTICS, file_stream TYPE COLUMN file_type LANGUAGE 2057 STATISTICAL_SEMANTICS) KEY INDEX PK_MyDecks_5A5B77D56E0F5D8F ON DeepFat_Catalog WITH CHANGE_TRACKING AUTO, STOPLIST = SYSTEM;