SQL Server Advent Calendar 23 - Filestream

Day 23 of my virtual advent calendar, about stuff I like in SQL Server 2008..

Filestream isn’t one of the new data types in SQL Server 2008 it a variant of varbinary(MAX), and it is Microsoft’s solution to the problem of storing large unstructured data as part of a database. 

In SQL Server 2005/8  varbinary(max) can store a file of up to 2Gb (a limitation inherited by SharePoint)  and as this limit is approached the performance of streaming this data back to a client becomes worse and worse.  On the other hand referential integrity is easy to maintain and backups disaster recovery is straightforward as all that’s needed is to restore the database.

The other common approach to this problem is to have a table with pointers to the file locations of the documents on disk.  This is quite hard to write and implement and referential integrity and backup are also difficult as it’s important to have the right file at the end of each pointer and not to loose any on the way. However it does have the advantage of performance over internal storage of the data in the database. 

Filestream is the best of both worlds.  Documents are stored as files on disk, but they are in a special SQL Server filegroup.  When a request for a file is made SQL resolves this into a filehandle that can be use by ordinary win32api’s so performance is as good as it can be.  However because all of this is in a filegroup backup is also straightforward. This mechanism also respects transactions so if you rollback you get back to the previous version of the file. A word of warning here is that this can involve some serious disk usage if the files are large as internally SQL Server keeps the older version of the file =until the transaction is committed.

Using Filestream is a developer task , so any example I put here will be littered with very dodgy C#. Rather than do that I will instead point you to this excellent  example developed by one of my French (it is in English) colleagues Patrick Guimonet, and it also shows Powershell integration in action!

Finally Filestream is included in all editions of SQL Server while not being part of size limitations in these editions.

 

Technorati Tags: SQL Server 2008,Filestream,varbinary,unstructured data