FileStream Data and Deletion

Deletion on computers is often a bit of a sham, as I found out when I got in at the ground floor of forensic computing a long long time ago.  Back then file systems were very lazy so when you deleted something all that happened was the file system marked the blocks occupied by that file as being available, in its catalogue.The process didn’t actually change the blocks themselves, that was left until the the blocks were next reused by another file.  This meant I could snoop around and recover them and if necessary produce them in evidence.

I mention this because the new SQL Server FileStream (sort of) data type works in much the same kind of way. 

Note: For those that don’t know FileStream allows for large files to be stored in a special filegroups so that for backup and transactions they behave as part of the database. However they actually still exist as individual files (pictures , videos CAD drawings or whatever) on the file system albeit in SQL Server controlled paths and with generated names as part of that special filegroup. This means they can also be streamed in and out of SQL Server nearly as quickly as though they were ordinary files.

If you delete a row with a FileStream column the row itself gets deleted, but the FileStream file seems to stay behind even after the transaction is committed.  That can’t be good otherwise you’d soon fill up your hard disk and you’d have all those orphaned files lying around which might be a security risk.

The answer is garbage collection, which unlike in SharePoint is an automatic background process that follows on behind checkpoints,  for more information check Books Online (written by one Paul Randall!).

The other interesting thing I picked up here is that virus checkers could check these raw files (because they will still have their signatures in them but just have odd names and paths).  That’s a good thing for security, but the virus checker will have permissions to delete infected files than it is going to start punching holes in your referential integrity so it might be best to mark them as quarantined and then look at the anti-virus log and DBCC CheckDB can be used to work out what’s missing so you can clear up the mess.

While I am on the subject FileStream data cannot be compressed with SQL Server compression and Transparent Data Encryption doesn’t apply to this data, rather you are encouraged to use the Encrypted Files System (EFS).

Technorati Tags: EFS,SQL Server 2008,filestream,garbage collection,Paul Randall