Here are some notes on “SQL Server 2008 Database Compression” I took while attending an advanced class on SQL Server taught by Greg Low (from http://sqlblog.com/blogs/greg_low/ and http://www.sqldownunder.com/).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
- Databases are typically fairly compressible
- Things to consider: storage Cost, workload performance, backups
- Migration – when source database was already compressed
- NTFS Compression – data is not compressed in memory
SQL Server 2008 compression
- Data stored more efficiently on the page
- Can improve performance if workload is I/O bound
- Can decrease performance if workload is CPU bound
- You will end up with a lot more rows in memory, because more will fit on a page
- It’s counter-intuitive: Typically things get faster, not slower (boxes are I/O bound, not CPU bound)
- Careful – Index rebuild times will go up
PAGE and ROW
- Enterprise Edition, enable for ROW or PAGE (which includes ROW)
- ROW: Variable length encoding: numeric, char, NULLs, zeros
- Discussion: When does it make sense to use VARCHAR(2)?
- Discussion: How about NVARCHAR vs. VARCHAR?
- PAGE: Column prefix, page dictionary (in addition to what ROW does)
When does it happen
- When you enable compression, pages are not immediately compressed. REBUILD will force.
- After enabled, row compression will happened when rows ar inserted or updated.
- Page compression done before a page split (since it would be expensive). FILLFACTOR is important.
- If you enable for PAGE and don't rebuild, over time, you will end up with a mix of NONE/ROW/PAGE in the actual database
- Non-leaf pages are not compressed.
- Careful – Page compression on frequently updated tables
- BLOB: not row-compressed, can be page-compressed if in-row
- Backup: Pages go to backup as they are. Additional (?!), more aggressive (?!) compression during backup
- See http://msdn.microsoft.com/en-us/library/cc280449.aspx
Compression - How to
- Each table/index/partition can have a different compression setting
- Option in CREATE/ALTER TABLE… WITH (DATA_COMPRESSION=PAGE)
- See http://msdn.microsoft.com/en-us/library/ms190273.aspx
- Demo – Estimating compression
- Space savings depend on data distribution
- Creates a sample on tempdb to estimate
- See http://technet.microsoft.com/en-us/library/cc280574.aspx
- Look at page_compression_success_count in sys.dm_db_index_operational_stats
- See http://msdn.microsoft.com/en-us/library/ms174281.aspx
- Look at data_compression in sys.partitions
- See http://msdn.microsoft.com/en-us/library/ms175012.aspx
- Look at compressed_page_count in sys.dm_db_index_physical_stats
- Performance Counters: SQL Server: Access Methods: %compress%
Additional notes from a previous blog post:
- This feature is targeted at Data Warehouses
- It is only available for SQL Server 2008 Enterprise and Developer editions
- You cannot use it if the max row size exceeds 8060 bytes
- You cannot combine compression and sparse columns on the same table
- Storage Engine blog - http://blogs.msdn.com/sqlserverstorageengine/archive/tags/Data+Compression/default.aspx