SQL Server Advent Calendar 16 – Change Data Capture

Day 16 of my virtual advent calendar, about stuff I like in SQL Server 2008..

One of the main hurdles in refreshing a data warehouse is to work out what’s changed since the last refresh.  If you’re lucky the source will have a last modified date and if you’re really lucky this is actually populated and working.  There is a capability in SQL Server 2008 to help as well, Changed Data Capture (CDC), but be aware this is an enterprise edition only feature.

What it does is to create a separate set of tables in the same database (but different schema) as the source.  It needs to be turned on first like this..

USE AdventureWorks
EXEC sys.sp_cdc_enable_db

BTW there is a new column in the sys.databases dmv called is_cdc_enabled so you can check if CDC is enabled with…

SELECT name, is_cdc_enabled FROM sys.databases

Now you can enable a table for CDC..

EXEC sys.sp_cdc_enable_table
@Source_Schema = N'HumanResources',
@Source_Name = N'Employee',
@Captured_Column_List =
N'EmployeeID,NationalIDNumber, ManagerID, Title, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag, VacationHours, SickLeaveHours, CurrentFlag',
@Supports_Net_Changes = 1,
@Role_Name = 'CDC_Admin'

Hopefully all but the last 2 arguments are pretty obvious.  Supports Net changes is a bit flag which either shows you only the overall change between two periods as opposed all of the changes .  The role name is a database role which runs the code and a new one is created if you specify a new name.  Also you can specify other parameters here like @filegroup.

Note you don’t have to include the captured column list (it will track all columns by default, but if you do you must include the primary key or another argument @index_name to refer to a unique index on the table if @supports_net_changes is set to 1.

You can now check this has been enabled..

EXECUTE sys.sp_cdc_help_change_data_capture
@source_schema = N'HumanResources',
@source_name = N'Employee';

Behind the scenes a new system table has been created to track changes, [cdc].[HumanResources_Employee_CT] which has got all the columns we specified we wanted to track plus 5 extra ones ..

[__$start_lsn]
,[__$end_lsn]
,[__$seqval]
,[__$operation]
,[__$update_mask]

so lets make a simple change and see what happens in this table

UPDATE HumanResources.Employee
SET Title = 'DBA' where Title = 'Database Administrator'

and then query the change table like this..

SELECT TOP 1000 [__$start_lsn]
,[__$end_lsn]
,[__$seqval]
,[__$operation]
,[__$update_mask]
,[EmployeeID]
,[Title]

  FROM [AdventureWorks].[cdc].[HumanResources_Employee_CT]

to see what was captured.

image

The LSN columns show the log sequence number of the commit for the change and the -$seqval orders the sequence of changes that can occur in the same transaction, so a possible lower level of detail. _$operation shows what happened to the row, 1 = delete, 2 = insert, 3 = update (before change), and 4 = update (after change).

To help you use this table there are functions to get the min and max LSN’s in the table.

You should also notice that you have got 2 new SQL Agent jobs to capture and cleanup the tables.  Rather than modify these jobs directly there are also specific functions to manage them.

I have skimmed over this to give you an idea of the various parts of CDC, so you should check books on line for a fuller explanation here, and there is also a CDC TechNet virtual lab here, which you can try, before you have a go in your own environment.

Technorati Tags: sql server 2008,change data capture,cdc