Day 16 of my virtual advent calendar, about stuff I like in SQL Server 2008..
One of the main hurdles in refreshing a data warehouse is to work out what’s changed since the last refresh. If you’re lucky the source will have a last modified date and if you’re really lucky this is actually populated and working. There is a capability in SQL Server 2008 to help as well, Changed Data Capture (CDC), but be aware this is an enterprise edition only feature.
What it does is to create a separate set of tables in the same database (but different schema) as the source. It needs to be turned on first like this..
BTW there is a new column in the sys.databases dmv called is_cdc_enabled so you can check if CDC is enabled with…
SELECT name, is_cdc_enabled FROM sys.databases
Now you can enable a table for CDC..
@Source_Schema = N’HumanResources’,
@Source_Name = N’Employee’,
N’EmployeeID,NationalIDNumber, ManagerID, Title, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag, VacationHours, SickLeaveHours, CurrentFlag’,
@Supports_Net_Changes = 1,
@Role_Name = ‘CDC_Admin’
Hopefully all but the last 2 arguments are pretty obvious. Supports Net changes is a bit flag which either shows you only the overall change between two periods as opposed all of the changes . The role name is a database role which runs the code and a new one is created if you specify a new name. Also you can specify other parameters here like @filegroup.
Note you don’t have to include the captured column list (it will track all columns by default, but if you do you must include the primary key or another argument @index_name to refer to a unique index on the table if @supports_net_changes is set to 1.
You can now check this has been enabled..
@source_schema = N’HumanResources’,
@source_name = N’Employee’;
Behind the scenes a new system table has been created to track changes, [cdc].[HumanResources_Employee_CT] which has got all the columns we specified we wanted to track plus 5 extra ones ..
so lets make a simple change and see what happens in this table
SET Title = ‘DBA’ where Title = ‘Database Administrator’
and then query the change table like this..
SELECT TOP 1000 [__$start_lsn]
to see what was captured.
The LSN columns show the log sequence number of the commit for the change and the -$seqval orders the sequence of changes that can occur in the same transaction, so a possible lower level of detail. _$operation shows what happened to the row, 1 = delete, 2 = insert, 3 = update (before change), and 4 = update (after change).
To help you use this table there are functions to get the min and max LSN’s in the table.
You should also notice that you have got 2 new SQL Agent jobs to capture and cleanup the tables. Rather than modify these jobs directly there are also specific functions to manage them.
I have skimmed over this to give you an idea of the various parts of CDC, so you should check books on line for a fuller explanation here, and there is also a CDC TechNet virtual lab here, which you can try, before you have a go in your own environment.