Slowly Changing Dimensions and Surrogate Keys

You would expect there to be a record and hopefully only one record for each sales person in a typical sales system and another record in another table for each of the customers.  But what happens when a one of these sales people gets assigned to a different territory or is promoted to be a regional sales manager?  The usual answer is that the record is edited to reflect the change.

That's fine until you decide to start to look at trends over time and analyse how each person in the sales team is performing against target .  In the example above, all of the sales in the system will appear to have been made by a person in their current role, so when a report is run the answer will be distorted.  This is where a separate data warehouse can help, as choices can be made about how to deal with these kind of changes independently of the line of business system.  There are three industry standard approaches to the problem of slowly changing dimensions:

  • Type 1.  This is pretty much what happens in the line of business system - the new version of the record overwrites the old version.  This is easy to implement and for many attributes it isn't important to know the historical value e.g. the last name of the sales person might change when they get married.
  • Type 2.  In this approach there is a row added to the dimension table every time a change is made to the sales person e.g. when assigned to a different territory or on promotion.  
  • Type 3. This technique adds extra columns to each row which hold the previous version of the information so you might have [original sales territory] or [old job title] columns and that can work where only the original and latest versions of a record are needed, but you have to decide which columns to use in which situation.

The type 2 scenario requires the generation of a new unique key on the dimension as there will be multiple versions of the same source record each with the same source or business key. This new surrogate key is how the dimension is joined to the sales facts made for that version of the sales person as you can see below

 type 2 sk

For those of you in black and white, Steve's sales in E region are in orange and those when he moved to SW region are in blue.  We can sum and group a join of these 2 tables by [Name] to see all Steve's sales or we can do this for [Territory] to get a true picture of sales in either scenario. 

Note that the business key doesn't appear in the fact table and that the surrogate key is just an arbitrary number -  usually a sequence number incremented as new rows are added to the dimension.  It is also good practice to add  extra columns such as [Is Current], above  to mark which record is the current one. Alternatively  [effective from date] and [effective to date] columns can be used to show when the row was in force.

So using type 2 slowly changing dimensions is the most work but the most versatile approach, and the good news is that there is a wizard  for this in SQL Server 2005 Integration services (SSIS).

Technorati Tags: Data Mart, Data warehouse, BI, Business Intelligence, Slowly Changing Dimensions, Surrogate Keys