Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Changing Dimensions – Fast or Slow?

Many resources on data warehousing talk about slowly changing dimensions and how to deal with them but what happens when your dimensions change more quickly and what is does fast or quick mean in in this context?

First let’s be clear on what is meant by slowly changing dimensions.  Any change to an attribute in a dimension typically happens far less often than changes are made to fact tables. Think of changing product specifications, customer moving address, and organisational change in a business, all of these happen much more rarely than facts are added or changed in a fact table.

However there are some changes that happen more quickly than others, for example the current stage of a project or process could change daily or even hourly, as could the status of a customer complaint.

Then there is the matter of how changes to dimensions should be handled.  If history is to be discarded and only the current view of the project etc. is to be kept then this is referred to as type 1 ( check my post here for more on this).  This entails doing an update which can be really slow especially if there are lots of them.  If history is to be kept then (Type 2) the processing is more complex and you end writing a row in for each time a dimension changes.  So if your dimensions are changing a lot then the refresh process is going to be very slow and your dimension are going to be very large.

What can be done to stop this from happening?  My recommendation would be to change the design by:

1. Taking out the fast changing attribute (for example project status) and creating a dimension with all of the possible values in.

2. Creating a factless fact table to record the changes with the following attributes

  • ProjectID a foreign key to the project
  • StatusID a foreign key to the status dimension in point 1. above.
  • DateChangedID foreign key to the time dimension to represent the dat the status changed.
  • Optionally if the stages always go in a set order you could have  fact(s) of days to complete and/or cost to complete so you can measure the performance of each stage of the project.

What we have lost in the process of doing this is a complete readout of the project in one simple table which makes the job of saying what is the current stage of my project a bit harder to find out. Also if project status is not null (and in my designs no dimension attributes are allowed to be null), then you need to create an initial row in the new fact table every time you create a new project to point to the initial status of the project.

Hopefully this is useful as there isn’t too much on TechNet about designing a data warehouse. If you do want to know more check the Microsoft data warehousing books in my book list by Ralph Kimball, as this is pretty much what all of the tools in SQL Server are designed around (especially integration and reporting services).