Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Dimensional Modelling

This simple design popped up in my inbox today, along with a request for advice…


image


What you need to understand about this diagram is that there is a concatenated key (keys are in bold) on each of the orange dimension tables, with one part of that key being CompanyID. The exception is dCompany, which is keyed on CompanyID alone. You can imagine that this is a ledger fact table for multiple subsidiary companies in a large organisation (there is a time dimension as well, but I want to keep the post simple).


I can understand that each product, location and so on belongs to a particular company, and that there might be occasions when you need a complete list of the locations, cost centres etc. for that one company. This design allows each of the dimensions to yield that list in a simple two-table query (by joining directly to dCompany on CompanyID).
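As a rough illustration of that two-table query, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server. The dLocation table, its column names and the sample rows are assumptions made up for the example; only dCompany and CompanyID come from the design described above.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption: SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dCompany (
    CompanyID   INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL
);
-- Hypothetical dimension with the concatenated key (CompanyID, LocationID).
CREATE TABLE dLocation (
    CompanyID    INTEGER NOT NULL REFERENCES dCompany (CompanyID),
    LocationID   INTEGER NOT NULL,
    LocationName TEXT NOT NULL,
    PRIMARY KEY (CompanyID, LocationID)
);
INSERT INTO dCompany VALUES (1, 'North Ltd'), (2, 'South Ltd');
INSERT INTO dLocation VALUES
    (1, 10, 'Leeds'), (1, 11, 'York'), (2, 10, 'Bristol');
""")


def locations_for(company_name):
    """List every location of one company with a single two-table join."""
    rows = conn.execute(
        """
        SELECT l.LocationName
        FROM dLocation AS l
        JOIN dCompany AS c ON c.CompanyID = l.CompanyID
        WHERE c.CompanyName = ?
        ORDER BY l.LocationName
        """,
        (company_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(locations_for("North Ltd"))
```

Note that LocationID 10 appears under both companies, which is exactly why CompanyID has to be part of the dimension key in this design.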


The fact table has no redundant columns because even CompanyID is needed to link to the company dimension.


So all I would do is add in surrogate keys, i.e. a key with no relevance to the data in the dimension, which would be the primary and clustered key of each of the dimensions.


 image


 


For example, in dProductGroup the ProductSK of 1 is just a made-up number; it might refer to CompanyID 4, ProductID 20. It is not derived from some transformation of the two business keys. Joining on a single integer column rather than a two-part key will make queries faster, and will speed up cube processing if you plan to put Analysis Services on top of this. In my projects I nearly always use surrogate keys, and I simply use the identity property on the column that will be the surrogate key on the dimension to do this, reserving 0 for an unknown row, e.g. an unknown Location.
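To make the surrogate key idea concrete, here is a small sketch, again using Python's sqlite3 as a stand-in. SQLite's auto-assigned INTEGER PRIMARY KEY plays the role of the SQL Server identity property here, and in SQL Server itself the 0 row would typically need SET IDENTITY_INSERT switched on (or an identity seed of 0). The ProductName column, the sample rows and the product_sk helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dProductGroup (
    ProductSK   INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
    CompanyID   INTEGER,              -- business key, part 1
    ProductID   INTEGER,              -- business key, part 2
    ProductName TEXT NOT NULL,        -- hypothetical attribute
    UNIQUE (CompanyID, ProductID)
);
-- Reserve 0 for the unknown member before any real rows are loaded.
INSERT INTO dProductGroup (ProductSK, CompanyID, ProductID, ProductName)
VALUES (0, NULL, NULL, 'Unknown');
""")

# Real rows omit ProductSK, so the database hands out the next free number.
conn.executemany(
    "INSERT INTO dProductGroup (CompanyID, ProductID, ProductName) VALUES (?, ?, ?)",
    [(4, 20, "Widget"), (4, 21, "Gadget")],
)


def product_sk(company_id, product_id):
    """Translate a business key pair to its surrogate key, or 0 if unknown."""
    row = conn.execute(
        "SELECT ProductSK FROM dProductGroup WHERE CompanyID = ? AND ProductID = ?",
        (company_id, product_id),
    ).fetchone()
    return row[0] if row else 0


print(product_sk(4, 20))  # a made-up number with no meaning of its own
print(product_sk(9, 99))  # no such product, so the fact row points at Unknown
```

A lookup like product_sk is the sort of step a fact table load would run for each incoming row, so that the fact table stores only the single surrogate key per dimension.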


Whether or not it’s worth doing this is going to depend on 2 factors:



  • The shelf life of the solution. If it’s just a quick tactical fix then go with plan A, but remember that these kinds of things can hang around far longer than originally planned if they prove to be useful.

  • The size of the data and the performance required. 

A final thought is that the design in the second diagram will also reduce the size of the fact table in SQL Server 2005 and in the Standard edition of SQL Server 2008, but data compression in the Enterprise edition of SQL Server 2008 will remove this advantage.
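A back-of-envelope sketch of where that saving could come from is below. Every number in it is an assumption chosen for illustration, not something taken from the design: it supposes the business keys are 10-character codes, that there are four dimensions besides company, and that the fact table has 100 million rows. If the business keys were already 4-byte integers there would be little or nothing to gain on size.

```python
INT_BYTES = 4        # size of a SQL Server int column
CODE_BYTES = 10      # assumed: business keys stored as char(10) codes
DIMENSIONS = 4       # assumed: dimensions other than company
ROWS = 100_000_000   # assumed: number of fact rows

# First design: CompanyID plus one character business key per dimension.
business_key_bytes = INT_BYTES + DIMENSIONS * CODE_BYTES

# Second design: a company key plus one int surrogate key per dimension.
surrogate_key_bytes = INT_BYTES + DIMENSIONS * INT_BYTES

# Key bytes saved across the whole fact table (ignores row overhead and indexes).
saved_bytes = (business_key_bytes - surrogate_key_bytes) * ROWS

print(f"key bytes per row: {business_key_bytes} vs {surrogate_key_bytes}")
print(f"approximate saving: {saved_bytes / 1e9:.1f} GB")
```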