What’s different about a data warehouse is that all of the information from the line of business or source systems gets rearranged into two types of information:
Dimensions contains a group of descriptive attributes. An example is a product where we have its colour, size, description, product group, product category etc. It should have a single part primary key and then everything else is typically made up of flags and strings.
Facts on the other hand are all numbers. Often these are costs and prices and quantities and all are a measures of activity.
The rearrangement process, dimensional modeling, is a disciplined structured exercise very similar to the normalisation exercise that a data base designer goes through as part of the wider systems analysis piece of a project. However dimensional modeling results in a structure optimised for reporting, where normalisation is focused on performance for transactional throughput .
Surprisingly both techniques are nearly as old as each other and both had a lot to do with the work of Edgar F “Ted” Codd He also coined the term OLAP (On- Line Analytical Processing) so he also helped to get business intelligence off the ground.
So what is about this technique that makes it faster and easier for reporting and at the same time annoys database purists?
For a start there are a lot less tables involved if we go back to the product dimension above, all of the product information in a data warehouse will reside in one table where before it would have been in six or seven related tables in the source system. When the product dimension is built these relationships are flattened, so queries have less joins in them.
Of course there’s a catch and that is that this new structure appears to be a nightmare to update and looks like it wastes a lot of disk space. Yes it is hard to update and special techniques are used to do this, but the data warehouse is kept constantly up to date and disk space is cheap.
A popular quick alternative to the data warehouse is the operational data store (ODS). You mirror or snapshot your line of business system and use the off line copy for reporting. It’s quick and somehow free because you were doing it already. Well not really that schema is still not designed for reporting although I suppose you could run a script to index everything that moves, but then it wouldn’t be so great as your hot standby.