Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

One Version of the Truth

One of the most over used phrases in Business Intelligence is “one version of the truth” so I thought it would be good to discuss why it’s important and what IT professionals can do to achieve it.  In an imaginary organisation like adventure works there would be several line of business systems such as a classic sales order processing system to track sales to resellers plus a shiny new web site for internet sales, while in the back office there would by supply chain management, finance and HR systems. 

It’s Monday morning and the heads of department have their weekly meeting to review sales and operations. Going round the table:

  • The reseller sales manager is delighted because one of the resellers has just placed a big order for more stock to cope with their branch expansion program and he is well on target to meet his bonus. 
  • The internet sales manager is worried that although sales are up he is not on target.
  •  The finance director is very worried that receipts are just not coming in and there is a serious cash flow problem.  In fact he has a raft of figures to show that sales are down compared with last year
  • The managing director is confused

So what is going on here?  There is no big consistent picture of how the company is performing.  The sales manager is measuring his performance on the date a sale was made while the finance director only counts a sale when the invoice is paid.  Another issue might be that they all get their information from a different system at a different time and because a lot of systems have overnight batch jobs to post data the view of the data could be very different on last thing Friday night to first thing Monday morning.

This is where a data warehouse can be useful.  It will be designed to collect data form different systems at agreed times to give a  consistent set of snapshots of the business from which all reports can be sourced.  So that would be the one version of the truth sorted out then?

Yes and no.  Yes it is consistent and accurate (a big assumption and a book in it’s own right) , but it can still be distorted by the tools that access the date warehouse.  Any front end tool that connects to the data warehouse will have additional business logic in. For example a report will be based on a query with logic in it and may have calculated fields that are derived from that query . 

If the business is writing reports, than each department could create it’s own version of a report and these could be inaccurate become so by not being kept up to date.  If the IT department writes the reports then they will understand the data and have the tools to test the report is accurate and reflect what the user wants, which is why Reporting services at the moment doesn’t have a good tool for end user reporting.    However there are still problems here:

  • The IT department is always seen as taking too long to deliver so there are often ad hoc solutions in the more IT aware departments
  • It is difficult to track dependencies and calculations in them. Some IT Pros try to resolve this using stored procedures for the source of all reports and keep all the logic in a database.  However these can be difficult to read and often a developer will copy an existing procedure for their report and adapt it. So now the same calculations can appear in multiple procedures which makes maintenance a nightmare and puts the one version of the truth in jeopardy.

So what else can be done to crack this problem?  In my opinion you need a semantic layer between the data warehouse and the reporting tool.  This will contain several vital features for consistent reporting:

  • Joins between tables are described.
  • Things get named properly for every one to understand.
  • Hierarchies can be created to support relationships between attributes. Check my posts from last week for more on this. 
  • Derived measures and attributes which cannot be held in a relational model can be described e.g. closing stock at period end, like for like sales, profit margin% etc.
  • Fine grain security can be applied to limit what any user can see e.g. regional sales managers only see sales for their region and only HR can see measures related to salary and bonuses. 

The Microsoft approach is to stuff all of this into a Unified Dimensional Model (UDM) and that is the subject of my next post.