Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Does “self service BI” mean more or less work for IT?

Before I answer that question I want to be clear on what exactly self-service BI means, as you could argue that users have been doing BI for themselves since the spreadsheet was invented. In those early days users would often have to rekey data into whatever analysis tool they were using, as the source information was often a printout from a mainframe or mini. With the advent of ODBC, users could be given permission to access source systems directly and get at the data they needed. However, this led to a number of problems:

· The end user would need a good knowledge of SQL

· The data returned might well be inaccurate, as knowledge of the structure and contents of the source system was required.

· Their queries might perform really poorly and could affect other users of the system and overall performance.

· The resultant data was a point-in-time snapshot, and it was often impossible to track where a given set of data had come from or when the query had been run, as the end user had to record that consciously.

So IT had to help out. We had to provide appropriate access to the data, educate and train the users about SQL and the source systems and possibly provide canned views or stored procedures so they could get the data they needed with minimal impact on the source systems.
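The "canned view" idea can be sketched with an in-memory SQLite database. This is illustrative only: the table, columns, and view name are hypothetical, and a real deployment would use views or stored procedures in the line-of-business database itself.

```python
import sqlite3

# Hypothetical source table standing in for a line-of-business system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, cost REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "North", 100.0, 60.0), (2, "South", 200.0, 120.0)],
)

# The canned view exposes only what the analyst needs: it hides the
# sensitive cost column and pre-aggregates, so users don't need to know
# the underlying schema or write joins and GROUP BYs themselves.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
""")

rows = conn.execute(
    "SELECT region, total_sales FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('North', 100.0), ('South', 200.0)]
```

The user queries `sales_by_region` like any table, while IT controls what the view exposes and how expensive it is to run.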

This was all fine and well if the data required was only in one system, however this was rarely the case, and there were also problems in getting historical data for trend analysis as this might not be in the line-of-business systems but in separate archives. The answer to this was the data warehouse and tools to provide a business-friendly view of this data, so end users didn’t need to understand SQL or how the data warehouse itself was constructed. IT created the data warehouse and maintained that business-friendly view (aka the semantic layer), and the users had various tools to report on and analyse its contents.

However there was still a problem: the time and effort needed to do this meant there was always a lag between what users wanted and what IT could deliver. For example, there are sets of data on the internet, such as social media metrics, that are external to a business. Not only that, the rate of change in modern business is more rapid than ever, in response to unforeseen external factors like the economy, natural disasters, and the on-demand experience that consumers expect.

This has seen the rise of in-memory analysis tools that can handle sizeable chunks of data on even a modest laptop. These tools have simple-to-use import procedures and can consume data from web services via OData and from XML as well as traditional data sources, and because they capture where the data comes from, it is a simple matter to refresh the data from source, on demand. Coupled with this, they have built-in intelligence to mash up disparate sets of data into sophisticated analytics. So does this mean that IT is no longer involved in the process?
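The refresh-on-demand trick works because the tool stores the feed's address alongside the data. A minimal sketch of the idea, assuming a hypothetical OData service URL and a canned JSON response (the `value` array is the OData convention for an entity set; in practice the payload would come from an HTTP GET against the service):

```python
import json
from datetime import datetime, timezone

# Hypothetical service endpoint - stands in for a real OData feed.
SERVICE_URL = "https://example.com/odata/SalesByMonth"

# Canned response in OData JSON shape; normally fetched over HTTP.
payload = json.loads("""{
  "value": [
    {"Month": "2011-01", "Sales": 1200},
    {"Month": "2011-02", "Sales": 1350}
  ]
}""")

# Capture provenance alongside the rows, so the data can be traced back
# to its source and refreshed on demand - the bookkeeping the in-memory
# tools do for the user automatically.
dataset = {
    "source": SERVICE_URL,
    "retrieved_at": datetime.now(timezone.utc).isoformat(),
    "rows": payload["value"],
}
print(len(dataset["rows"]))  # 2
```

Re-running the import is then just a matter of re-requesting `dataset["source"]`, rather than the user having to remember where the numbers came from.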

No, that data warehouse still has its place as a central trusted repository of internal information. Strategic scorecards and dashboards will still have their place too. However when exceptions occur or there is an external factor which could have a major impact (positive or negative) on the business then the end user BI tools will provide the analysis needed to make the necessary correction to the decision being made.

So what does IT provide in the self-service BI world?

It may be stating the obvious, but any end-user BI tool is only as good as the data that is fed into it, and so IT has a key role to play in maintaining data quality. This is partly about cleansing the data and partly about augmenting it for more meaningful analysis. Cleansing means de-duplication and correcting keying errors, while augmentation processes add in missing data. In both cases IT will be working closely with users: to establish the rules, and to provide interfaces for users to enter missing data where automated processes detect a problem but cannot fix it.
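The cleansing pipeline described here can be sketched in a few lines. Everything in this example is hypothetical: the record layout, the duplicate key, and the rule table of keying-error corrections that IT would agree with the business.

```python
# Hypothetical source records: a duplicate, a keying error, a gap.
records = [
    {"id": 1, "city": "Lodnon", "country": "UK"},
    {"id": 1, "city": "Lodnon", "country": "UK"},   # duplicate row
    {"id": 2, "city": "Reading", "country": None},  # missing country
]

# Rule table for common keying errors, established with the users.
SPELLING_FIXES = {"Lodnon": "London"}

def cleanse(rows):
    seen, clean, needs_input = set(), [], []
    for row in rows:
        if row["id"] in seen:        # de-duplication
            continue
        seen.add(row["id"])
        row = dict(row)
        # Correct known keying errors automatically.
        row["city"] = SPELLING_FIXES.get(row["city"], row["city"])
        if row["country"] is None:
            # Augmentation can't fix this automatically: route the row
            # to an interface where a user supplies the missing value.
            needs_input.append(row)
        clean.append(row)
    return clean, needs_input

clean, needs_input = cleanse(records)
```

The split between `clean` and `needs_input` mirrors the division of labour above: rules handle what they can, and people fill in what the rules cannot.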

In addition to maintaining the data warehouse, I can see the need for IT to offer a data market of OData services from which end users can self-select the sets of data they need to make timely decisions. This would to some extent replace the need to produce piles of reports, but in either case it’s important to track usage of reports to weed out those no longer in use.

If some of the users’ self-service BI analytics move from being tactical to strategic, from being just for their team or department to being enterprise-wide, then IT will pick these up and scale them using server- rather than PC-based technologies.

As Ralph Kimball noted many years ago in his seminal work, the Data Warehouse Lifecycle Toolkit, collaboration between IT and the business is a critical success factor in any BI project, and this is as true with today’s modern tools as it was when he first started in BI.