Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Azure Data Services for Predictive Analytics

In my last post I hopefully explained that business intelligence works very well on Azure, with new services like the Azure Data Warehouse and Power BI (not strictly part of Azure but a Microsoft online service). However, while BI is good at showing some trends and how a limited number of factors affect those trends, it can't really be used as part of a process to automatically make decisions off the back of some new data. That is the realm of predictive analytics, and there are a few routes you can use if you want to do this in the cloud.


The first is to simply use Azure Machine Learning (MAML), which does this as a service (and which I have covered in earlier posts). MAML can directly read data in from a number of sources, including OData and a Hive query (from Hadoop, or HDInsight as it's called in Azure).

The other approach is to make use of the extensive gallery of VMs in Azure, some of which are configured for machine learning. For example, H2O is an open source VM in that gallery which provides support for R over Hadoop, which means you just select it, enter your Azure subscription details, and the VM will spin up complete with tutorials.

Having made your prediction, what do you do with it? In MAML the resulting trained model can be published and then accessed either transaction by transaction or in batch from a REST API. In transaction mode you'll just put the call to MAML in line in your app, so to a large extent MAML itself is invisible. Batch mode might be used as part of the load of a data warehouse, for example to do customer segmentation or to help complete partial data, and in that case it might be good to make use of the Azure Data Factory (ADF).
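To make the transaction mode a bit more concrete, here is a minimal Python sketch of what putting the call in line in your app might look like. Treat it as an illustration rather than the definitive API: the URL and key are placeholders, the function names are mine, and the payload shape (a named input holding column names and rows, sent with a bearer token) is my assumption about a typical published scoring service, so check the sample code on your own service's dashboard.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the endpoint URL and API key shown
# on your own published web service's dashboard.
SCORING_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_scoring_request(columns, row, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request that scores one transaction."""
    # Assumed payload shape: one named input holding column names and rows.
    payload = {
        "Inputs": {
            "input1": {"ColumnNames": list(columns), "Values": [list(row)]}
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def score(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In the app itself you would call something like score(build_scoring_request(["age", "income"], [42, 30000])) at the point where the decision is needed, which is why the service ends up largely invisible to the user.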

In this case we might also want to do pre-processing and cleansing using the right tools for that, like HDInsight, and just do the machine learning in MAML. ADF can orchestrate all of this, including provisioning an HDInsight cluster on demand and calling MAML. Not only that, it does a great job of logging what has been processed and any issues, all from your browser.

The other scenario where ADF could make sense when doing predictive analytics is to have a process (Factory) to retrain a MAML model based on newer data in a controlled way. For example, if you have a model to accept/reject loan applications, the model needs to be audited but may also need to change to reflect more recent data, so a controlled update is needed rather than just continually changing it as new data arrives (which could also make use of ADF).
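One way to picture a controlled update is as a gate that a retrained model has to pass before it replaces the live one. The Python below is only a sketch of that idea, not anything built into ADF or MAML: the Candidate class, the should_promote function, the hold-out accuracy measure and the 0.01 threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A trained model version plus the evidence needed to judge it."""
    version: str
    holdout_accuracy: float  # measured on a fixed, audited hold-out set
    approved_by: str = ""    # name of the auditor who signed off, if any


def should_promote(current, candidate, min_gain=0.01):
    """Return True only if the candidate is both better and signed off."""
    improved = candidate.holdout_accuracy >= current.holdout_accuracy + min_gain
    audited = bool(candidate.approved_by.strip())
    return improved and audited
```

A scheduled retraining process could run a check like this at the end of each run and only swap the published model when it returns True, so the loan model changes on your terms rather than every time new data lands.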

Finally, in the diagram above I have also shown Power BI as being in the mix, as this is the de facto online BI service from Microsoft. It's not really part of Azure, it's part of Office 365, and it has now been unbundled from SharePoint as an online service.