Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Azure Data Services–what are they?

This might be a recurring theme but it’s worth mentioning again: Azure continues to change and expand at an alarming rate. On the one hand this is simply scale; for example we now have access to bigger compute resources like the G class virtual machines, ever cheaper storage, and higher and higher limits on what can be done. However it’s the changes to the range of services available in Azure that are more important, and nowhere is this more obvious than in the world of data.


In the diagram above you can see what I mean (note: Power BI is included here; it is not technically part of Azure, but it is a cloud service like Azure). I could write a series of posts on what these all are, but there is already good content on what each of them does and tutorials on how to use them. What is harder to find, and what I want to discuss, is how these fit together to support certain scenarios that simply weren’t possible or economically viable in a pre-cloud world. So I have grouped these services by function:

Orchestration. The Event Hub and Data Factory services allow us to ingest data into Azure and then forward it to the other services in the diagram. However they differ in that Event Hub is designed for consuming data streams from Internet of Things devices or RSS feeds, whereas Data Factory is a batch mode solution for large data sets.

Compute. The raw data we have needs to be transformed and analysed to get value from it. The most obvious way to do this is to spin up a VM with whichever tools we are most comfortable with. I might use SQL Server or RStudio where others might be more comfortable with Oracle or one of the many big data solutions in VM Depot (a collection of 90 odd open source templates for Azure).


However Azure also has specific services for processing data, such as HDInsight for big data, Machine Learning for predictive analytics and Stream Analytics for near real time scenarios. I could also add SQL Azure to this list, but I have listed it under storage as code is usually executed against a database from another application or service.

Storage. There are also several ways to store data in Azure: blob and table storage, SQL Azure and the newer DocumentDB (like MongoDB but as a service). Note that while data can be stored in virtual hard disks inside VMs, those disks are themselves stored in Azure blob storage anyway.
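
If it helps to see the grouping written down, here is a minimal sketch in Python. It is only a lookup table that mirrors the three groups described above; the names SERVICES_BY_FUNCTION and function_of are my own inventions for illustration and have nothing to do with any Azure SDK.

```python
# Hypothetical lookup table mirroring the grouping in this post.
# The service names are plain labels, not Azure API identifiers.
SERVICES_BY_FUNCTION = {
    "orchestration": ["Event Hub", "Data Factory"],
    "compute": [
        "Virtual Machines",
        "HDInsight",
        "Machine Learning",
        "Stream Analytics",
    ],
    "storage": ["Blob storage", "Table storage", "SQL Azure", "DocumentDB"],
}


def function_of(service: str) -> str:
    """Return the functional group (as grouped in this post) for a service."""
    for function, services in SERVICES_BY_FUNCTION.items():
        if service in services:
            return function
    raise KeyError(f"{service!r} is not in any of the groups")


if __name__ == "__main__":
    for name in ("Event Hub", "Stream Analytics", "SQL Azure"):
        print(f"{name}: {function_of(name)}")
```

SQL Azure sits under storage here for the same reason as in the text, even though it could arguably be listed under compute as well.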

While I hope it’s useful to have a view of Azure services like this, it still doesn’t help in understanding how to connect them up to do useful work. So if we imagine these are stations on the London Underground, we get to those stations by travelling on a particular line (hopefully not going round and round on the Circle line!). So in the next few posts I am going to take you on a journey through these services, which I hope will expand your mind to the opportunities of working with data in Azure.