The Ins and Outs of Azure Data Factory – Orchestration and Management of Diverse Data

Yesterday at TechEd Europe 2014, Microsoft announced the preview of Azure Data Factory. This post will give you the ins and outs of this new service.

What is Azure Data Factory?

Azure Data Factory is a fully managed service for producing information by orchestrating data and processing services as managed data pipelines. A pipeline connects diverse data sources (such as on-premises SQL Server, or cloud data in Azure SQL Database, Blobs, Tables, and SQL Server in Azure Virtual Machines) with diverse processing techniques (such as Hive and Pig on Azure HDInsight, and custom C# activities). This allows the data developer to transform and shape the data (join, aggregate, cleanse, enrich) so that it becomes authoritative and trustworthy enough to be consumed by BI tools. These pipelines are all managed within a single pane of glass, where rich health and lineage information is available to diagnose issues or perform impact analysis across all data and processing assets. Some unique points about Data Factory are:

  • Ability to process data from diverse locations and of diverse types. Data Factory can pull data from relational, on-premises sources like SQL Server and join it with non-relational, cloud sources like Azure Blobs.
  • Ability to provide a holistic view of the entire IT infrastructure, spanning commercial and open source technologies together. Data Factory can orchestrate Hive and Pig on Hadoop while also bringing in commercial products and services like SQL Server and Azure SQL Database, all in a single view.
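
To make the pipeline and lineage ideas above more concrete, here is a minimal conceptual sketch in Python. It is not the Data Factory API or its authoring format: the classes Dataset, Activity, and Pipeline, the downstream method, and all dataset and activity names are hypothetical, chosen only to illustrate how connecting data sources to processing steps makes impact analysis possible.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A named piece of data and the store it lives in (hypothetical model)."""
    name: str
    store: str  # e.g. "SQL Server (on-premises)" or "Azure Blob"


@dataclass
class Activity:
    """A processing step that reads input datasets and writes one output."""
    name: str
    engine: str  # e.g. "Hive", "Pig", "custom C#"
    inputs: list
    output: Dataset


@dataclass
class Pipeline:
    """An ordered collection of activities (illustrative only)."""
    name: str
    activities: list = field(default_factory=list)

    def downstream(self, source_name):
        """Return the names of all datasets derived, directly or
        indirectly, from the dataset called source_name."""
        affected = set()
        frontier = {source_name}
        while frontier:
            current = frontier.pop()
            for act in self.activities:
                reads_current = any(d.name == current for d in act.inputs)
                if reads_current and act.output.name not in affected:
                    affected.add(act.output.name)
                    frontier.add(act.output.name)
        return affected


# Hypothetical example: join on-premises customer data with cloud web logs.
customers = Dataset("customers", "SQL Server (on-premises)")
weblogs = Dataset("weblogs", "Azure Blob")
sessions = Dataset("sessions", "Azure Blob")
report = Dataset("customer_activity", "Azure SQL Database")

pipeline = Pipeline("web_log_analytics", [
    Activity("sessionize", "Pig", [weblogs], sessions),
    Activity("join_customers", "Hive", [sessions, customers], report),
])

# Impact analysis: which datasets are affected if the web logs change?
print(sorted(pipeline.downstream("weblogs")))
```

In this toy model, asking what is downstream of a dataset is the kind of impact analysis the lineage view is described as supporting; the real service tracks this for you rather than requiring code like this.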

What can it do?

With the ability to manage and orchestrate the collection, movement, and transformation of semi-structured and structured data together, Data Factory gives customers a central place to manage processing for web log analytics, click stream analysis, social sentiment, sensor data analysis, geo-location analysis, and more. In public preview, Microsoft views Data Factory as a key tool for customers who want a hybrid solution with SQL Server or who currently use Azure HDInsight, Azure SQL Database, Azure Blobs, and Power BI for Office 365. In the future, we’ll bring more data sources and processing capabilities to Data Factory.

How do I get started?

For Microsoft customers, we are offering Azure Data Factory as a public preview. To get started, you will need an Azure subscription or a free Azure trial. With this in hand, you should be able to get Azure Data Factory up and running in minutes. Start by reading this getting started guide.

For more information on Azure Data Factory: