The Microsoft Azure Partner Community is led by National Partner Technology Strategists from the Microsoft US Partner Team. Partner Community activities include blog posts, discussions on Yammer, newsletters, and community calls.
Data platforms and big data on Microsoft Azure
This month’s Azure Partner Community topic—data platforms and big data on Microsoft Azure—represents such a wide offering set that this introductory post will simply highlight a few of the options Microsoft offers through Platform as a Service (PaaS). When we look at adding Infrastructure as a Service (IaaS), the options are limited only by the solution architect’s requirements. Our posts on this topic will explain the options available for data platforms in Azure, and how Microsoft partners can fold them into their current service offerings as well as develop new service offerings.
Azure SQL Database
Microsoft Azure SQL Database provides multiple service tiers with elastic scale and carries a financially backed SLA of up to 99.99% uptime. Your Platform as a Service options using SQL Database are outlined on the Microsoft Azure website. You can also deploy full SQL Server on Infrastructure as a Service. With the release of our new Virtual Machine sizes backed by Premium Storage, a new type of SSD-based storage that can handle workloads of up to 50k IOPS per VM with very low latencies, there are not many SQL workloads that Azure can’t handle.
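To put an uptime percentage in perspective, it can help to convert it into a downtime budget. The sketch below is simple arithmetic only, and it assumes a 30-day measurement window; the actual SLA defines its own measurement period and exclusions, so treat the function name and the default window as illustrative assumptions.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime percentage.

    The 30-day default window is an assumption for illustration,
    not the measurement period defined by any specific SLA.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Compare a few common uptime levels side by side.
    for sla in (99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.2f} minutes of downtime per 30 days")
```

At 99.99%, the budget works out to a little over four minutes in a 30-day window, which is a useful number to have on hand when discussing availability requirements with a customer.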
HDInsight is our 100% Apache Hadoop-based service in the cloud. HDInsight uses versions of the Hortonworks Data Platform (HDP) distribution and the set of Hadoop components within that distribution. Below is a list of a few of the Hadoop technologies in HDInsight. Review the full list of new features and versions supported.
- Ambari: Cluster provisioning, management, and monitoring
- Avro (Microsoft .NET Library for Avro): Data serialization for the Microsoft .NET environment
- HBase: Non-relational database for very large tables
- HDFS: Hadoop Distributed File System
- Hive: SQL-like querying
- Mahout: Machine learning
- MapReduce and YARN: Distributed processing and resource management
- Oozie: Workflow management
- Pig: Simpler scripting for MapReduce transformations
- Sqoop: Data import and export
- Storm: Real-time processing of fast, large data streams
- ZooKeeper: Coordinates processes in distributed systems
Beyond the inclusion of these tools in a service, access to them is built into our Business Intelligence tools, including Power Query for Excel, a Hive ODBC driver, SQL Server Analysis Services, and SQL Server Reporting Services.
Azure Machine Learning is one of our Internet of Things services, and provides a fully managed and integrated machine learning solution that is deployable within minutes. In keeping with the open source-friendly journey Microsoft has embarked on, the open-source statistical programming language R is supported.
The Azure Machine Learning service recently received a major upgrade that included new modules, additional parameter support, and filter-based feature selection modules for Kendall, Pearson, and Spearman correlations, to name a few. Review the comprehensive list of updates.
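For readers new to filter-based feature selection, the idea is to score each candidate feature against the target column and keep the highest-scoring ones. The sketch below shows the Pearson variant in plain Python. It is not the Azure Machine Learning module or its API; the function names and sample data are hypothetical and exist only to illustrate the technique.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, target, top_k):
    """Return the names of the top_k features by absolute correlation with target."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical sample data: two informative columns and one noisy column.
features = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "inverse": [10.0, 8.0, 6.0, 4.0, 2.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

if __name__ == "__main__":
    print(rank_features(features, target, top_k=2))
```

The absolute value is used so that strongly negative correlations rank as highly as strongly positive ones. Spearman and Kendall follow the same filter pattern but score on ranks rather than raw values, which makes them less sensitive to outliers.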
With the ever-increasing amount of data and data sources, processing that data requires significant resources. Stream Analytics on Azure, one of our Internet of Things services, provides real-time stream processing in the cloud. It can handle millions of events per second across multiple streams of data and perform real-time analytics on them. The real power of Stream Analytics is its native integration with Event Hubs, detailed below, which allows Stream Analytics to scale.
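A common building block in stream processing is the tumbling window: events are grouped into fixed, non-overlapping time buckets and aggregated per bucket. The sketch below illustrates that idea in plain Python over an in-memory list. It is not the Stream Analytics query language or service API, and the event shape (timestamp in seconds, device ID) is an assumption made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, device_id) using fixed, non-overlapping windows.

    events is an iterable of (timestamp_seconds, device_id) pairs; this shape
    is hypothetical and chosen only to keep the example small.
    """
    counts = defaultdict(int)
    for timestamp, device_id in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)


# Hypothetical telemetry: (seconds since start, device ID).
events = [(0, "a"), (3, "a"), (9, "b"), (10, "a"), (14, "b"), (21, "a")]

if __name__ == "__main__":
    for key, count in sorted(tumbling_window_counts(events, 10).items()):
        print(key, count)
```

A real streaming engine performs this aggregation continuously as events arrive and emits each window's result when the window closes, rather than scanning a finished list as this sketch does.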
Another of our Internet of Things services on Azure, Event Hubs, adds cloud-scale telemetry ingestion from websites, apps, and devices. Support for HTTP and AMQP allows many different platforms to work with Event Hubs. In the video below from the Connect(); keynote, you can learn about a great example of how Event Hubs and Stream Analytics work together. The demo starts at 11:55.
Like Machine Learning, the Azure Data Factory service is still a preview feature. It allows for the composition and orchestration of data services at scale, and allows multiple data sources to be connected and processed through one pipeline. Data Factory connections include SQL Server, Azure SQL, Blobs, Tables, and Hadoop, with processing through Hive, Pig, and C#. It also provides Hadoop (HDInsight) cluster management, retries for failures, alerting, and more.
A recent update to Data Factory provides integration with Machine Learning. This update makes it possible to run finished ML models inside Data Factory pipelines. Trained ML scoring models can be run repeatedly against the various sources connected through Data Factory. The short video below provides a good introduction to this service.
Azure Batch may not be considered a big data tool, but I feel that batch cloud computing has quite a few implications for analytics in the enterprise. Azure Batch provides cloud-scale scheduling and compute management. It allows large processing jobs to be scaled out quickly across multiple computers. Batch can scale out to 100,000+ cores when needed, and customers pay only for what they use.
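The scale-out pattern behind batch computing is to split one large job into independent tasks, run the tasks in parallel, and combine the partial results. The sketch below shows that pattern on a single machine with a thread pool standing in for a pool of compute nodes. It does not use the Azure Batch API; the function names and the sum-of-squares workload are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(items, task_size):
    """Split a job's input into independent chunks of at most task_size items."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def process_task(chunk):
    """Placeholder workload: sum of squares for one chunk."""
    return sum(x * x for x in chunk)


def run_job(items, task_size, workers):
    """Run every task on a worker pool and combine the partial results."""
    tasks = split_into_tasks(items, task_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_task, tasks))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(10)), task_size=3, workers=4))
```

Because the tasks share no state, adding workers shortens the wall-clock time without changing the result, which is the property that lets a job of this shape spread across many machines.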
The above is not an exhaustive list of our data platforms and big data services. Two items missing are DocumentDB, which we’ll cover in an upcoming post, and Power BI. But the above does outline an end-to-end solution set that provides advanced data collection, processing, and analytics to businesses. I hope you’ll join us for the big data and data platforms discussion on April 16 for the Azure Partner Community, where we’ll explore the opportunities with these services.
Join the next Azure Partner Community call on April 16 for a discussion about data platforms and big data on Azure.
Previous Azure Partner Community topics
Focus on Azure benefits for partners
- Part 1 – Azure benefits overview
- Part 2 – Azure benefits for competency partners
- Part 3 – Signature Cloud Support for Azure
Focus on Top Partner Topics
- Part 1 – Introduction and RemoteApp
- Part 2 – Azure Site Recovery
- Part 3 – Azure API Management
- Community call recording
Focus on Office 365
- Part 1 – Introduction
- Part 2 – Identity Management
- Part 3 – SharePoint on Azure
- Part 4 – Apps on Azure
- Community call recording
Focus on Networking
Focus on Managing Virtual Machines
- Part 1 – Introduction
- Part 2 – Virtual Machine Management and developer tools
- Part 3 – Virtual Machine images and snapshots
Focus on Migration to Azure