Relational Data Warehouse + Big Data Analytics: Analytics Platform System (APS) Appliance Update 3

This blog post was authored by: Matt Usher, Senior PM on the Microsoft Analytics Platform System (APS) team

Microsoft is happy to announce the release of the Analytics Platform System (APS) Appliance Update (AU) 3. APS is Microsoft’s big data in a box appliance for serving the needs of relational data warehouses at massive scale. With this release, the APS appliance supports new scenarios for utilizing Power BI modeling, visualization, and collaboration tools over on premise data sets. In addition, this release extends the PolyBase to allow customers to utilize the HDFS infrastructure in Hadoop for ORC files and directory modeling to more easily integrate non-relational data into their data insights.

The AU3 release includes:

  • PolyBase recursive Directory Traversal and ORC file format support
  • Integrated Data Management Gateway enables query from Power BI to on premise APS
  • TSQL compatibility improvements to reduce migration friction from SQL Server SMP
  • Replatformed to Windows Server 2012 R2 and SQL Server 2014

PolyBase Directory Traversal and ORC File Support

PolyBase is an integrated technology that allows customers to utilize the skillset that they have developed in TSQL for querying and managing data in Hadoop platforms. With the AU3 release, the APS team has augmented this technology with the ability to define an external table that targets a directory structure as a whole. This new ability unlocks a whole new set of scenarios for customers to utilize their existing investments in Hadoop as well as APS to provide greater insight into all of the data collected within their data systems. In addition, AU3 introduces full support for the Optimized Row Column (ORC) file format – a common storage mechanism for files within Hadoop.

As an example of this new capability, let’s examine a customer that is using APS to host inventory and Point of Sale (POS) data in an APS appliance while storing the web logs from their ecommerce site in a Hadoop path structure. With AU3, the customer can simply maintain a structure for their logs in Hadoop in a structure that is easy to construct such as year/month/date/server/log for simple storage and recovery within Hadoop that can then be exposed as a single table to analysts and data scientists for insights.

In this example, let’s assume that each of the Serverxx folders contains the log file for that server on that particular day. In order to surface the entire structure, we can construct an external table using the following definition:

CREATE EXTERNAL TABLE [dbo].[WebLogs]
(
	[Date] DATETIME NULL,
	[Uri] NVARCHAR(256) NULL,
	[Server] NVARCHAR(256) NULL,
	[Referrer] NVARCHAR(256) NULL
)
WITH
(
	LOCATION='//Logs/',
	DATA_SOURCE = Azure_DS,
	FILE_FORMAT = LogFileFormat,
	REJECT_TYPE = VALUE,
	REJECT_VALUE = 100
);

By setting the LOCATION targeted at the //Logs/ folder, the external table will pull data from all folders and files within the directory structure. In this case, a simple select of the data will return data from only the last 10 entries regardless of the log file that contains the data:

SELECT TOP 5
	*
FROM
	[dbo].[WebLogs]
ORDER BY
	[Date]

The results are:

Note: PolyBase, like Hadoop, will not return results from hidden folders or any file that begins with an underscore (_) or period(.).

Integrated Data Management Gateway

With the integration of the Microsoft Data Management Gateway into APS, customers now have a scale-out compute gateway for Azure cloud services to more effectively query sophisticated sets of on-premises data.  Power BI users can leverage PolyBase in APS to perform more complicated mash-ups of results from on-premises unstructured data sets in Hadoop distributions. By exposing the data from the APS Appliance as an OData feed, Power BI is able to easily and quickly consume the data for display to end users.

For more details, please look for an upcoming blog post on the Integrated Data Management Gateway.

TSQL Compatibility improvements

The AU3 release incorporates a set of TSQL improvements targeted at richer language support to improve the types of queries and procedures that can be written for APS. For AU3, the primary focus was on implementing full error handling within TSQL to allow customers to port existing applications to APS with minimal code change and to introduce full error handling to existing APS customers. Released in AU3 are the following keywords and constructs for handling errors:

In addition to the error handling components, the AU3 release also includes support for the XACT_STATE scalar function that is used to indicate the current running transaction state of a user request.

Replatformed to Windows Server 2012 R2 and SQL Server 2014

The AU3 release also marks the upgrade of the core fabric of the APS appliance to Windows Server 2012 R2 and SQL Server 2014. With the upgrade to the latest versions of Microsoft’s flagship server operating system and core relational database engine, the APS appliance takes advantage of the improved networking, storage and query execution components of these products. For example, the APS appliance now utilizes a virtualized Active Directory infrastructure which helps to reduce cost and increase domain reliability within the appliance helping to make APS the price/performance leader in the big data appliance space.

APS on the Web

To learn more about the Microsoft Analytics Platform System, please visit us on the web at http://www.microsoft.com/aps