Data & Analytics Partners: Navigating the Azure data services jungle


By Norm Nicholl, Cloud Solution Architect and Ryan Coffman, Cloud Solution Architect

As Microsoft Azure matures and grows, its data capabilities become more robust and flexible, offering you more options when building your solution. There are many considerations for architects and implementers to evaluate, including balancing data platform flexibility, functionality, performance, cost and availability. Unlike with on-premises implementations, using Azure lets you quickly pivot on data-related decisions as new requirements and needs are identified, without the capital expenditure impact or delay in acquisition. Another key consideration is data gravity and integration of data using these services. The movement of data costs money and introduces business latency, although tools like Azure Data Lake Analytics can help with just-in-time integration.

This month’s topic for the Data Platform and Advanced Analytics Partner community is navigating the Azure data services options available to you and choosing the right one for your solution.

Sign up for the April 18 partner call

Understanding the options and asking the right questions

Starting with the options in Microsoft Azure for persisting data, we’ve created a cheat sheet to help you understand them. We’re looking at these persistent storage options from an Azure native perspective in support of analytics and/or compute. Transient data stores, like continuous or streaming data stores, caching, and others, are not included here. We’ll cover them in a separate blog post.

Azure native, persistent data stores include these products, and we’ve included suggested use cases for each:

DocumentDB – Managed NoSQL document database as a service

  • IoT, mobile, social media, gaming
  • Online retail
  • Content management
  • Complex queries

Azure Data Lake Store – Hyperscale repository for big data analytics workloads

  • IoT, social media, gaming
  • Clickstream analytics
  • Unstructured, semi-structured data

Azure Blob storage – Massively scaled object storage for unstructured data

  • Application backend (images, media, etc.)
  • Logging scenarios
  • Unstructured, semi-structured data
  • OS, backups

Azure Table storage – A NoSQL key-value store for rapid development using massive semi-structured datasets

  • Simple API query workloads
  • Web application, address book, user data
  • Semi-structured, unstructured data

Azure SQL Data Warehouse – Elastic data warehouse as a service with enterprise-class features

  • Structured relational data
  • Decision support
  • Analytics workloads
  • Ad hoc reporting

Azure SQL Database – A managed cloud database for app developers

  • Structured relational data
  • Decision support
  • Analytical workloads
  • Ad hoc reporting

SQL Server on Virtual Machines – Migrate workloads for SQL Server virtualization

  • Structured relational data
  • Transactional workloads
  • Spatial and rich data types
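
The suggested use cases above can be captured as a simple lookup table, which may be handy when screening candidate stores early in a design discussion. The Python sketch below is illustrative only: the dictionary mirrors the lists in this post, and the helper name `stores_for` is our own invention, not part of any Azure SDK.

```python
# Suggested use cases per store, copied from the lists in this post.
USE_CASES = {
    "DocumentDB": [
        "IoT, mobile, social media, gaming",
        "Online retail",
        "Content management",
        "Complex queries",
    ],
    "Azure Data Lake Store": [
        "IoT, social media, gaming",
        "Clickstream analytics",
        "Unstructured, semi-structured data",
    ],
    "Azure Blob storage": [
        "Application backend (images, media, etc.)",
        "Logging scenarios",
        "Unstructured, semi-structured data",
        "OS, backups",
    ],
    "Azure Table storage": [
        "Simple API query workloads",
        "Web application, address book, user data",
        "Semi-structured, unstructured data",
    ],
    "Azure SQL Data Warehouse": [
        "Structured relational data",
        "Decision support",
        "Analytics workloads",
        "Ad hoc reporting",
    ],
    "Azure SQL Database": [
        "Structured relational data",
        "Decision support",
        "Analytical workloads",
        "Ad hoc reporting",
    ],
    "SQL Server on Virtual Machines": [
        "Structured relational data",
        "Transactional workloads",
        "Spatial and rich data types",
    ],
}


def stores_for(keyword):
    """Return stores whose suggested use cases mention the keyword.

    Hypothetical helper: a case-insensitive substring match, nothing more.
    """
    needle = keyword.lower()
    return [
        store
        for store, cases in USE_CASES.items()
        if any(needle in case.lower() for case in cases)
    ]
```

For example, `stores_for("IoT")` returns DocumentDB and Azure Data Lake Store. A keyword match like this only produces a shortlist; the questions later in this post still decide which store fits.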

Across all of these services, Azure Data Catalog can be used to capture and catalog technical and business metadata. It provides capabilities that enable any user – from analysts to data scientists to developers – to discover, understand, and consume data sources. It’s a strong vehicle for sharing data across the broader user ecosystem.

The image below contains more details about each of these services, including access/integration capabilities, key features, and lambda attribute/domain. Understanding each of these can help you make a preliminary decision about which persistent store may be best for your scenario.

Azure Native Persistent Data Stores

Considerations for identifying the right persistent store(s)

We compiled this list of questions to assist you with identifying the right persistent store(s) for the requirements:

Cloud feasibility

  • Can you use the cloud?
  • If so, for what data?
  • Are there any security or usage constraints in using the cloud?

Data variety

  • How much data do you need to store (volume)?
  • Will you use non-relational data (variety)?
    • Document/JSON or late binding/unstructured?

Data velocity

  • Will you have streaming data?
  • Is the source data cloud-born, on-premises-born, or both?
  • How much daily data needs to be imported into the solution?

Consumption

  • Will you use dashboards?
  • What are the service levels for the operational reports (speed and availability)?
  • Will you do predictive analytics (like machine learning)?
  • What are your high availability and/or disaster recovery requirements?
  • How many concurrent users will access the solution at peak time?
  • How many concurrent users will access the solution on average?

Other considerations

  • Do you have a developer toolset you prefer?
    • Do the developers have Hadoop skills?
  • Does this solution require always-on client access?
  • What is the skill level of the end users?
  • What are your current pain points or obstacles (such as performance, scale, storage, concurrency, query times)?
  • What is the budget and timeline?
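
One way to keep a discovery session on track is to record answers against these categories and flag what is still open. Here is a minimal Python sketch, assuming answers are kept in a plain dictionary. The category names and question text come from the list above (only a few are shown), while `QUESTIONS` and `open_questions` are hypothetical names chosen for illustration.

```python
# A subset of the questions from this post, grouped by category.
QUESTIONS = {
    "Cloud feasibility": [
        "Can you use the cloud?",
        "Are there any security or usage constraints in using the cloud?",
    ],
    "Data velocity": [
        "Will you have streaming data?",
        "How much daily data needs to be imported into the solution?",
    ],
    "Consumption": [
        "Will you use dashboards?",
        "How many concurrent users will access the solution at peak time?",
    ],
}


def open_questions(answers):
    """Return {category: [questions]} for questions with no recorded answer.

    `answers` maps question text to a free-form answer string; a missing
    key, None, or an empty string all count as unanswered.
    """
    remaining = {}
    for category, questions in QUESTIONS.items():
        missing = [q for q in questions if not answers.get(q)]
        if missing:
            remaining[category] = missing
    return remaining
```

Calling `open_questions` with the answers gathered so far gives a quick view of which categories still need attention before you shortlist a persistent store.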

Azure Data Catalog

Most enterprises have taken on long-term metadata or master data management projects with minimal overall impact on their employees and customers. The ideal way to catalog enterprise data assets is through crowdsourcing, which works only when cataloging is self-service. Enabling users to share their knowledge about data assets and usage toward some common business process is powerful, moving an organization toward becoming data driven when making business decisions.

Process flow for cataloging data

Azure Data Catalog features

  • Enterprise-wide catalog in Azure that enables self-service discovery of data from any source
  • A metadata repository where users can register, annotate, discover, understand, and consume data sources
  • A platform with open REST APIs that allows developers to integrate data discovery capabilities into their applications and processes
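
To make the register, annotate, and discover workflow concrete, here is a toy in-memory model in Python. It is not the Data Catalog REST API and makes no calls to Azure: `DataAsset`, `ToyCatalog`, and the sample names are all hypothetical, intended only to show how crowdsourced annotations could make an asset discoverable.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A registered data source plus the annotations users add to it."""
    name: str
    source: str
    tags: set = field(default_factory=set)
    description: str = ""


class ToyCatalog:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self):
        self._assets = {}

    def register(self, name, source):
        # Registering the same name twice keeps the first registration.
        return self._assets.setdefault(name, DataAsset(name, source))

    def annotate(self, name, *, tags=(), description=None):
        # Any user can enrich an asset with tags or a description.
        asset = self._assets[name]
        asset.tags.update(tag.lower() for tag in tags)
        if description is not None:
            asset.description = description
        return asset

    def discover(self, term):
        # Case-insensitive match on name, description, or an exact tag.
        needle = term.lower()
        return [
            asset.name
            for asset in self._assets.values()
            if needle in asset.name.lower()
            or needle in asset.description.lower()
            or needle in asset.tags
        ]
```

In this sketch, an analyst who tags a registered table with "Finance" makes it discoverable to anyone who later searches for "finance", which is the self-service effect described above.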

Data Catalog offers a new way to solve a long-standing pain point for most enterprises. It can be an opportunity to engage with new clients or business stakeholders. Given its API framework, it can easily be added to applications, tooling, or custom solutions that a business unit may require.

The partner opportunity

Microsoft Azure persistent storage services offer flexible, forward-looking architectural options that let businesses adapt as requirements and technologies evolve rapidly. Spend some time with the links we’ve provided both above and below to become acquainted with the services in Azure. To meet customer needs, architects and implementation experts will need to be ready and willing to mix and match these services.

Join us on the April 18 Data Platform and Advanced Analytics Partner community call for a discussion about this topic.


Resources

Data Platform, Intelligence, and Analytics Partner Community

Comments (2)

  1. Penny Garbus says:

    I will not be able to attend the April 18th session, but would love to attend one after April 24th.

  2. Very useful and helps to clarify the offerings.

    I’d like to use the Persistent Data Stores image in my customer presentations. If there is already a PowerPoint or Visio version of this, it would save me a lot of time creating it.
