Sharing Data with Azure

Data sharing is challenging

GDPR – Essential considerations from Safeguard individual privacy with the Microsoft Cloud

GDPR and Data Governance: Your guide to Microsoft tools and technologies

What data is to be shared?

Data classification

Azure SQL Database Data Discovery and Classification

Azure SQL Database Data Discovery and Classification: Data Discovery & Classification (currently in preview) provides advanced capabilities built into Azure SQL Database for discovering, classifying, labelling and protecting the sensitive data in your databases. Discovering and classifying your most sensitive data (business, financial, healthcare, personally identifiable information (PII), and so on) can play a pivotal role in your organization's information-protection posture. It can serve as infrastructure for:

  1. Helping meet data privacy standards and regulatory compliance requirements.
  2. Various security scenarios, such as monitoring (auditing) and alerting on anomalous access to sensitive data.
  3. Controlling access to and hardening the security of databases containing highly sensitive data.

Azure Information Protection

Azure Information Protection: Configure policies to classify, label and protect data based on its sensitivity. Classification with Azure Information Protection can be fully automatic, driven by users, or based on recommendations.

Data Catalog

Data Catalog: Azure Data Catalog is a fully managed cloud service that lets users discover the data sources they need and understand the data sources they find. At the same time, Data Catalog helps organizations get more value from their existing investments.

With Data Catalog, any user (analyst, data scientist, or developer) can discover, understand, and consume data sources. Data Catalog includes a crowdsourcing model of metadata and annotations. It is a single, central place for all of an organization's users to contribute their knowledge and build a community and culture of data.

Data provider and consumer security

Collaborate securely with partners and customers

Engage with users outside your organisation while maintaining control over your sensitive apps and data. It’s easier than ever for customers and partners to connect with your business using a simple sign-up experience that works with multiple identity providers, including Microsoft, LinkedIn, Google and Facebook. You can also use customisation options to modify the web and mobile experience to fit your brand.

Role-based access control (RBAC)

RBAC: Access management for cloud resources is a critical function for any organization that is using the cloud. Role-based access control (RBAC) helps you manage who has access to Azure resources, what they can do with those resources, and what areas they have access to.

RBAC is an authorization system built on Azure Resource Manager that provides fine-grained access management of resources in Azure.
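
The core idea – role assignments at a scope, inherited by child scopes – can be illustrated with a small Python sketch. The roles, principals, and resource paths below are hypothetical, and real Azure role definitions are far richer:

```python
# Hypothetical role definitions, loosely modelled on Azure's built-in roles.
ROLE_PERMISSIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write"},
    "Owner": {"read", "write", "manage-access"},
}

# A role assignment binds a principal to a role at a scope (a resource path).
ASSIGNMENTS = [
    ("alice", "Owner", "/subscriptions/sub1"),
    ("bob", "Reader", "/subscriptions/sub1/resourceGroups/rg1"),
]

def is_authorized(principal: str, action: str, scope: str) -> bool:
    """An assignment at a parent scope is inherited by child scopes.
    (Simplified: real scope matching is segment-aware, not a prefix test.)"""
    for who, role, assigned_scope in ASSIGNMENTS:
        if who == principal and scope.startswith(assigned_scope):
            if action in ROLE_PERMISSIONS[role]:
                return True
    return False
```

Here alice, assigned Owner at the subscription, can write in any resource group beneath it, while bob's Reader assignment grants read access only within rg1.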

Azure Key Vault

Azure Key Vault: Azure Key Vault helps solve the following problems:

  • Secrets Management - Azure Key Vault can be used to securely store and tightly control access to tokens, passwords, certificates, API keys, and other secrets.
  • Key Management - Azure Key Vault can also be used as a Key Management solution. Azure Key Vault makes it easy to create and control the encryption keys used to encrypt your data.
  • Certificate Management - Azure Key Vault is also a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with Azure and your internal connected resources.
  • Store secrets backed by Hardware Security Modules - The secrets and keys can be protected either by software or by FIPS 140-2 Level 2 validated HSMs.

How is the data shared?

Azure Information Protection

Azure Information Protection: Share data safely with colleagues as well as your customers and partners. Define who can access data and what they can do with it – such as allowing them to view and edit files, but not print or forward them.

Database Access

This section references generic database objects and methods that are recognised and implemented by a wide variety of Relational Database Management Systems (RDBMS). However, the links and supporting text are specific to Microsoft SQL Server as an exemplar.

Row-Level Security

Row-Level Security enables customers to control access to rows in a database table based on the characteristics of the user executing a query (for example, group membership or execution context).

Row-Level Security (RLS) simplifies the design and coding of security in your application. RLS helps you implement restrictions on data row access. For example, you can ensure that workers access only those data rows that are pertinent to their department or restrict customers' data access to only the data relevant to their company.

The access restriction logic is located in the database tier rather than away from the data in another application tier. The database system applies the access restrictions every time that data access is attempted from any tier. This makes your security system more reliable and robust by reducing the surface area of your security system.
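
As a rough illustration of the filter-predicate idea – in Python rather than T-SQL, and with invented rows and execution context – every query path goes through one central predicate:

```python
# Each row carries a department; the "security predicate" compares it to the
# caller's execution context, mirroring how an RLS filter predicate works.
ROWS = [
    {"order_id": 1, "department": "sales", "amount": 100},
    {"order_id": 2, "department": "finance", "amount": 250},
    {"order_id": 3, "department": "sales", "amount": 75},
]

def security_predicate(row: dict, user_context: dict) -> bool:
    return row["department"] == user_context["department"]

def query_orders(user_context: dict) -> list:
    # The filter is applied centrally, so every "query" passes through it,
    # just as the database engine applies RLS regardless of the calling tier.
    return [r for r in ROWS if security_predicate(r, user_context)]
```

A sales user sees only sales rows; no application-tier code has to remember to add the WHERE clause.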

SQL Database dynamic data masking

SQL Database dynamic data masking: SQL Database dynamic data masking limits sensitive data exposure by masking it to non-privileged users.

Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.
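
The behaviour of the built-in masking functions can be approximated in a few lines of Python; the rules below only loosely mirror SQL Server's default, email, and partial masks:

```python
def mask_default(value: str) -> str:
    """Full mask, analogous to the default() masking function for strings."""
    return "xxxx"

def mask_email(value: str) -> str:
    """Expose the first letter and a constant suffix, like the email() mask."""
    return value[0] + "XXX@XXXX.com"

def mask_partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Expose leading/trailing characters around a custom padding string,
    like partial(prefix, padding, suffix)."""
    exposed_tail = value[-suffix:] if suffix > 0 else ""
    return value[:prefix] + padding + exposed_tail
```

A non-privileged user querying a card number through a partial mask would see only the last four digits; the stored value is untouched.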

Database replicas

Readable Secondary Replicas (AlwaysOn Availability Groups):

Directing read-only connections to readable secondary replicas provides the following benefits:

  • Offloads your secondary read-only workloads from your primary replica, which conserves its resources for your mission-critical workloads. If you have a mission-critical read workload, or one that cannot tolerate latency, you should run it on the primary replica.
  • Improves your return on investment for the systems that host readable secondary replicas.

In addition, readable secondaries provide robust support for read-only operations, as follows:

  • Temporary statistics on readable secondary databases optimize read-only queries.
  • Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to the snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.
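
From the client side, read-only routing is typically requested through the connection string. A minimal Python sketch of assembling one (the server, database, and driver names are placeholders; ApplicationIntent=ReadOnly is the keyword that asks an availability group listener to route the session to a readable secondary):

```python
def build_connection_string(server: str, database: str, read_only: bool) -> str:
    """Build an ODBC-style connection string. With read_only=True, the
    ApplicationIntent=ReadOnly keyword signals that the session is a
    candidate for read-only routing to a readable secondary replica."""
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",  # placeholder driver name
        f"Server={server}",
        f"Database={database}",
        "Trusted_Connection=yes",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)
```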

PolyBase

PolyBase: PolyBase enables your SQL Server 2016 instance to process Transact-SQL queries that read data from Hadoop. The same query can also access relational tables in your SQL Server instance, so a single query can join data from Hadoop and SQL Server. In SQL Server, an external table or external data source provides the connection to Hadoop.

Using PolyBase, T-SQL queries can also import and export data from Azure Blob Storage. Further, PolyBase enables Azure SQL Data Warehouse to import and export data from Azure Data Lake Store and Azure Blob Storage.

PolyBase thus allows data in the external repositories above to be shared with database and data warehouse users without granting those users explicit permissions on the repositories.

Database Views

A database view is a virtual table whose contents are defined by a query. Like a table, a view consists of a set of named columns and rows of data. Unless indexed, a view does not exist as a stored set of data values in a database. The rows and columns of data come from tables referenced in the query defining the view and are produced dynamically when the view is referenced.

A view acts as a filter on the underlying tables referenced in the view. The query that defines the view can be from one or more tables or from other views in the current or other databases. Distributed queries can also be used to define views that use data from multiple heterogeneous sources. This is useful, for example, if you want to combine similarly structured data from different servers, each of which stores data for a different region of your organization.

Views are generally used to focus, simplify, and customize the perception each user has of the database. Views can be used as security mechanisms by letting users access data through the view, without granting the users permissions to directly access the underlying base tables of the view. Views can be used to provide a backward compatible interface to emulate a table that used to exist but whose schema has changed. Views can also be used when you copy data to and from SQL Server to improve performance and to partition data.
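
The security use of a view can be demonstrated with SQLite from Python's standard library; the table, columns, and data are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "sales", 50000.0),
     ("Ben", "finance", 60000.0),
     ("Cho", "sales", 55000.0)],
)

# The view exposes only a non-sensitive column for one department. A user
# granted access to the view alone never sees the salary column or rows
# from other departments, even though both exist in the base table.
conn.execute(
    "CREATE VIEW sales_staff AS "
    "SELECT name FROM employees WHERE department = 'sales'"
)

names = [row[0] for row in conn.execute("SELECT name FROM sales_staff ORDER BY name")]
```

In a server RDBMS the same pattern is combined with permissions: grant SELECT on the view, deny it on the base table.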

Database extracts

Database extracts are data sets exported by the RDBMS and held in external files. The file types can be many and varied, but are typically flat files, e.g. comma-separated values (CSV) files. CSV has long been a standard file format for data transfer and is recognised by most common data tools.
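
Producing such an extract is straightforward with Python's standard csv module; the rows below stand in for whatever the RDBMS export hands over:

```python
import csv
import io

# Hypothetical result set to be exported as a flat-file extract.
rows = [
    {"customer_id": 1, "country": "UK"},
    {"customer_id": 2, "country": "DE"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["customer_id", "country"])
writer.writeheader()      # a header row makes the extract self-describing
writer.writerows(rows)
extract = buffer.getvalue()
```

Writing to a real file instead of a StringIO buffer is a one-line change (`open("extract.csv", "w", newline="")`).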

API Management

API Management (APIM) helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services. Businesses everywhere are looking to extend their operations as a digital platform, creating new channels, finding new customers and driving deeper engagement with existing ones. API Management provides the core competencies to ensure a successful API program through developer engagement, business insights, analytics, security, and protection. You can use Azure API Management to take any backend and launch a full-fledged API program based on it.

Storage Access

Storage Accounts

Azure Storage Accounts: Azure Storage is Microsoft's cloud storage solution for modern data storage scenarios. Azure Storage offers a massively scalable object store for data objects, a file system service for the cloud, a messaging store for reliable messaging, and a NoSQL store.

Access to storage accounts can be controlled via the previously described RBAC.

Delegating Access to Storage Accounts with a Shared Access Signature

Delegating Access with a Shared Access Signature: A shared access signature (SAS) is a URI that grants restricted access rights to Azure Storage resources. You can provide a shared access signature to clients who should not be trusted with your storage account key but to whom you wish to delegate access to certain storage account resources. By distributing a shared access signature URI to these clients, you can grant them access to a resource for a specified period of time, with a specified set of permissions.
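
The mechanism behind a SAS is an HMAC signature over the grant's parameters, which lets the service verify the token without consulting an identity store. The sketch below illustrates that pattern only; the real Azure string-to-sign has more fields in a fixed order, and the key, resource path, and parameter set here are simplified placeholders:

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode

def make_sas_token(account_key_b64: str, resource: str, permissions: str,
                   expiry: str) -> str:
    """Illustrative only: sign a canonical string with the account key.
    Azure's actual string-to-sign format differs in fields and ordering."""
    string_to_sign = "\n".join([resource, permissions, expiry])
    key = base64.b64decode(account_key_b64)
    sig = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return urlencode({
        "sp": permissions,   # granted permissions, e.g. "r" for read
        "se": expiry,        # expiry time: the grant is self-limiting
        "sig": base64.b64encode(sig).decode(),
    })

token = make_sas_token(base64.b64encode(b"demo-key").decode(),
                       "/container/blob.csv", "r", "2030-01-01T00:00:00Z")
```

The token is appended to the resource URI as its query string; tampering with `sp` or `se` invalidates `sig`.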

Azure Data Lake

Azure Data Lake: Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.

Access to Azure Data Lake can likewise be controlled via the previously described RBAC.

Azure Databricks

Azure Databricks: Accelerate big data analytics and artificial intelligence (AI) solutions with Azure Databricks, a fast, easy and collaborative Apache Spark–based analytics service.

Azure Databricks has built-in connectors for Azure Data Lake Storage, Cosmos DB, SQL DW, Event Hubs, IoT Hubs, and several other services. Connection strings and other secrets for these services can be stored in Azure Key Vault.

Azure Key Vault can help you securely store and manage application secrets, reducing the chance of accidental loss of security information by centralizing their storage.

Azure Databricks Secrets: Sometimes accessing data requires that you authenticate to external data sources through JDBC. Instead of directly entering your credentials into a notebook, use Azure Databricks secrets to store your credentials and reference them in notebooks and jobs.

When using Key Vault with Azure Databricks to create secret scopes, data scientists and developers no longer need to store security information such as SAS tokens or connection strings in their notebooks. Access to a key vault requires proper authentication and authorization before a user can get access. Authentication establishes the identity of the user, while authorization determines the operations that they are allowed to perform.

As a team lead, you might want to create different Secret Scopes for different data source credentials and then provide different subgroups in your team access to those scopes.
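
The scope-plus-ACL idea can be sketched in plain Python. The scope names, groups, and secret values below are invented, and real scopes live inside Databricks (or a backing Key Vault) rather than in process memory:

```python
# Hypothetical in-memory model of secret scopes: each scope groups the
# credentials for one data source, and an ACL names the groups allowed
# to read from it.
SCOPES = {
    "sales-jdbc": {"acl": {"sales-engineers"}, "secrets": {"password": "s3cret"}},
    "hr-jdbc": {"acl": {"hr-engineers"}, "secrets": {"password": "t0psecret"}},
}

def get_secret(user_groups: set, scope: str, key: str) -> str:
    """Loosely mimics dbutils.secrets.get(scope, key) combined with the
    scope's access check: no shared group membership, no secret."""
    entry = SCOPES[scope]
    if not (user_groups & entry["acl"]):
        raise PermissionError(f"no READ permission on scope {scope!r}")
    return entry["secrets"][key]
```

A notebook then references the scope and key rather than the secret itself, so credentials never appear in notebook source.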

Sharing Data for Business Intelligence

Azure Analysis Services

Azure Analysis Services: Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud. Use advanced mashup and modeling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model. The data model provides an easier and faster way for users to browse massive amounts of data for ad-hoc data analysis.

Azure SQL Data Warehouse

Azure SQL Data Warehouse: SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution. Import big data into SQL Data Warehouse with simple PolyBase T-SQL queries, and then use the power of MPP to run high-performance analytics. As you integrate and analyze, the data warehouse will become the single version of truth your business can count on for insights.

Machine Learning Web Services

Once you deploy an Azure Machine Learning predictive model as a web service, you can use a REST API to send it data and get predictions from custom or third-party applications. You can send the data in real time or in batch mode. Excel and Power BI, as business productivity tools, are easily configured to consume Azure Machine Learning web services.
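
A sketch of assembling (without sending) such a REST call in Python follows. The URL, API key, input name, and JSON shape are placeholders: the exact request schema depends on the deployed model's endpoint definition.

```python
import json
import urllib.request

def build_scoring_request(service_url: str, api_key: str,
                          rows: list) -> urllib.request.Request:
    """Assemble a scoring request: a JSON body of input rows plus a bearer
    key header. Nothing is sent until the request is passed to urlopen."""
    body = json.dumps({"Inputs": {"input1": rows}}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )

# Placeholder endpoint, key, and feature columns for illustration.
req = build_scoring_request("https://example.azureml.net/score", "demo-key",
                            [{"age": 42, "income": 30000}])
```

Dispatching it is then a single `urllib.request.urlopen(req)` call; the same request can equally be issued from Excel or Power BI connectors.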

What data has been shared and to whom?

Log Analytics

Log Analytics: Log data collected by Azure Monitor is stored in a Log Analytics workspace, which is based on Azure Data Explorer. It collects telemetry from a variety of sources and uses the Data Explorer query language to retrieve and analyze data.

Audit Logs

Audit Logs: Azure provides a wide array of configurable security auditing and logging options to help you identify gaps in your security policies and mechanisms, covering the generation, collection, and analysis of security logs from services hosted on Azure.

Data Delivery: Trusted Devices

Intune

Intune: As an IT admin, you must ensure that managed devices are providing the resources that your users need to do their work, while protecting that data from risk.

The Devices workload gives you insights into the devices you manage, and lets you perform remote tasks on those devices.

Data Integrity

Tamper-proof logs, backed by blockchain technologies, provide data provenance through a chain of custody.

Blockchain

Blockchain: Blockchain is a transparent and verifiable system that will change the way people think about exchanging value and assets, enforcing contracts, and sharing data. The technology is a shared, secure ledger of transactions distributed among a network of computers, rather than resting with a single provider. Businesses are using blockchain as a common data layer to enable a new class of applications. Now, business processes and data can be shared across multiple organizations, which eliminates waste, reduces the risk of fraud, and creates new revenue streams.

Azure Blockchain Workbench

Azure Blockchain Workbench: Quickly start your blockchain projects with Azure Blockchain Workbench. Simplify development and ease experimentation with prebuilt networks and infrastructure. Accelerate time to value through integrations and extensions to the cloud services and consuming apps you already use, and innovate with confidence on an open, trusted, and globally available platform.