This post is authored by Gopi Kumar, Principal Program Manager at Microsoft.
The Data Science Virtual Machine (DSVM), a popular VM image on the Azure marketplace, is a purpose-built cloud-based environment with a host of preconfigured data and AI tools. It enables data scientists and AI developers to iterate on developing high quality predictive models and deep learning architectures and helps them become much more productive when developing their AI applications. DSVM has been offered for over two years now and, during that time, it has seen a wide range of users, from small startups to enterprises with large data science teams who use DSVM as their core cloud development and experimentation environment for building production applications and models.
Deploying AI infrastructure at scale can be quite challenging for large enterprise teams. However, Azure Infrastructure provides several services supporting enterprise IT needs, such as around security, scaling, reliability, availability, performance and collaboration. The Data Science VM can readily leverage these services in Azure to support the deployment of large scale enterprise team -based Data Science and AI environments. We have assembled guidance for an initial list of common enterprise scenarios in a new DSVM documentation section dedicated to enterprise deployment guidance. We will continue to iterate and add more scenarios over time. In this blog post, we will summarize a few scenarios and refer you to the documentation and scripts on the DataScienceVM github repository to help guide you with enterprise deployments.
Scaling Your AI Environment with Data Science VM Pools
For large teams, a shared pool of Data Science VMs can be an effective way to manage their AI infrastructure. The benefit of using a shared pool is better resource utilization and availability, along with more potential for sharing and collaboration across the team. It also allows IT to manage the DSVM resources more effectively and at a lower cost. We provide guidance for creating a pool for batch-oriented workloads and one for interactive workloads. DSVM batch pools can leverage the Azure Batch AI service or Azure Batch. One of the less well known features of DSVM is that these virtual machine images are can be used on Azure Batch AI and Azure Batch services. This allows you to have a similar environment as the one you use while interactively developing your AI model and application, to be used in a batch processing environment where you periodically retrain your models. Batch and Batch AI provides horizontal scaling. A pool of interactive DSVMs can be created using the Azure Virtual machine scale set (VMSS) feature. This allows you to create a farm of DSVM instances behind a single RDP, SSH, Jupyter endpoint. The VMSS takes care of routing users to the appropriate DSVM instance in the pool. It is also common to have a shared disk (like Azure Files) mounted on each node of your pool such that you have access to your data irrespective of the node on which you are working on. The documentation on DSVM pools provides more details on the process. We also provide Azure Resource Manager (ARM) templates and scripts to create a VM Scale set with DSVM nodes on our GitHub repository.
Common Identity Using Active Directory Integration
By default, on Azure VMs including the Data Science VM (DSVM) local user accounts are created while provisioning the VM and the users authenticate to the VM with these credentials. If you have multiple VMs that you need to access, managing credentials can get cumbersome. Common user accounts and management using a standards-based identity provider allows you to use a single set of credentials to access multiple resources on Azure including multiple DSVMs, databases and data lakes. Active Directory (AD) is a popular identity provider and is supported both on Azure as a service as well as on premises. You can use AD or Azure AD to authenticate users on both a standalone DSVM or a cluster of DSVMs on an Azure virtual machine scale set. This is done by joining the DSVM instances to an AD domain. If you already have an Active Directory to manage the identities, you can use it as your common identity provider. In case you do not have an AD, you can run a managed AD on Azure through a service called Azure Active Directory Domain Services (Azure AD DS). Details for these can found on our article on Active Directory integration. With Active Directory integration, you can login (RDP or SSH) to the DSVM or authenticate to applications such as JupyterHub using a common set of credentials.
Can You Keep Your Secrets Safely?
Yes, you can! A common challenge when building enterprise applications is managing credentials that may be needed in your code for authenticating to external data storage services, web services and cloud-based services. Keeping these credentials secure is an important task. They should never appear on your developer workstations or stored with source code or config files. Azure offers a set of nifty services – Managed Service Identity (MSI) and Azure Key Vault – that can be used to secure credentials to external data sources and cloud services. Managed Service Identity lets you give Azure services (including VM instances, VM Scale sets) an automatically managed identity in Azure AD. You can use this identity to authenticate to any service that supports Azure AD authentication without having any credentials in your code. One common pattern to secure credentials is to use MSI in combination with the Azure Key Vault, a managed Azure service to store secrets and cryptographic keys securely. You can access Key Vault using the managed service identity and retrieve the authorized secrets and cryptographic keys from it. While the documentation on MSI and Key Vault is pretty comprehensive, we have provided a cheat sheet for some common AI development scenarios in the enterprise guidance for DSVM documentation, covering the basics of using these services.
These are just a few considerations when deploying DSVM in large enterprise configurations. Some other aspects that you may need to consider are: monitoring, management, role based access control, policy setting and enforcement, anti-malware and disk encryption, to name a few. As mentioned earlier, the DSVM gives you the full control to leverage all these services within the Azure compute infrastructure while architecting your AI solution. The Azure architecture center is also a great resource and it provides detailed end-to-end architecture and patterns for building and managing your cloud based analytics infrastructure.
We would love to hear feedback on the new enterprise guidance documentation articles for DSVM.