Automating HPC Cluster Deployments in Azure IaaS: Part I - Create an HPC Pack cluster using the Microsoft HPC Pack IaaS deployment script

Author: Sunbin Zhu

Date: December 2, 2014

 

 The Microsoft HPC Pack IaaS deployment script provides an easy and fast way to deploy HPC clusters in Azure infrastructure services (IaaS) on either the Azure Global or the Azure China service operated by 21Vianet. It utilizes the HPC Pack VM image in the Azure Marketplace for fast deployment and provides a comprehensive set of configuration parameters to make the deployment easy and flexible. This tutorial shows you how to use the Microsoft HPC Pack IaaS deployment script to set up an HPC Pack cluster in Azure.

Prerequisites

To run the HPC IaaS deployment script, you need the following:

Note:

You need to specify the parameter “-Environment AzureChinaCloud” for the command “Get-AzurePublishSettingsFile” or “Add-AzureAccount” if you are using a subscription in the Azure China service operated by 21Vianet.

 

Besides the two methods described in How to install and configure Azure PowerShell, if you prefer to use your own certificate to communicate with Azure, you can follow these steps to manually configure the Azure subscription on your client computer.

  1. Prepare the certificate pair, one PFX certificate with private key and one CER certificate with public key.

  2. Sign in to the Azure Management Portal, click Settings, click Management Certificates, and then click Upload to upload the CER certificate. Note the SUBSCRIPTION ID shown for the certificate; you will need it later.

  3. Import the PFX certificate into the certificate store “Current User\Personal” on your client computer with the PowerShell command below:

Import-PfxCertificate -FilePath <PfxFilePath> -CertStoreLocation Cert:\CurrentUser\My

  4. Configure the Azure subscription with the PowerShell commands below, where “SubName” is a name of your choice and “SubId” is the SUBSCRIPTION ID from step 2.

$cert = Get-Item Cert:\CurrentUser\My\<your cert thumbprint>

Set-AzureSubscription -SubscriptionName "<SubName>" -SubscriptionId "<SubId>" -Certificate $cert
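
If you do not have the certificate thumbprint at hand for step 4, one way to look it up is to list the certificates in the same store. This is a minimal sketch, assuming the PFX was imported into Cert:\CurrentUser\My as in step 3 and that you run it on Windows, where the Cert: drive is available. The variable name $candidates is only illustrative.

```powershell
# List certificates in the current user's Personal store that carry a private key.
# A PFX imported in step 3 should appear here with its thumbprint.
$candidates = @(Get-ChildItem -Path Cert:\CurrentUser\My |
    Where-Object { $_.HasPrivateKey })

# Show the fields that help you pick the right certificate.
$candidates | Select-Object Thumbprint, Subject, NotAfter | Format-Table -AutoSize
```

Copy the Thumbprint value of the matching certificate into the Get-Item path used in step 4.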

 

Create the configuration file

The HPC Pack IaaS deployment script takes a configuration file as input. The file describes the infrastructure of the HPC cluster, including the Azure virtual network, affinity groups, storage account, cloud services, domain controller, remote or local SQL Server databases, head node, compute nodes, broker nodes, and Azure PaaS (“burst”) nodes. Writing a configuration file that correctly reflects your requirements is the most important step in setting up an HPC Pack cluster. After the configuration file is ready, you only need to run one command, and the script creates everything for you. For more information about how to write the configuration file, see “manual.rtf” in the script folder.

This post shows the steps to set up an HPC Pack cluster from a typical sample configuration file, shown below:

<?xml version="1.0" encoding="utf-8" ?>

<IaaSClusterConfig>

  <Subscription>

    <SubscriptionName>BigCompS-Suzhu</SubscriptionName>

    <StorageAccount>hpciaasdemostorage</StorageAccount>

  </Subscription>

  <AffinityGroup>HPCDemoDefaultAG</AffinityGroup>

  <Location>West US</Location>

  <VNet>

    <VNetName>HPCVNet</VNetName>

    <SubnetName>Subnet-1</SubnetName>

  </VNet>

  <Domain>

    <DCOption>NewDC</DCOption>

    <DomainFQDN>hpcdomain.local</DomainFQDN>

    <DomainController>

      <VMName>HPCDemoDC</VMName>

      <ServiceName>HPCIaaSDemoSvc</ServiceName>

      <VMSize>Large</VMSize>

    </DomainController>

  </Domain>

  <Database>

    <DBOption>NewRemoteDB</DBOption>

    <DBVersion>SQLServer2014_Enterprise</DBVersion>

    <DBServer>

      <VMName>HPCDemoDB</VMName>

      <ServiceName>HPCIaaSDemoSvc</ServiceName>

      <VMSize>ExtraLarge</VMSize>

      <DataDiskSizeInGB>500</DataDiskSizeInGB>

    </DBServer>

  </Database>

  <HeadNode>

    <VMName>HPCDemoHN</VMName>

    <ServiceName>HPCIaaSDemoSvc</ServiceName>

    <VMSize>ExtraLarge</VMSize>

    <EnableRESTAPI />

    <EnableWebPortal />

  </HeadNode>

  <ComputeNodes>

    <VMNamePattern>HPCDemoCN%0000%</VMNamePattern>

    <ServiceName>HPCIaaSDemoCNSvc</ServiceName>

    <VMSize>A8</VMSize>

    <NodeCount>2</NodeCount>

    <AffinityGroup>HPCDemoIBAG</AffinityGroup>

  </ComputeNodes>

  <BrokerNodes>

    <VMNamePattern>HPCDemoBN%1000%</VMNamePattern>

    <ServiceName>HPCIaaSDemoBNSvc</ServiceName>

    <VMSize>Medium</VMSize>

    <NodeCount>2</NodeCount>

  </BrokerNodes>

  <AzureBurst>

    <Certificate>

      <Id>1</Id>

      <PfxFile>E:\hpciaasdemo1.pfx</PfxFile>

    </Certificate>

    <Certificate>

      <Id>2</Id>

      <PfxFile>E:\hpciaasdemo2.pfx</PfxFile>

    </Certificate>

    <AzureNodeTemplate>

      <TemplateName>AzureNodeTemplate-1</TemplateName>

      <SubscriptionId>bb9252ba-831f-4c9d-ae14-9a38e6da8ee4</SubscriptionId>

      <CertificateId>1</CertificateId>

      <ServiceName>HPCIaaSDemoBurstSvc1</ServiceName>

      <StorageAccount>hpciaasdemoburststg1</StorageAccount>

      <NodeCount>3</NodeCount>

      <RoleSize>Medium</RoleSize>

    </AzureNodeTemplate>

    <AzureNodeTemplate>

      <TemplateName>AzureNodeTemplate-2</TemplateName>

      <SubscriptionId>bb9252ba-831f-4c9d-ae14-9a38e6da8ee4</SubscriptionId>

      <CertificateId>1</CertificateId>

      <ServiceName>HPCIaaSDemoBurstSvc2</ServiceName>

      <StorageAccount>hpciaasdemoburststg2</StorageAccount>

    </AzureNodeTemplate>

    <AzureNodeTemplate>

      <TemplateName>AzureNodeTemplate-3</TemplateName>

      <SubscriptionId>ad4b9f9f-05f2-4c74-a83f-f2eb73000e0b</SubscriptionId>

      <CertificateId>2</CertificateId>

      <ServiceName>HPCIaaSDemoBurstSvc3</ServiceName>

      <StorageAccount>hpciaasdemoburststg3</StorageAccount>

      <Proxy>

        <UsesStaticProxyCount>false</UsesStaticProxyCount>

        <ProxyRatio>100</ProxyRatio>

        <ProxyRatioBase>400</ProxyRatioBase>

      </Proxy>

      <OSVersion>WindowsServer2012</OSVersion>

    </AzureNodeTemplate>

  </AzureBurst>

</IaaSClusterConfig>
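
Before handing a configuration file to the deployment script, it can help to confirm that the file is well-formed XML and holds the values you expect. The following is a minimal sketch that uses only PowerShell's built-in XML support. It is not part of the deployment script, and the $summary object and its property names are illustrative. An abridged copy of the sample is inlined so the snippet runs on its own.

```powershell
# Abridged copy of the sample configuration, inlined so the sketch is self-contained.
# For a real file, you could instead use:
#   [xml]$config = Get-Content -Raw -Path E:\HPCDemoConfig.xml
[xml]$config = @'
<IaaSClusterConfig>
  <Subscription>
    <SubscriptionName>BigCompS-Suzhu</SubscriptionName>
    <StorageAccount>hpciaasdemostorage</StorageAccount>
  </Subscription>
  <Location>West US</Location>
  <HeadNode>
    <VMName>HPCDemoHN</VMName>
    <ServiceName>HPCIaaSDemoSvc</ServiceName>
    <VMSize>ExtraLarge</VMSize>
  </HeadNode>
  <ComputeNodes>
    <VMNamePattern>HPCDemoCN%0000%</VMNamePattern>
    <ServiceName>HPCIaaSDemoCNSvc</ServiceName>
    <VMSize>A8</VMSize>
    <NodeCount>2</NodeCount>
  </ComputeNodes>
</IaaSClusterConfig>
'@

# The [xml] cast throws if the text is not well-formed, so reaching this
# point already means the file parses.
$root = $config.IaaSClusterConfig

# Collect a few key settings into one object for a quick visual check.
$summary = [pscustomobject]@{
    Subscription   = $root.Subscription.SubscriptionName
    StorageAccount = $root.Subscription.StorageAccount
    Location       = $root.Location
    HeadNode       = $root.HeadNode.VMName
    ComputeNodes   = [int]$root.ComputeNodes.NodeCount
    ComputeSize    = $root.ComputeNodes.VMSize
}
$summary | Format-List
```

This only checks that the XML parses and echoes a few fields; the deployment script performs its own validation when it runs.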

Here are brief descriptions of the elements in the configuration file:

IaaSClusterConfig is the root element of the configuration file.

Subscription specifies the Azure subscription used to deploy the HPC Pack cluster. Use the command below to make sure the Azure subscription name is configured and unique on your client computer. In this sample, we use the Azure subscription “BigCompS-Suzhu”.

Get-AzureSubscription -SubscriptionName <SubscriptionName>

All the persistent data for the HPC Pack cluster will be stored in the storage account “hpciaasdemostorage”. If the storage account doesn’t exist yet, the script will create it in the region specified in Location.

AffinityGroup specifies the default affinity group name; all the virtual machines will be created in the default affinity group unless another affinity group is explicitly specified for them. In this sample, all virtual machines except for the compute nodes will be created in affinity group “HPCDemoDefaultAG”.

Location specifies the region in which you want to deploy the HPC Pack cluster, “West US” in this sample.

VNet specifies the settings of the virtual network and the subnet. All the virtual machines of the HPC cluster will be created in this virtual network and subnet. You can create the virtual network and subnet yourself before running the script; otherwise, the script creates a virtual network with address space 192.168.0.0/20 and a subnet with address space 192.168.0.0/23. In this sample, we will let the script create the virtual network “HPCVNet” and subnet “Subnet-1”.
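
To get a feel for how much room these default address spaces leave, the prefix lengths can be turned into address counts. This is simple arithmetic rather than anything the deployment script does; the helper name Get-AddressCount is hypothetical. The figures are raw counts, and because Azure reserves a few addresses in each subnet, the number usable by virtual machines is somewhat lower.

```powershell
# Number of IPv4 addresses covered by a CIDR prefix length: 2^(32 - prefix).
function Get-AddressCount {
    param(
        [Parameter(Mandatory)] [ValidateRange(0, 32)] [int] $PrefixLength
    )
    [int64][math]::Pow(2, 32 - $PrefixLength)
}

$vnetAddresses   = Get-AddressCount -PrefixLength 20   # virtual network 192.168.0.0/20
$subnetAddresses = Get-AddressCount -PrefixLength 23   # subnet 192.168.0.0/23

"VNet /20: $vnetAddresses addresses; subnet /23: $subnetAddresses addresses"
"Subnets of this size that fit in the VNet: $($vnetAddresses / $subnetAddresses)"
```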

Domain specifies the Active Directory domain settings for the HPC Pack cluster. All the virtual machines created by the script will join the domain. Currently, the script supports three domain options: ExistingDC, NewDC and HeadNodeAsDC. In this sample, we will create a new domain controller “HPCDemoDC” in a new cloud service “HPCIaaSDemoSvc”. The domain FQDN is “hpcdomain.local”.

Database specifies the database settings for the HPC Pack cluster. Currently, the script supports three database options: ExistingDB, NewRemoteDB and LocalDB. In this sample, we will create a remote database server “HPCDemoDB” with a 500 GB data disk. The virtual machine shares the cloud service “HPCIaaSDemoSvc” with domain controller “HPCDemoDC”. The initial size and auto-growth settings for each HPC database can be found in the file “HPCIaaSCreateDatabase.sql” in the script folder.

HeadNode specifies the settings of the HPC Pack head node. In this sample, we will create a head node “HPCDemoHN” with HPC job scheduler REST API and HPC web portal enabled. The virtual machine also shares cloud service “HPCIaaSDemoSvc” with domain controller “HPCDemoDC” and database server “HPCDemoDB”.

ComputeNodes specifies the settings of the HPC Pack compute nodes. In this sample, we will create two A8 size compute nodes, “HPCDemoCN0000” and “HPCDemoCN0001”, in a new cloud service “HPCIaaSDemoCNSvc”. Because the A8 size is equipped with InfiniBand networking and cannot be provisioned in the same affinity group as sizes that are not, we will create a separate affinity group “HPCDemoIBAG” for the compute nodes.

BrokerNodes specifies the settings of the HPC Pack broker nodes. In this sample, we will create two Medium size broker nodes, “HPCDemoBN1000” and “HPCDemoBN1001”, in a new cloud service “HPCIaaSDemoBNSvc”.
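
The node names above follow from the VMNamePattern values in the sample: “HPCDemoCN%0000%” with a node count of 2 gives “HPCDemoCN0000” and “HPCDemoCN0001”, and “HPCDemoBN%1000%” gives “HPCDemoBN1000” and “HPCDemoBN1001”. The sketch below reproduces that mapping. It is an assumption about how the pattern is interpreted (the digits between the percent signs are taken as the starting number and their length as the zero-padded width), inferred only from these two examples; see “manual.rtf” for the authoritative rules. The function name Expand-VMNamePattern is hypothetical.

```powershell
# Expand a pattern such as 'HPCDemoCN%0000%' into a sequence of node names.
# Assumption: the digits between the % signs give the first number, and
# their length gives the zero-padded width of every generated number.
function Expand-VMNamePattern {
    param(
        [Parameter(Mandatory)] [string] $Pattern,
        [Parameter(Mandatory)] [int] $Count
    )
    if ($Pattern -notmatch '^(?<prefix>.*)%(?<start>\d+)%(?<suffix>.*)$') {
        throw "Pattern '$Pattern' does not contain a %<digits>% placeholder."
    }
    $prefix = $Matches['prefix']
    $suffix = $Matches['suffix']
    $width  = $Matches['start'].Length
    $start  = [int]$Matches['start']

    for ($i = 0; $i -lt $Count; $i++) {
        # "D4" formats an integer with at least four digits, zero-padded.
        '{0}{1}{2}' -f $prefix, ($start + $i).ToString("D$width"), $suffix
    }
}

$computeNames = @(Expand-VMNamePattern -Pattern 'HPCDemoCN%0000%' -Count 2)
$brokerNames  = @(Expand-VMNamePattern -Pattern 'HPCDemoBN%1000%' -Count 2)
$computeNames + $brokerNames
```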

AzureBurst specifies the settings of Azure PaaS (“burst”) nodes. In this element, Certificate specifies the management certificate used to communicate between the head node and Azure. The corresponding CER format certificate must already be uploaded to Azure. You can specify one or more certificates. AzureNodeTemplate specifies the configuration of the Azure node template to be created on the head node; you can specify one or more Azure node templates. In this sample, we will upload two PFX certificates from the client computer to the head node and import them on the head node. We will also create three Azure node templates: “AzureNodeTemplate-1”, “AzureNodeTemplate-2” and “AzureNodeTemplate-3”. The templates “AzureNodeTemplate-1” and “AzureNodeTemplate-2” are associated with Azure subscription “bb9252ba-831f-4c9d-ae14-9a38e6da8ee4”, and they share certificate 1 to communicate with Azure. “AzureNodeTemplate-3” is associated with another Azure subscription, “ad4b9f9f-05f2-4c74-a83f-f2eb73000e0b”, and it uses certificate 2 to communicate with Azure. (These subscription IDs are fictitious examples.)

Note:

1. The Azure subscriptions specified in AzureBurst can be different from the Azure subscription in which you are deploying the HPC Pack cluster. In this sample, we are deploying the cluster in subscription “BigCompS-Suzhu”, while bursting to two different Azure subscriptions.

2. All the Azure resources specified in AzureBurst (including cloud services, storage account, etc.) must be pre-created. Currently, the script doesn’t validate their existence, and won’t automatically create the resources either.

3. The CER certificates for the PFX certificates must already be uploaded to the Azure Management Portal under the corresponding Azure subscriptions.
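
Because the script does not validate the AzureBurst resources, it may be worth reviewing them yourself before deploying. The sketch below works purely on the configuration XML: it lists the cloud service and storage account each template expects to exist and checks that every CertificateId refers to a Certificate element defined in the same file. It does not contact Azure, and the $report object and its property names are illustrative. An abridged AzureBurst section is inlined so the snippet runs on its own.

```powershell
# Abridged AzureBurst section, inlined so the sketch is self-contained.
[xml]$config = @'
<IaaSClusterConfig>
  <AzureBurst>
    <Certificate>
      <Id>1</Id>
      <PfxFile>E:\hpciaasdemo1.pfx</PfxFile>
    </Certificate>
    <Certificate>
      <Id>2</Id>
      <PfxFile>E:\hpciaasdemo2.pfx</PfxFile>
    </Certificate>
    <AzureNodeTemplate>
      <TemplateName>AzureNodeTemplate-1</TemplateName>
      <CertificateId>1</CertificateId>
      <ServiceName>HPCIaaSDemoBurstSvc1</ServiceName>
      <StorageAccount>hpciaasdemoburststg1</StorageAccount>
    </AzureNodeTemplate>
    <AzureNodeTemplate>
      <TemplateName>AzureNodeTemplate-3</TemplateName>
      <CertificateId>2</CertificateId>
      <ServiceName>HPCIaaSDemoBurstSvc3</ServiceName>
      <StorageAccount>hpciaasdemoburststg3</StorageAccount>
    </AzureNodeTemplate>
  </AzureBurst>
</IaaSClusterConfig>
'@

$burst = $config.IaaSClusterConfig.AzureBurst

# Certificate IDs defined in the file.
$knownIds = @($burst.Certificate | ForEach-Object { $_.Id })

# One row per template: what must be pre-created, and whether its
# CertificateId points at a Certificate element that actually exists.
$report = foreach ($template in @($burst.AzureNodeTemplate)) {
    [pscustomobject]@{
        Template         = $template.TemplateName
        CertificateId    = $template.CertificateId
        CertificateFound = $knownIds -contains $template.CertificateId
        CloudService     = $template.ServiceName
        StorageAccount   = $template.StorageAccount
    }
}
$report | Format-Table -AutoSize
```

Any row with CertificateFound set to False points to a template that references an undefined certificate. The cloud services and storage accounts listed still have to be checked against Azure separately.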

Run the HPC Pack IaaS Deployment Script

  1. Open the PowerShell console on the client computer as an administrator.

  2. Change to the script folder (E:\IaaSClusterScript in this example):

cd E:\IaaSClusterScript

  3. (Optional) If you want to run pre-configuration and/or post-configuration custom actions on the head node, place the custom script files HPCHNPreConfig.ps1 and/or HPCHNPostConfig.ps1 in the script folder before deploying the cluster. In this example, we placed the following two custom script files:

 

HPCHNPreConfig.ps1 just writes one line to a file.

# HPCHNPreConfig.ps1

"$(Get-Date): Running HPCHNPreConfig.ps1" | Out-File c:\HPCHNPreConfig.txt

HPCHNPostConfig.ps1 will assign a node template to the two compute nodes and bring them online.

# HPCHNPostConfig.ps1

Add-PSSnapin Microsoft.HPC

while($true)

{

    Get-HpcNode -HealthState Unapproved -Name HPCDemoCN* | Assign-HpcNodeTemplate -Name "Default ComputeNode Template" -Confirm:$false

    Get-HpcNode -State Offline -Name HPCDemoCN* | Set-HpcNodeState -State Online -Confirm:$false

    $onlineCNs = @(Get-HpcNode -State Online -Name HPCDemoCN*)

    if($onlineCNs.Count -eq 2)

    {

        break

    }

    else

    {

        Start-Sleep -Seconds 120

    }

}
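
The loop in HPCHNPostConfig.ps1 runs until both compute nodes are online, so it never exits if a node fails to come up. One possible variation is to bound the wait. The helper below is a sketch, not part of HPC Pack or the deployment script: Wait-Until, its parameters, and the default timeout and interval are hypothetical names and values chosen for illustration.

```powershell
# Poll a condition until it returns $true or a timeout expires.
# Returns $true if the condition was met, $false if the timeout was reached.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 3600,
        [int] $IntervalSeconds = 120
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# On the head node, the condition could be the same check the script uses:
#   Wait-Until -Condition { @(Get-HpcNode -State Online -Name HPCDemoCN*).Count -eq 2 }

# Stand-alone demonstration with a counter that becomes true on the third poll.
$script:polls = 0
$reached = Wait-Until -Condition { $script:polls++; $script:polls -ge 3 } -IntervalSeconds 0

# A condition that never holds gives up once the timeout has passed.
$gaveUp = -not (Wait-Until -Condition { $false } -TimeoutSeconds 0 -IntervalSeconds 0)
```

In a post-configuration script, the approve and bring-online commands would still need to run inside the polled condition or before each poll, as they do in the original loop.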

  4. Run the command below to deploy the HPC Pack cluster.

.\New-HpcIaaSCluster.ps1 -ConfigFile E:\HPCDemoConfig.xml -AdminUserName johnlee

 

The script generates a log file automatically because we didn’t specify the “-LogFile” parameter. Log entries are not written to the log file in real time; they are collected and written at the end of the validation and at the end of the deployment. If the PowerShell process is killed while the script is running, some log entries will be lost.

Because the parameter “AdminPassword” was not specified in the above command, we will be prompted to input the password for user “johnlee”.

The certificates “hpciaasdemo1.pfx” and “hpciaasdemo2.pfx” specified in AzureBurst are both password protected, so we have to input the passwords for them as well.

 

The script then starts to validate the configuration file. Validation takes anywhere from tens of seconds to several minutes, depending on the network connection.

After validation passes, the script lists all the resources that will be created for the HPC cluster, and we enter “Y” to continue.

The script then starts to deploy the HPC Pack cluster. First, the affinity groups, the storage account and the virtual network and subnet will be created.

After the affinity groups are created, the cloud services are created in the corresponding affinity groups.

Then the script creates the virtual machine “HPCDemoDC” and promotes it to a domain controller.

After the domain controller is ready, the script starts to create virtual machines for the head node, the database server, the first broker node and the first compute node.

After the virtual machines for the head node and the remote database server are created, the script configures the remote database server. The time for this step depends on the initial size settings for the HPC databases in the file “HPCIaaSCreateDatabase.sql”; the larger the specified sizes, the longer this step takes. The script won’t start the “Preparing head node” step until this step is done.

After the first compute node is successfully deployed, the script will sysprep it and capture a base compute node image, and then use the base image to deploy all the compute nodes. The same process is used for broker node deployment.

After the remote database server is configured, the script starts to prepare the head node. In this step, all the HPC services (except for the HPC web service) will be started. Before starting the HPC services, the script will upload the custom script HPCHNPreConfig.ps1 to the head node and run it if it exists in the script folder.

After “Preparing head node” is done, the script starts to configure the head node. In this step, the script will complete the “Deployment To-do List” for the HPC Pack cluster, including setting the network topology, creating a default compute node template, etc. Because we specified “EnableRESTAPI” and “EnableWebPortal” for the head node, the script will also enable the HPC Job Scheduler REST API and start the HPC web service in this step. A self-signed certificate will be generated on the head node for this purpose, and the corresponding CER certificate will be downloaded to the client computer. The full path of the CER certificate will be indicated at the end of the deployment.

The last step is to create Azure PaaS (“burst”) nodes on the head node. The script will upload the custom script HPCHNPostConfig.ps1 to the head node and start to run it at the end of this step.

 Note: If AzureBurst is not specified in the configuration, the custom script HPCHNPostConfig.ps1 will be started in the “Configuring head node” step.

The deployment takes about 55 minutes. The CER certificate downloaded from the head node at “Configuring head node” is placed in the “MyDocuments” folder of the current user.

The compute node image and broker node image captured during this deployment can be used to add more nodes later; see IaaS node management.

We can use remote desktop to log in to the head node and open HPC Cluster Manager to check the status of the HPC Pack cluster.

We created four node templates: “Default ComputeNode Template”, “AzureNodeTemplate-1”, “AzureNodeTemplate-2” and “AzureNodeTemplate-3”. We also created two compute nodes and two broker nodes. The compute nodes have been assigned to “Default ComputeNode Template” and brought online by the custom script HPCHNPostConfig.ps1, while the broker nodes are not assigned to any node template yet. Finally, we created three Azure PaaS (“burst”) nodes for “AzureNodeTemplate-1”; they are all in the Not-Deployed state, and we can manually start them later.

 

Since we enabled the HPC web portal, we can log in to the HPC web portal at https://hpciaasdemosvc.cloudapp.net/hpcportal/ with the domain user “HPCDOMAIN\johnlee” to submit and manage jobs. Before logging in to the web portal, we should first import the CER certificate mentioned above; otherwise, the browser shows a security warning. We can use the following PowerShell command to import the certificate:

Import-Certificate -FilePath C:\Users\suzhu\Documents\HPCWebComponent_HPCDemoHN_20141104142545.cer -CertStoreLocation Cert:\LocalMachine\Root

 

We submit a simple new job by clicking “New Job” -> “HelloWorld”, entering the job parameters, and then submitting with the credentials of the domain user “HPCDOMAIN\johnlee”.

We can see the job is running.

Because we enabled the HPC Job Scheduler REST API on the head node and already imported the CER certificate, we can also use the HPC Pack Client Utilities to manage jobs from the client computer.

Finally, we should remove the two custom script files (HPCHNPreConfig.ps1 and HPCHNPostConfig.ps1) from the script folder so that they do not run in future deployments.