Running parallel Excel on Windows HPC 2008 R2 hosted in Windows Azure IaaS VMs: Part 2

Article
06/25/2013

click here to Move to next blog-Part3

click here to Move to previous blog-Part1

1 Introduction

In first part of my blog blog Part1 I have discussed the step-by-step procedure to create a Windows HPC cluster in Azure IaaS VMs; here I assume that you already have an HPC cluster up and running in Azure it can be used to run a basic Excel workbook on multiple cluster nodes. Execution of Excel on Windows HPC is based on HPC Services for Excel which uses a Service-Oriented Architecture (SOA) infrastructure to run Excel jobs on a cluster (For more information, see the documentation “How HPC Services for Excel Work” : https://technet.microsoft.com/en-us/library/ff877825%28v=ws.10%29.aspx ). HPC Services for Excel includes “ready-made” SOA clients and SOA services that enable developers to quickly modify workbooks to run on a cluster. HPC Services for Excel includes a client/service framework specifically designed to be used from Excel on the desktop. There are two important parts (see Figure 1) to this framework:

a client library, which is installed on the user’s desktop;
a service library, which is installed on the cluster compute nodes.

Figure 1: architecture of HPC Services for Excel

Microsoft.Hpc.Excel.ExcelDriver: provides a wrapper around the Excel Primary Interop Assembly (PIA) interface that enables managed code to interact with a Microsoft Office application's COM-based Microsoft.Hpc.Excel.ExcelDriver, which allows a user to open a workbook, launch an Excel process, and invoke a macro. Microsoft.Hpc.Excel.ExcelDriver exposes a COM interface and a .NET API.

HPC Services for Excel supports two mechanism:

offloading workbooks to the cluster. In offloading workbook, each Excel 2010 instance runs an independent calculation or iteration of the same workbook with a different dataset.
offloading UDFs to the cluster; In offloading UDFs only the UDFs runs on HPC compute nodes.

In this blog I show how to create a very basic example of Excel offloading workbook.

2 HPC macro in Excel

The most important concept to understand in developing workbooks for HPC Services for Excel, and using the macro framework, is asynchronous calculation. A typical Excel application runs synchronously. That happens for example when you press a button to run a macro, and the macro executes some VBA (Visual Basic for Applications) code; you expect that the macros will run from beginning to end, and then complete.

The asynchronous programming model is a little different. In this model, you write VBA functions, but you do not call them directly. Instead, a client library – code that’s installed with HPC Services for Excel – will call your VBA functions.

To support HPC Services for Excel has been create the macro framework that consists of a set of seven VBA macros that you can include in your workbook:

HPC_GetVersion	Return the version of the macro framework used to run calculations on an HPC cluster. It tells the HPC libraries which version of the macro framework the workbook contains. The macro framework described in this document is version 1.0, so the HPC_GetVersion macro must return the string “1.0”. The HPC_GetVersion macro is important because it will allow the workbook to run with any future version of Windows HPC Server. Even if the default macro structure changes in future releases, as long as the HPC_GetVersion macro returns “1.0”, the HPC libraries will be able to execute this workbook on a Windows HPC Cluster.
HPC_Initialize	Perform any pre-calculation or initialization steps. When you run a calculation for an Excel workbook on an HPC cluster, the client library first calls the HPC_GetVersion macro, and then it calls the HPC_Initialize macro. Implement this macro to perform the initialization tasks that are necessary for the entire calculation, such as resetting counter variables, clearing old results from the spreadsheet, turning off the display of updates to the workbook during the calculation, and opening external resources such as log files or database connections. (HPC_Initialize is executed on the client’s desktop)
HPC_Partition	Collect required parameters for a single calculation step This macro represents the partition stage of that framework. The framework implements an iterative calculation model for the workbook, where different calculation steps are performed on different nodes in the HPC cluster. The partition stage of the framework establishes how the overall calculation is split into individual calculation steps. When the last calculation step is complete, this macro should return Null. (HPC_Partition is executed on the client’s desktop)
HPC_Execute	Perform one calculation step, as part of a larger overall calculation (HPC_Execute is executed on HPC nodes)
HPC_Merge	Process the results of a single calculation step (HPC_Merge is executed on the client’s desktop)
HPC_Finalize	Perform any post-calculation processing (HPC_Finalize is executed on the client’s desktop)
HPC_ExecutionError	Handle errors arising from the calculation HPC_ExecutionError, is only used when the calculation encounters some error. If so, this macro is called and you can add some error handling code – for example, you might show a pop-up dialog to the user warning them of the error.

Each macro has a specific purpose in the macro framework, and each will be called by the HPC Services for Excel client library during a cluster calculation at a specific point.
A typical calculation runs in the cluster in a workflow:

Figure 2: macro framework workflow

Workflow of calculation happens in following order:

When a cluster calculation runs, first the client library calls the HPC_Initialize macro, used to handle any required initialization; for example, you might want to clear out old results from your spreadsheet.
Next, the calculation runs through three macros: HPC_Partition, HPC_Execute, and HPC_Merge. In the diagram above, these are shown as a
loop. That is not really what happens during a calculation, but logically you can think of this as a loop. First of all the client library calls HPC_Partition designed to collect any data required for a single calculation step. For example, if you want to calculate a set of rows one by one, HPC_Partition might return the row number for a single step: first row 1, then row 2, and so on.
Next, the HPC_Execute macro is called. This macro runs the actual calculation. When the calculation runs, the workbook will be copied to the cluster compute nodes and the HPC_Execute macro will be run on one of the compute nodes. HPC_Execute will return the result of the calculation: for example, it might return the last cell in the row which is the end result of a longer calculation.
After HPC_Execute, the client library will call HPC_Merge. Whatever HPC_Execute has calculated will be sent to HPC_Merge.
The HPC_Merge macro is designed to take those results and return them to the spreadsheet running on your desktop. In the HPC_Merge macro, you might insert the calculation results back into your spreadsheet, or you might write them to a log file.

These three macros – HPC_Partition, HPC_Execute, and HPC_Merge – will be called multiple times, once for each step in your calculation, until the entire calculation is complete. These macros implement the iterative calculation model. The macros might be called many times during a single workbook calculation, but each time they are called represents a single calculation step, or iteration.

After the last calculation step has completed, the client library will call HPC_Finalize. You can use this macro to perform any post-calculation processing: for example, you might want to find the average of all the previous calculation steps by using another VBA macro.

In the diagram above, the HPC_Execute macro is highlighted in red to indicate that actually this macro runs on the cluster compute nodes. All the other macros run on the desktop, in Excel on user’s workstation.

It’s important to understand this, because it has some implications for how workbooks use and manage data during the calculation. The three main calculation macros – HPC_Partition, HPC_Execute, and HPC_Merge
– pass data back and forth during the calculation process. In a typical VBA application, there are lots of ways to share data between different macros.
For example, you can use global variables in your VBA code, or write values into spreadsheet cells. But when we’re running the calculation on an HPC cluster, we need to send data between the macros running on the desktop, and the HPC_Execute macro running on the compute nodes.

For that reason, any data we need to run the calculation must be sent from the HPC_Partition macro to the HPC_Execute macro directly. We do this by using the return value from the HPC_Partition macro. That return value becomes the argument (or parameter) to the HPC_Execute function. So data will pass directly between these two macros, and you can pass any value or set of values (as an Array) between these macros. But again, you can’t use global variables or the spreadsheet cells to pass information between the two macros because when it runs on the cluster, there are actually multiple copies of the workbook running at the same time – one on the desktop, and one (or more) on the cluster compute nodes.

In the same way, when the calculation in HPC_Execute is complete, it will return a result as the return value of the macro function. That result becomes the argument (or parameter) to the next macro, HPC_Merge. Once again, any results you want to return –the results of the calculation – must be sent in this way, as the return value of the function.

To simplify the process, two skeleton” macro files are included in this article that you can import into your Excel workbook (see 3.5 paragraph).

3 How to build an Excel workbook to run on HPC cluster

In this section we build an simple workbook to run on HPC cluster. I assume that you have a Windows HPC cluster installed in Azure IaaS VMs (see Part1 of my post), so the following environment is ready to go:

A Windows HPC 2008 R2 cluster (“Enterprise edition”) installed and configured.
Excel 2010 installed on the cluster compute nodes.
A network share folder (i.e. on the HPC head node).
Excel 2010 and the Windows HPC Pack 2008 R2 client utilities installed on user’s desktop (in my setup this client runs in one Windows Azure IaaS VM attached to the same VNET of HPC cluster)

Figure 3: HPC cluster topology in Azure IaaS VMs

3.1 Enable Developer tab on Excel workbook

On the user’s desktop (in our topology exhpclient) open Excel and enable Develop tab

Click the File tab on the Excel ribbon
Click Options.
In the Excel options dialog, select Customize Ribbon on the left.
Check the box next to Developer and click OK to close the dialog.

Figure 4: enable Developer tab in Excel

3.2 Create a new workbook (with extension .xlsb)

Because the new Excel workbook have to include macros, it need to save it as binary workbook (extension file .XLSB) -Figure 5

Figure 5: create a new Excel binary workbook

3.3 Enable Developer tab on Excel workbook

Make following actions on "Sheet1" (Figure 6):

A1 cell: insert the text “Multiplication Factor”
B1 cell: insert the text “Percentage completion”
E1:O11 cells: insert a sequence of integer [1,11]
D2:D27 cells : insert a sequence of integer [1,26]
E2:O27 cells: change background color in Yellow

Figure 6: sequence of integers in a row and column

Set the conditional formatting for the cells E2:O27 (Figure 7)

Figure 7: set the conditional formatting in yellow cells

3.4 Naming of cells

Naming in Excel is useful because leaves possibility to reference cells by name in the VBA code.

Figure 8: open the Name Manager

Figure 9: initially Name Manager is empty

In our Excel worksheet, we define file cell names: Factor, percentageCompletion, MyCols, MyRows, Egress_Table ( Figure 10 - Figure14 )

Figure 10:Egress_Table cells name

Figure 11: Factor cell name

Figure 12:MyCols cells name

Figure 13: MyRows cells name

Figure 14: percentageCompletion cell name

After defining names, Name Manager appears as in Figure 15

Figure 15: List of naming assigned to cells

In Figure 16 is reported the positioning of cells in Sheet1 with related names.

Figure 16: positioning of cells names in Excel sheet1

3.5 Import the skeleton files

To get started, download two basic “skeleton” macro files:

HPCExcelMacros.bas : contains all macro framework functions (the functions described in the section above) designated to support parallel execution. These macros are used by the HPC Services for Excel client library during the calculation.
HPCControlMacros.bas : a set of VBA macros contains the “control” functions

After downloading of macro files, you need to import these macros modules:

1. On the Developer tab of the ribbon, click Visual Basic to open the VBA editor.

2. In the VBA editor, right-click VBA Project at the top of the tree view and then click Import File.

Figure 17: import VBA files

3. In the dialog box, select the file “HPC Excel Macros.bas” and click OK. That adds the first macro file to your project, and it will be available in the Modules folder of the VBA project window.

4. Repeat steps 1-3 to import “HPC Control Macros.bas” and then click OK. That adds the second macro file to your project.

Figure 18: select VBA to be imported

5. Verify that these two macro files appear in the VBA project window:

Figure 19: check presence HPCControlMacros and HPCExcelMacros in the worksheet

Details on HPCControlMacros and HPCExcelMacros are discussed later.

3.6 Adding a button that runs the workbook locally

To add a button that runs the workbook locally

1. With your Excel workbook open, click the Developer tab on the ribbon.

2. On the Developer tab, click Insert and then select the button control – the first control in the list.

Figure 20: insert form button in Sheet1

After you’ve clicked the button, draw a rectangle somewhere on the spreadsheet to insert the button in that location. After you position the button, the Assign Macro dialog box appears.

Figure 21: association of CalculateWorkbookOnDesktop to the Button1

3.7 Adding a button that runs the workbook on the cluster

Repeating the steps done before, you can add a second button (Button2) used to run the Excel Workbook on the cluster. After you position the button, the Assign Macro dialog box appears (Figure 22)

Figure 22: association of CalculateWorkbookOnCluster to the Button2

Give to the buttons a better shape,

Figure 23: specify font for the Buttom1

Change the name of Buttom1, form “Button1” to “Run Local” (Figure 24)

Figure 24: change the name of the Button1 to Run Local

3.8 Add reference to HPC library

To run Excel on the HPC cluster it is required that you add a reference to the HPC library in the VBA editor– that’s the client library.

1. In the VBA editor window, click Tools, and then click References to open the project references dialog box.

Figure 25: Tools -> References

2. In the list of available references, scroll down until you find Microsoft_Hpc_Excel and select the check box next to it. When you’ve checked the box the dialog should look like this:

Figure 26: flag reference to Microsf_HPC_Excel

3. Click OK to close the project references dialog.

3.9 Set cluster scheduler name and shared folder in HPCControlMacros

In the HPCControlMacros module we set a few values to tell Excel how to contact the cluster: the cluster scheduler and the shared folder:

For the cluster scheduler, use the name of your cluster head node – this is the computer name that you would use on the network. You can use a fully-qualified name (for example, headnode.mynetwork.com), but that’s not necessary if you’re in the same domain.
For the shared folder, use the full path to the share folder. The desktop user must have write access to this share folder; and the cluster compute nodes must have read access to the folder.

' Private Const HPC_ClusterScheduler = "HEADNODE_NAME"

Private Const HPC_ClusterScheduler = "exhn001"

' This is a network share used to store a temporary copy

' of the workbook. Make sure that the directory exists,

' that you have write access to the directory, and that

' the compute nodes in the cluster have read access.

' Private Const HPC_NetworkShare = \\PATH\TO\SHARE\DIRECTORY

Private Const HPC_NetworkShare = \\exhn001\share

3.10 Defining the calculations in the macro framework

The HPCControlMacros module contains the macro framework for the calculation. In Figure 27 is reported the workflow with pseudo-code.

Figure 27: flowchart with pseudo-code

Below the meaning of some variables used in Figure 27:

NumRows is the total number of rows in Egress_Table

NumCols is the total number of column in Egress_Table

IndexRow is counter (Integer) to identify the number of row in the Egress_Table

indexCol is a counter (Integer) to identify the number of column in the Egress_Table

From flowchart analysis is evident that in HPC_Partition slit up of data happens by columns; in next example (Part3) use a data partition data by rows.

In Figure 28 is shown a more detailed view of workflow with VBA statements

Figure 28: flowchart with VB statements

All the macros in the HPC framework use the Variant data type for inputs and outputs. The Variant type can be any primitive type (integer, long, double, string) or an Array of these types. We need to include both the row and column, so we can use an Array. The other thing we need to do in the HPC_Partition macro is figure out when the calculation is complete. In this case, the calculation will be complete when we’ve reached the end of the table. We can do that by incrementing the row and
column numbers as we move through the table.
Whenever we reach the end of a column, we move to the next column. When we’ve passed the last column, the table (Egress_Table) is finished and we want the function to return Null.

In our case data is defined in HPC_Partition as an array of 3 elements variant:

Dim data(2) As Variant

where:

data(0) : contains the row index of the Egress_Table
data(1) : contains the column index of the Egress_Table
data(2) : contains the value in the cell (data(0), data(1)) of the Egress_Table

So to review, the HPC_Partition macro has two purposes: it collects parameters necessary for a single calculation (here that’s the row and column number); and when the calculation is complete it signals that by returning Null.
In the function we have created it does that by stepping through the table, first by row and then by column, and when it’s passed the last column, it returns Null to indicate that the table is complete.

Figure 29: data array to set value in a cell of Egress_Table

3.11 What happens when a workbook runs on the cluster

When you click the button “Run HPC”, it calls the macro CalculateWorkbookOnCluster. That macro uses the client library to start a cluster calculation with the settings you provided for the cluster head node and share directory.

When the calculation starts, the HPC Services for Excel client code takes over.

First, the client library will create a cluster “session”, used to manage the calculation. That takes a few seconds, and after you click the button you’ll notice a few seconds’ delay.
Next the client library will call the HPC_Initialize macro. That happens on the desktop. In this workbook, that macro set the initial value of few internal counters; and empties the Egress_Table cells in the “Sheet1”. So when you click the button, first you’ll see the delay as the session is created; and then you’ll see Egress_Table cells are cleared. That’s the HPC_Initialize macro.
Next, the client library will start calling the HPC_Partition macro. This also happens on the desktop. In our code, the HPC_Partition macro increments the counter and returns the counter value. Every time this macro returns a value – until it returns Null, on the last call – the client library will send a request to the HPC scheduler.
When the scheduler receives one of these requests, it will forward the request to one of the cluster compute nodes for processing. When the compute node receives the request, it will start Excel, load the workbook, and then call the HPC_Execute macro. This is the HPC_Execute macro we have in our workbook, but when the calculation runs on the cluster, the HPC_Execute macro is actually run on the compute nodes – unlike all the other macros. That’s why we’re passing data array from one macro to another. Information flows from one macro to the next, but as data is passed from the HPC_Partition macro to the HPC_Execute macro, it is sent from the desktop, through the HPC scheduler, and to one of the compute nodes.
After one of the compute nodes runs the HPC_Execute macro, when the macro is completed the result of the calculation – the value returned from the HPC_Execute macro – is sent back to the scheduler, and then returned to the client library on the desktop. When the client library receives the calculation result, it will call the HPC_Merge macro on the desktop. So once again you can think of information flowing from one macro (HPC_Execute) to the next (HPC_Merge), but this is happening over the network: from the workbook running on the compute node to the workbook running on the desktop.

It’s important to understand that these macro function calls– each of the calls to HPC_Partition, as we send data to the cluster, and each of the calls to HPC_Merge, as data is received – happen asynchronously. That is, when a calculation is sent to the cluster the client library will not wait (or block) until that particular calculation is complete before it sends the next one. If the HPC_Partition macro is very fast, it’s possible that all calculation requests will be sent before any one of those requests is completed.

It’s also worth noting that as the results come back from the cluster – and the HPC_Merge macro is called in our workbook – they might not come back in the same order that we sent them. That’s because as each request is sent to the cluster, the scheduler will forward it to one particular compute node. The compute node will calculate the request, using the HPC_Execute macro in the workbook, and then send back the result. But for a number of reasons, one compute node might run more slowly or more quickly than another compute node; and if that happens, results might be returned in a different order.

3.12 Assign a value to the multiplication factor

In the cell B1, specify the multiplication factor used for the calculation (Figure 30)

Figure 30: multiplication factor

Now the worksheet is ready to run locally or in the Azure HPC cluster.

In Figure 31 is shown the job run in the cluster when the button “Run HPC” is pushed.

Figure 31: job executed on the HPC cluster

The HPC heatmap in the HPC management console shows CPU and cores allocation to run the Excel job.

Figure 32: HPC heat Map

Figure 33: list of successful HPC calls in the job

In Windows HPC a single job is divided into individual tasks, and those tasks are sent to the cluster for calculation. It is the responsibility of the HPC Scheduler (the Head Node) to distribute these tasks to the Compute Nodes for processing. In our case we have 286 calls (one for every Egress_Table cell) with a total of 16 tasks (each task runs one HPCServiceHost.exe). To be more efficient as possible, every task run in a single core.

Figure 34: Excel job is composed of multiple tasks

Our HPC cluster has 16 cores (eight for every compute nodes) so all the 289 tasks are served from the 16 cores

Figure 35: first task of Excel job run on first compute node EXCN001

Figure 36: second task of the Excel job run on the second compute node EXCN002

Figure 37: Allocate Nodes give the cores number allocated to run the job

The HPC cluster define automatically some environment variables to run the Excel job (Figure 38); you see the service name Microsoft.Hpc.Excel.ExcelService and the Excel workbook MulitplicationTable.xlsb copied on the shared folder (\\exhn001\share) required to run on the HPC cluster

Figure 38: environment variable to run the job