Accelerating Excel by Parallelization

Note that Excel 2007 is able to use multiple threads to parallelize calculations on the workstation. Previous versions were not. When you hit “recalculate”, Excel will identify independent areas in your spreadsheet and allocate computations in those to separate threads. There are as many calculation threads as cores in your machine. So, before thinking of a cluster, think of a better workstation. For more details, see: 

https://msdn2.microsoft.com/en-us/library/aa730921.aspx#office2007excelperf_ExcelPerformanceImprovements

Excel Services (https://msdn2.microsoft.com/en-us/library/ms582023.aspx), a function of Office Sharepoint Server Enterprise, is also multi-threaded, but it uses threads differently from Excel 2007. It does not divide a single spreadsheet across threads. Instead, it uses multiple threads to run several requests (e.g. from different users) against the same spreadsheet at the same time. Similarly, if one user or application opens several spreadsheets, each of them will be handled by a different thread on the server. Hence, Excel Services by itself may not make your calculation any faster. It is mostly intended as a governance solution.

If the power of a multi-core workstation is not sufficient, we can envisage two scenarios:

  1. Excel offloads to Compute Cluster directly: In this case, you have to write:
    1. A free-standing application (in whatever language) that performs the computation you need. You deploy its executable onto all the nodes in the cluster.
    2. An integration component (can be a VBA macro in Excel or an external add-in) that:
      1. Identifies groups of cells in your spreadsheet where the computation in question can run independently. Ideally each group should consume the same amount of computing power.
      2. Creates a job on the cluster containing as many tasks as the groups it identified. Each task will run an instance of the executable against the group of cells that is passed as a parameter.
      3. Reads the results and stores them back into the spreadsheet for further analysis.
  2. Excel offloads to Excel Services running on Compute Cluster: In this case, you will need:
    1. A custom integration component (Excel add-in) that performs a similar function to (1.2).
    2. A custom executable run in each task that acts as a command-line client of excel services on the compute node. It passes the input cell ranges for calculation and specifies where to store the output. It does not perform any calculation itself.
    3. A storage location (typically a sharepoint site) where the spreadsheet is published for all nodes to access.
    4. An instance of Excel Services running on each node of the cluster. This will perform the calculations as directed by (2.2).

There are advantages and disadvantages in each case:

  • Option 1
    • It is fast: nothing beats dear old C or Fortran for pure computational speed.
    • It is also relatively inexpensive: all you need is Excel, a compiler and Compute Cluster Server. The COM API in the latter can be used by Excel macros.
    • Alas, it is limited to those calculations that you have coded. If you need to change them, you will have to create a new executable. If you have an external library of functions, however, you may obviate the problem by passing both the input parameters and a reference to the function.
  • Option 2:
    • It is flexible: The formulae are specified in your spreadsheet. If you need to change them, edit the spreadsheet and publish it again to Excel Services.
    • Alas, you pay for flexibility with speed: it is not as fast as a native executable. In fact, there is a significant overhead in setting up the computation job on the cluster and retrieving the results.
    • If you use external libraries, they must run as .NET managed code, which Excel Services can invoke. Not many do.
    • It is more expensive than (1): you’ll need a licence for Excel Services on each node, Excel and Compute Cluster Server. You may recoup some of those costs because you need not develop as much bespoke code.

The idea of offloading Excel computations is not new; several players are well established in the financial market with such a solution:

-Platform Symphony (www.platform.com) with its Excel adapter (APIs and libraries) has been on the market for a few years and is well accepted, especially in the US market.

-Datasynapse (www.datasynapse.com) has a similar offering for its grid software. It seems to be prevalent in European markets.

-There are also a few research efforts. The articles are worth reading, as they explain how offloading from Excel to grids or clusters can be achieved: https://www.gridbus.org/papers/ExcelGrid.pdf, https://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=1324012&isnumber=29307.

Get started

-You can download some sample integration code and set up a demonstration environment with Excel, Excel Services and Compute Cluster. Instructions and material have been posted on:

https://www.microsoft.com/downloads/details.aspx?FamilyID=513D97DA-34B7-4346-A9E5-B8998170AADE&displaylang=en

-Read a detailed explanation of how the sample works and a step-by-step installation guide on https://msdn2.microsoft.com/en-gb/library/bb462928.aspx