What Lies Beneath: Setting up underlying HPC tools

by kishi on December 21, 2006 07:34pm

This blog continues what I started writing about in Thinking About HPC Infrastructure and what Frank wrote about in Overloading Clusters.

After reading through the previous blogs on HPC, someone might ask, “What are some of the core components of HPC?” After all, once you’ve seen the outside of a Maserati or a De Tomaso Pantera, you’re not going to be satisfied just by ogling it. Even after a test drive, the engineer in you will want to pop the hood and see what’s inside. Taking a similar approach, let’s uncover some underlying HPC technologies by looking at a basic HPC setup. Once all the provisioning has been completed, the HPC system will be physically deployed with an OS and the relevant drivers, utilities and so on. Yet before the actual HPC application can be installed across the cluster, there remains a critical step in the process: configuration of the cluster and file system, along with any supporting tools and interfaces such as MPI (Message Passing Interface). After peeling back the HPC application layer, it’s worthwhile to do a “deep-dive” into what really runs HPC clusters. A broad categorization of these tools is:

    • Cluster Management tools e.g. CSM
    • Job Scheduling tools e.g. SCALI, Maui
    • Resource Management tools e.g. Torque

If you’re trying to understand the “WHY” behind the existence of these tools and their importance, take cluster management as an example. Cluster configuration, installation and management can be difficult, requiring intimate familiarity with the HPC hardware, the OS, the underlying architecture and so on. Without specific tools that attend to and manage these underlying HPC sub-components, HPC just wouldn’t be what it is. So it is worth walking through the installation experience of tools such as the ones listed above to appreciate the complexity of HPC systems. Ready? Let’s dive into the installation and function of these tools:

1. SCALI: The SCALI management and MPI software packages provide deployment, monitoring and job scheduling services for a cluster. After you deploy this software, you will be able to see all the compute nodes that have been preconfigured or are configured on your system, and you can monitor those systems and run jobs using the SCALI graphical interface. To license the SCALI software, you use the scainstall command to produce a license request file, which is then sent to SCALI to receive a permanent key. For those who need some hand-holding through this, SCALI luckily provides very comprehensive documentation on their website. A large portion of the SCALI Manage User’s Guide is dedicated to pre-setup planning and configuration of the cluster and the network. The documentation provides detailed recommendations on how to set up your Ethernet-based network environment and out-of-band management network, as well as a general overview of how to install and configure higher-performance interconnects, including bonded Ethernet, InfiniBand, Myrinet and SCI. The SCALI Manage interface provides simple tools to assist in configuring and testing DET, InfiniBand and Myrinet devices for use with the SCALI MPI implementation. The SCALI MPI software supports multiple InfiniBand stacks, including Mellanox, Topspin, Voltaire and Infinicon.

2. HP-MPI: HP-MPI is Hewlett-Packard’s Linux-based implementation of the Message Passing Interface (MPI). Many of the utilities distributed with HP-MPI are similar to those of other common MPI implementations such as MPICH, e.g. mpicc, mpirun, etc. To use the HP-MPI software, a license is required for each CPU core in the cluster. To obtain a license file you must collect the MAC address from each node (typically eth0) and enter that information into a form at licensing.hp.com. The resulting file can then be copied to the compute nodes. The HP-MPI software is non-functional until license files have been generated for the nodes.
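
To make that licensing prerequisite concrete, here is a minimal Python sketch (not an HP-supplied tool, just one way you might script it) that gathers the eth0 MAC address from a list of compute nodes over ssh. The node names are hypothetical placeholders, and the sketch assumes passwordless ssh and a Linux /sys filesystem on each node; the actual license request is still submitted by hand at licensing.hp.com.

    #!/usr/bin/env python
    # Sketch: gather eth0 MAC addresses from compute nodes for an HP-MPI license request.
    # Assumes passwordless ssh to each node; the node names below are hypothetical.
    import subprocess

    NODES = ["node01", "node02", "node03", "node04"]  # placeholder hostnames

    def get_mac(node, iface="eth0"):
        """Return the MAC address of `iface` on `node`, or None if unreachable."""
        result = subprocess.run(
            ["ssh", node, f"cat /sys/class/net/{iface}/address"],
            capture_output=True, text=True)
        return result.stdout.strip() if result.returncode == 0 else None

    if __name__ == "__main__":
        for node in NODES:
            mac = get_mac(node)
            print(f"{node}\t{mac or 'UNREACHABLE'}")

The output is just a node/MAC listing you can paste into the licensing form; the generated license file is then copied out to the nodes as described above.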

3. CSM (Cluster Systems Management): The CSM software suite is designed to automate the deployment and management of cluster nodes. Nodes can be remotely installed with an operating system as well as the CSM software for later monitoring. CSM supports Red Hat and Novell Linux distributions on multiple platforms. To obtain and install the CSM software, you must register on IBM’s website and download the required RPMs. Once configured, CSM can remotely install the operating system and/or the CSM software on the compute nodes. Much like Platform Rocks, CSM makes use of PXE functionality and Red Hat’s kickstart or the AutoYaST software to remotely install the operating system. The CSM software provides multiple methods for defining the nodes that should be deployed and managed:

a. The first method involves creating a hostname mapping (hostmap) file, a colon-delimited file that defines a number of attributes for each node (see the sketch after this list).
b. The second method involves manually creating and editing a “node definition” (nodedef) file. This is the method the documentation suggests for small clusters.
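
To give a feel for what such a mapping file looks like, here is a minimal Python sketch that emits one colon-delimited line per node from a small in-memory inventory. Note that the attribute names and ordering used here (hardware control point, console method, install method) are illustrative placeholders only; the exact attributes CSM expects in a hostmap or nodedef file are spelled out in the CSM installation documentation.

    #!/usr/bin/env python
    # Sketch: write a colon-delimited, hostmap-style file from a node inventory.
    # The fields and their order are placeholders, NOT the exact CSM format;
    # check the CSM installation guide for the attributes your release expects.

    # Hypothetical inventory: hostname -> (hw control point, console method, install method)
    INVENTORY = {
        "node01": ("mgmt01", "rsa", "kickstart"),
        "node02": ("mgmt01", "rsa", "kickstart"),
    }

    def write_hostmap(path, inventory):
        """Write one colon-delimited line per node to `path`."""
        with open(path, "w") as f:
            for host in sorted(inventory):
                f.write(":".join([host] + list(inventory[host])) + "\n")

    if __name__ == "__main__":
        write_hostmap("hostmap.txt", INVENTORY)

The point is simply that these node definitions are plain text and easy to generate from whatever inventory you already keep; the nodedef route in method b is the hand-edited equivalent.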

Proper remote power and remote console capabilities greatly ease the administration and deployment of the compute nodes; however, according to the CSM FAQ, remote power management is not absolutely required. All the compute nodes must be rebooted (remotely or manually). They are then PXE booted and installed with RHEL4 using the kickstart installation system.

4. Maui and Torque: Both Maui and Torque are free software that must be compiled from the source distribution on the head node. Maui is an open-source job scheduler for compute clusters. It supports a number of task management features not found in other parallel batch processing software, including policy-based scheduling and prioritization of tasks. Torque is an open-source resource manager for managing compute nodes and scheduled jobs. It can integrate with Maui to provide additional features for scheduling and managing scheduled tasks. Installation of Torque can be done following the guidance in the Torque 2.0 Admin Manual.
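
Once Torque’s pbs_server and pbs_mom daemons are up, work is normally handed to the cluster with qsub. Below is a minimal Python sketch, assuming a working Torque installation on the head node; the job name, node count and walltime are made-up values, though the #PBS directives themselves are standard Torque/PBS options.

    #!/usr/bin/env python
    # Sketch: write a minimal PBS/Torque job script and submit it with qsub.
    # Assumes a working Torque installation; job name and resource values are placeholders.
    import subprocess

    JOB_SCRIPT = """#!/bin/sh
    #PBS -N hello_hpc
    #PBS -l nodes=2:ppn=2
    #PBS -l walltime=00:05:00
    #PBS -j oe
    echo "Running on:" && cat $PBS_NODEFILE
    """

    def submit(script_text, path="hello.pbs"):
        """Write the job script to disk and hand it to qsub, returning the job id."""
        with open(path, "w") as f:
            f.write(script_text)
        result = subprocess.run(["qsub", path], capture_output=True, text=True)
        result.check_returncode()
        return result.stdout.strip()

    if __name__ == "__main__":
        print("Submitted job", submit(JOB_SCRIPT))

From there, qstat shows the job’s status. If Maui is used in place of Torque’s stock pbs_sched scheduler, the submission path stays the same, but Maui’s policies decide when and where the job actually runs.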

5. Platform Rocks: Platform Rocks is cluster deployment software that facilitates the deployment of various software stacks (“rolls”) onto the compute nodes. The software is capable of deploying the base operating system and the utilities required for cluster administration, management and scheduling, and it can also manage configuration and updates to ensure consistency throughout the cluster. Platform Rocks is a suite of utilities packaged together as separate installable rolls, and one of the main goals of the software is to allow for easy installation and integration of third-party rolls and applications. One unique aspect of the Platform Rocks installation approach is that the software installs an operating system on the head node and all the required rolls at the same time. The software can also automatically set up the subsystems required to install an operating system and other packages on the compute nodes (such as management agents, etc.).

That about does it for a quick “deep-dive”. Let me insert a gentle reminder that these are not the only cluster or resource management technologies out there in the HPC space, but they are among the most prevalent. If you have worked with additional tools, we’d like to hear from you. Thanks for tuning in to Port 25, and HAPPY HOLIDAYS!