Chronicles of a Cluster Troubleshooter Volume One #1
Date: October 5, 2007
Issue: Performance of the Eclipse application on Windows Compute Cluster Server v1 is below that of the same application on the same hardware under Linux.
Hardware Platform: HP ProLiant DL140 cluster with Voltaire InfiniBand interconnect.
Software Platform: Windows Server 2003 Compute Cluster Edition Service Pack 2 with Compute Cluster Service Pack 1.
Application: Schlumberger Eclipse reservoir simulation package.
Initial Conditions: Cluster hardware set up; OS and Compute Cluster Pack installed. Application performance under Windows does not match that under Linux. A large test case fails when run on all nodes.
Investigation and Remediation:
1. Install the uSane cluster sanity test suite on the head node.
2. Run the uSane MpiHi test.
1. Runs successfully on a small case.
2. Fails immediately when testing the whole cluster.
1. The mpiexec error message says that it cannot get credentials from compute node 2.
2. The HP engineer notes that this is the same message the large application generates. He had assumed it was a Windows CCS issue.
3. Pause node 2, then run MpiHi across the rest of the cluster. It runs fine.
4. Replace node 2 on the theory that the problem lies either in the private-network NIC or in the node's software. A fresh image on new hardware is guaranteed to fix either cause in a single service action. Post-mortem shows the NIC was bad.
5. Using all nodes in the cluster, run the uSane tests: cNodeLate, the MPI network latency test; cNodeBand, the MPI network bandwidth test; and cFlood, the MPI switch flood test with all links active at once.
1. Latencies near 10 microseconds.
2. Bandwidth near 900 MB/s with "-env MPICH_SOCKET_SBUFFER_SIZE 0" on the mpiexec command line.
3. Conclude that the network is fine.
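The latency and bandwidth numbers above come from MPI-level tests run over the InfiniBand fabric. As a rough illustration of what such a microbenchmark measures, here is a self-contained Python sketch; it is not part of uSane and times a local TCP socket pair rather than the cluster interconnect, but the two phases mirror what cNodeLate (ping-pong latency) and cNodeBand (streaming bandwidth) report:

```python
# Illustrative analogue of a latency/bandwidth microbenchmark, run over a
# localhost TCP socket pair. Phase 1 ping-pongs 1-byte messages to estimate
# one-way latency; phase 2 streams a large buffer to estimate bandwidth.
import socket
import threading
import time

def echo_server(server, n_pings, n_bytes):
    conn, _ = server.accept()
    with conn:
        for _ in range(n_pings):          # latency phase: echo 1-byte pings
            conn.sendall(conn.recv(1))
        remaining = n_bytes               # bandwidth phase: drain the stream
        while remaining > 0:
            remaining -= len(conn.recv(65536))

def measure(n_pings=1000, n_bytes=50_000_000):
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    t = threading.Thread(target=echo_server, args=(server, n_pings, n_bytes))
    t.start()
    client = socket.create_connection(server.getsockname())
    with client:
        start = time.perf_counter()
        for _ in range(n_pings):          # strict ping-pong: send 1 byte, wait
            client.sendall(b"x")
            client.recv(1)
        # one-way latency estimate = half the average round trip, in microseconds
        latency_us = (time.perf_counter() - start) / n_pings / 2 * 1e6
        buf = b"\0" * 65536
        start = time.perf_counter()
        sent = 0
        while sent < n_bytes:             # one-way streaming bandwidth
            client.sendall(buf)
            sent += len(buf)
        bandwidth_mb_s = sent / (time.perf_counter() - start) / 1e6
    t.join()
    server.close()
    return latency_us, bandwidth_mb_s
```

`measure()` returns (one-way latency in microseconds, bandwidth in MB/s); over loopback both will be far from InfiniBand numbers, but the measurement structure is the same.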
6. The HP engineer adds "-env MPICH_SOCKET_SBUFFER_SIZE 0" to the application's mpiexec line. Performance is still subpar.
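On MS-MPI, such environment overrides are passed to each process via mpiexec's -env flag. A hedged sketch of what the invocation might look like (the host list, executable name, and input file below are placeholders, not from this log; only the -env flag is):

```shell
REM Placeholder hosts, binary, and data file; only the -env flag is from the log.
mpiexec -hosts 4 node1 node2 node3 node4 ^
        -env MPICH_SOCKET_SBUFFER_SIZE 0 ^
        eclipse.exe CASE.DATA
```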
7. Discussions with another HP engineer, who had run the application on Linux, revealed that she used the Linux utility 'setpci' to improve performance there. The utility was not used to set anything specific to the PCI bus or the InfiniBand HCAs, but to disable the Snoop Filter feature of the Intel 5000X (Greencreek) chipset memory bus.
8. Used the BIOS to disable the Snoop Filter.
9. Reran the application. Performance was now on par with Linux.
Cores   Snoop On   Snoop Off   Improvement
  2       5490       4520          21%
  4       5337       3899          37%
  8       2944       2241          31%
 16       1721       1298          32%
 32       2919       2698           8%
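The improvement column is consistent with speedup computed as (time with Snoop Filter on / time with it off) − 1, which implies the two middle columns are run times where lower is better. A quick recomputation from the table:

```python
# Recompute the improvement column: improvement appears to be the speedup
# (Snoop On time / Snoop Off time) - 1, expressed as a percentage.
rows = [  # (cores, snoop_on, snoop_off, reported_improvement_pct)
    (2, 5490, 4520, 21),
    (4, 5337, 3899, 37),
    (8, 2944, 2241, 31),
    (16, 1721, 1298, 32),
    (32, 2919, 2698, 8),
]

for cores, on, off, reported in rows:
    improvement = (on / off - 1) * 100
    # each recomputed value lands within a point of the published figure
    assert abs(improvement - reported) <= 1.0
    print(f"{cores:>2} cores: {improvement:.1f}% (reported {reported}%)")
```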
10. Checked with the application developer and learned that the test case provided was not expected to scale past 16 cores.
For an interesting article on Snoop Filter performance impact see: http://www.dell.com/downloads/global/power/ps3q06-20060362-Radhakrishnan.pdf
Until next time, good shooting.