Diagnostics in V3

When you have a cluster with a few hundred nodes, in a network environment that’s often beyond your control, running applications that you didn’t write, used by people who are not always predictable, you should expect to see some errors most of the time. With all these variables, troubleshooting and fixing a cluster is hard…

HPC Server 2008 MPI Diagnostic Fails on Eager Message No Business Card Error

An HPC Server 2008 user reported that his cluster was up and running and that all nodes could ping each other over all networks, but the built-in MPI diagnostic was failing with the uninformative message “Failed To Run”. He was using network topology 3, with the head node connected to the Enterprise network and all…