I've recently been involved in a simple benchmarking exercise. Here are a few quick "rules of thumb" that have helped me:
- PCIe slots
A PCIe 4x slot is supposed to have 4 lanes capable of 250 MB/s each, for a total of 1 GB/s. An Infiniband SDR 4x card has 4 channels clocked at 2.5 Gb/s each; with 8b/10b encoding that is 2 Gb/s, or 250 MB/s, of data per channel, which again adds up to 1 GB/s. So a simple rule of thumb is: put an Infiniband card in the PCIe slot with the same number of channels. The match is not a coincidence: Intel was part of the original Infiniband group.
Be aware that not all motherboards are equal, although in theory most of them use the same chipsets. In our case, we found out that the motherboard was not able to sustain more than about 600 MB/s on the PCIe 4x slot. We had to move the Infiniband cards to the 8x slots, where we could reach the expected 900 MB/s transfer rate of the card. The 8x slot on those motherboards is probably not capable of reaching its top speed either, but it is sufficient for the SDR 4x card.
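The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: it assumes first-generation PCIe lanes and SDR Infiniband channels both signal at 2.5 Gb/s with 8b/10b encoding, and the function name `payload_mb_per_s` is made up for this example. The measured figures are the ones quoted above.

```python
# Sketch of the bandwidth arithmetic; assumes 2.5 Gb/s signalling and
# 8b/10b encoding on both the PCIe lanes and the Infiniband channels.

def payload_mb_per_s(lanes, mbit_per_lane=2500):
    """Theoretical payload bandwidth in MB/s (1 MB = 10**6 bytes)."""
    raw_mbit = lanes * mbit_per_lane      # signalling rate on the wire
    payload_mbit = raw_mbit * 8 // 10     # 8b/10b: 8 data bits per 10 bits sent
    return payload_mbit // 8              # bits -> bytes

theoretical_x4 = payload_mb_per_s(4)      # PCIe 4x slot or Infiniband SDR 4x card
theoretical_x8 = payload_mb_per_s(8)      # PCIe 8x slot

# Transfer rates reported in the text above (MB/s).
measured = {"4x slot": 600, "8x slot": 900}

for slot, rate in measured.items():
    share = rate / theoretical_x4
    print(f"{slot}: {rate} MB/s, {share:.0%} of the card's theoretical {theoretical_x4} MB/s")
```

Comparing a measured rate against the theoretical figure for the card, rather than for the slot, is what shows whether the slot or the card is the limiting factor.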
- Snoop Filters
A snoop filter is a mechanism to reduce traffic between different memory bus segments. It is particularly useful in multi-CPU, multi-core machines. Applications generally benefit from it, but there are some cases where latency-bound applications are adversely affected. If you see erratic behaviour in your latency tests (e.g. "random" high latencies in an otherwise consistent benchmark) and you have quad-core machines (especially early Clovertowns), try disabling the snoop filter in the BIOS. It may (or may not) help. Again, motherboards affect the results, as different manufacturers used different components (with or without a snoop filter).
New quad-core machines (Harpertown) have a snoop filter, but do not seem to show the symptoms mentioned above (at least those I've seen).
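One way to spot the "random" high latencies described in this section is to flag samples that sit far above the median of a run. The sketch below is a generic outlier check, not something taken from any benchmark tool. The function name `flag_latency_spikes`, the threshold and the sample values are all invented for illustration.

```python
import statistics

def flag_latency_spikes(samples_us, threshold=5.0):
    """Return (index, value) pairs lying far above the median latency.

    Distance is measured in units of the median absolute deviation (MAD),
    which a handful of spikes barely shifts.
    """
    median = statistics.median(samples_us)
    mad = statistics.median([abs(s - median) for s in samples_us])
    if mad == 0:
        # Perfectly flat run: anything above the median counts as a spike.
        return [(i, s) for i, s in enumerate(samples_us) if s > median]
    return [(i, s) for i, s in enumerate(samples_us)
            if (s - median) / mad > threshold]

# Made-up latencies in microseconds, with two obvious spikes.
samples = [3.1, 3.0, 3.2, 3.1, 48.0, 3.0, 3.1, 3.2, 52.5, 3.1]
for index, value in flag_latency_spikes(samples):
    print(f"sample {index}: {value} us looks like a spike")
```

Running the same check with the snoop filter enabled and disabled gives a quick way to see whether the BIOS setting changes the number of spikes.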
- Dynamic Power Management
Dynamic power management is generally NOT a good idea when you're trying to squeeze the last FLOP out of the CPUs. Disable it in the BIOS.
- MPI traffic
You may want to make absolutely sure that your MPI applications are using Infiniband; or you may want to run them once on Ethernet and once on Infiniband, then compare the results. In either case, you can specify at run time which network the MPI traffic will use:
mpiexec -env MPICH_NETMASK <address>/<mask> <other parameters> <exe>
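For the Ethernet versus Infiniband comparison, the two command lines can be generated from a small script. The sketch below only builds and prints the commands in the form shown above. The subnets, the executable name `mybench.exe` and the helper `mpiexec_command` are placeholders, so substitute the addresses of your own networks.

```python
def mpiexec_command(netmask, exe, extra_args=()):
    """Build an mpiexec command line that pins MPI traffic to one network."""
    return ["mpiexec", "-env", "MPICH_NETMASK", netmask, *extra_args, exe]

# Placeholder subnets: replace with the real Ethernet and Infiniband networks.
networks = {
    "ethernet": "10.0.0.0/255.255.255.0",
    "infiniband": "192.168.0.0/255.255.255.0",
}

for name, netmask in networks.items():
    command = mpiexec_command(netmask, "mybench.exe", ["-n", "8"])
    print(f"{name}: {' '.join(command)}")
```

Each list could then be passed to `subprocess.run` and timed, so that both runs use exactly the same parameters apart from the network.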
You may also want to make absolutely sure that your MPI traffic uses Network Direct, not Winsock. You can:
- remove the Winsock provider. Coarse, but effective:
clusrun /<nodes> installsp -r
- run your application with
mpiexec -env MPICH_DISABLE_SOCK 1 <other parameters> <exe>
Incidentally, you can install the Network Direct provider with
clusrun /<nodes> ndinstall -i