Low-latency links are important for message-passing (MPI) applications. Typically, several instances of the same MPI program run on different nodes and depend on data passed from other nodes to complete their computation. Latency is therefore one of the performance-limiting factors to take into account. The latency of GbE tends to be on the order of 100 microseconds or more, and GbE only makes efficient use of its available bandwidth when traffic consists of a few large packets. For parallel applications that rely on lots of small messages (typically 512 B – 4 KB), that is not acceptable. So, what are the alternatives? One widely used alternative is InfiniBand (IB):
- 4X SDR (single data rate) is the most common implementation today: 10 Gb/s of signaling, which after 8b/10b encoding yields 8 Gb/s of usable bandwidth.
- Data travels in packets of up to 4 KB; one or more packets make up a message.
- Message latencies of 3–5 microseconds, application buffer to application buffer, are achievable.
- IPoIB (IP over InfiniBand) enables standard TCP/IP applications to run unmodified.
- SDP (Sockets Direct Protocol) and WSD (Winsock Direct) reduce latency in the software stack by using RDMA, reliable connection handling, and offloading work to the channel adapters.
- SRP (SCSI RDMA Protocol) allows block storage operations over InfiniBand, taking advantage of its low latency and high bandwidth. IB/Fibre Channel gateways are often required to reach the storage device.
- uDAPL (user-level Direct Access Programming Library): an API specification covering RDMA-capable transports, used for instance by Oracle RAC.
Some implementations of MPI (e.g. MVAPICH2) use uDAPL, mVAPI (Mellanox’s verbs API) or the OpenFabrics Gen2 verbs to communicate with the InfiniBand adapters (or rather, with their drivers, which can do most of the processing in user mode). They therefore make more efficient use of resources than a purely socket-based implementation (e.g. “vanilla” MPICH2 or MS-MPI).
Familiarize yourself with the OpenFabrics (formerly OpenIB) stack at http://www.openfabrics.org
Check out MVAPICH at http://mvapich.cse.ohio-state.edu/overview/mvapich2/
Read about IB applications at http://www.mellanox.com/support/whitepapers.php
Read more about InfiniBand support in our performance tuning whitepaper at http://www.microsoft.com/windowsserver2003/ccs/technology.aspx