Did you notice that the latest CTP introduced a new option for mpiexec? With mpiexec -affinity you can affinitize each MPI rank to the core where it is started, so the scheduler does not migrate it between cores. Whether you actually benefit from affinitization depends on your application: some show a good performance improvement, some do not. In particular, if your MPI application is also multi-threaded, the affinity option may backfire, because the affinity mask that you set for the process is inherited by default by all its threads, so they may all be stuck on one core. Windows offers other API calls, such as SetThreadAffinityMask, to set thread affinity.
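To make the inheritance problem concrete, here is a minimal sketch of the bit arithmetic behind affinity masks, where bit i stands for logical core i. It does not call any Windows API; the helper names (core_mask, cores_in_mask, thread_masks) are made up for illustration, and the round-robin assignment is just one possible policy. It assumes, as on Windows, that a thread's mask has to be a subset of its process's mask.

```python
def core_mask(cores):
    """Return an affinity bitmask with one bit set per logical core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """List the logical core indices whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


def thread_masks(process_mask, n_threads):
    """Give each thread one core from the process mask, round-robin.

    Threads can only be spread over cores the process is allowed to use
    (hypothetical policy, for illustration only).
    """
    allowed = cores_in_mask(process_mask)
    if not allowed:
        raise ValueError("process mask allows no cores")
    return [1 << allowed[t % len(allowed)] for t in range(n_threads)]


# Rank pinned to core 2 only: all four threads end up on that one core.
pinned = thread_masks(core_mask([2]), 4)
# Rank allowed on cores 0-3: the four threads get one core each.
spread = thread_masks(core_mask([0, 1, 2, 3]), 4)

print([cores_in_mask(m) for m in pinned])  # [[2], [2], [2], [2]]
print([cores_in_mask(m) for m in spread])  # [[0], [1], [2], [3]]
```

The first case is what a multi-threaded rank sees when its process mask covers a single core: there is nowhere else for the threads to go unless the process mask itself is widened first.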
"Traditional", non multi-threaded MPI applications may be more straightforward. One important factor to take into account when deciding when to affinitize the process is the compute node architecture: is it NUMA or not? If it is, have you got enough RAM in the memory bank local to the core where the process will run? If not, you may incur frequent (and lengthy) remote memory accesses on the same hardware. In this case, it may be best to rely on the o/s scheduler to determine the ideal NUMA node for the thread.