Launching MS-MPI Applications with Third Party Schedulers

MPI applications linked with the MS-MPI library are normally launched by using the job scheduler in Microsoft HPC Pack. Starting with the 2012 R2 release of MS-MPI, MPI applications can also be launched from any third-party scheduler by using the Microsoft MPI Process Management System (MSPMS) interface. This blog posting provides a quick overview of the interface and an example of how to use it to authorize and launch MPI applications.

I. Overview of launching MS-MPI applications by using the MSPMS interface

At a high level, launching an MS-MPI application by using the MSPMS interface works as follows:

  • A list of hosts on which the MPI application will be launched is provided to mpiexec. Each of these hosts must run a launch service that uses the MSPMS interface to listen for connections from mpiexec. A service that implements MSPMS is referred to as an MSPMS client.
  • mpiexec obtains the security token of the user launching the MPI application and passes it to the MSPMS client on each host.
  • The MSPMS client verifies the user’s permission to launch the MPI application.
  • If the user is authorized to launch the MPI application, the MSPMS clients launch the MPI process managers (one for each host, per application launch).
  • The process managers launch the MPI processes.
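The sequence above can be modeled in a few lines of portable C++. This is a minimal sketch, not the MSPMS API: `LaunchRequest`, the `AuthorizeFn` and `LaunchFn` callbacks, and `launchOnHosts` are hypothetical names used only to show the ordering, in which every host authorizes the user before it starts exactly one process manager for the launch.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of a launch request as seen by one MSPMS client.
struct LaunchRequest
{
    std::string user;  // identity of the user who ran mpiexec
    std::string host;  // host whose MSPMS client received the request
};

// Hypothetical callbacks: decide whether the user may launch, and start one
// process manager on the receiving host.
using AuthorizeFn = std::function<bool(const std::string& user)>;
using LaunchFn = std::function<void(const LaunchRequest& request)>;

// Models one mpiexec invocation across a host list and returns the number of
// process managers started. Each host's client authorizes the user on its
// own before launching anything.
int launchOnHosts(const std::string& user,
                  const std::vector<std::string>& hosts,
                  const AuthorizeFn& authorize,
                  const LaunchFn& launchManager)
{
    int started = 0;
    for (const std::string& host : hosts)
    {
        if (!authorize(user))
        {
            continue;  // unauthorized: this host launches nothing
        }
        launchManager(LaunchRequest{user, host});
        ++started;  // one process manager per host for this launch
    }
    return started;
}
```

In a real deployment the loop is distributed: each host runs its own MSPMS client and makes its own authorization decision, which is why the check sits inside the per-host step rather than once up front.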

II. How to use the MSPMS interface

At a minimum, an MSPMS client is required to do the following:

  • Load the MSPMS provider.
  • Register callbacks to authorize and launch processes.
  • Call the MSPMS interface to start listening for launch requests and to stop the MSPMS client service.

We will be using a sample service to illustrate the functionality of the MSPMS interface. The full sample code is available at the end of this article.

The code in this sample service starts by loading the MSPMS provider. Starting with the 2012 R2 release of MS-MPI, the MS-MPI installer stores the location of the default MSPMS provider in the MSPMSProvider value under the following registry key: HKLM\Software\Microsoft\MPI

Note: For simplicity, the following code snippets usually do not contain the necessary error checking.

    HKEY    hKey;
    WCHAR   path[MAX_PATH];
    DWORD   cbPath = sizeof(path);  // RegGetValueW takes the buffer size in bytes

    RegOpenKeyExW(
        HKEY_LOCAL_MACHINE,
        L"Software\\Microsoft\\MPI",
        0,
        KEY_READ,
        &hKey
        );

    RegGetValueW(
        hKey,
        NULL,
        L"MSPMSProvider",
        RRF_RT_REG_SZ,
        NULL,
        reinterpret_cast<void*>(path),
        &cbPath
        );
    RegCloseKey( hKey );

    g_MSPMSProvider = LoadLibraryW( path );

After successfully loading the provider, the client code uses GetProcAddress to retrieve the exported function MSMPI_Get_pm_interface.

    GetPmInterface = reinterpret_cast<PFN_MSMPI_Get_pm_interface>(
        GetProcAddress( g_MSPMSProvider, "MSMPI_Get_pm_interface" ));
    if( GetPmInterface == nullptr )
    {
        DWORD gle = GetLastError();
        FreeLibrary( g_MSPMSProvider );
        g_MSPMSProvider = nullptr;
        return HRESULT_FROM_WIN32( gle );
    }

This function can then be called to retrieve an instance of the interface for a specific version.

Note: As of this writing, PM_SERVICE_INTERFACE_V1 is the only supported version.

    g_PMIServiceInterface.Size = sizeof(PmiServiceInterface);
    hr = GetPmInterface(
        PM_SERVICE_INTERFACE_V1,
        &g_PMIServiceInterface
        );
    if( FAILED( hr ) )
    {
        ...
    }
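Setting Size before the call follows a common Win32 convention for versioned structures: the caller states how large its structure is, so the callee can reject a structure that is too small for the requested version. The portable sketch below models that convention. `DemoInterfaceV1`, `getDemoInterface`, and the integer version used in place of a GUID are hypothetical stand-ins, not definitions from mspms.h, and the checks the real provider performs may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical versioned interface table with a Size-first layout, modeled
// loosely on PmiServiceInterface. Not the real MSPMS definition.
struct DemoInterfaceV1
{
    std::size_t Size;                     // set by the caller before the call
    int (*Initialize)(const char* name);  // filled in by the provider
    void (*Finalize)();                   // filled in by the provider
};

static int demoInitialize(const char* name) { return name != nullptr ? 0 : -1; }
static void demoFinalize() {}

constexpr std::uint32_t kDemoVersion1 = 1;
constexpr int kOk = 0;
constexpr int kBadVersion = 1;
constexpr int kBufferTooSmall = 2;

// Hypothetical provider entry point: validates the requested version and the
// caller-supplied Size before filling in the function pointers.
int getDemoInterface(std::uint32_t requestedVersion, DemoInterfaceV1* iface)
{
    if (requestedVersion != kDemoVersion1)
    {
        return kBadVersion;
    }
    if (iface == nullptr || iface->Size < sizeof(DemoInterfaceV1))
    {
        return kBufferTooSmall;
    }
    iface->Initialize = demoInitialize;
    iface->Finalize = demoFinalize;
    return kOk;
}
```

A caller that forgets to set Size leaves it at zero here and gets kBufferTooSmall instead of a partially filled table.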

After obtaining the process management interface, the client code initializes the interface by calling Initialize and passing in an initialized PmiServiceInitData structure, as shown below. Note that the Name field can be initialized to any name that you choose.

    PmiServiceInitData initData;
    initData.Size = sizeof(PmiServiceInitData);
    initData.Name = "msmpi service";
    hr = g_PMIServiceInterface.Initialize( &initData );
    if( FAILED( hr ) )
    {
        ...
    }

After initialization, the MSPMS client should populate a SOCKADDR_INET structure and a PmiManagerInterface structure, and pass them to the Listen call. The SOCKADDR_INET structure specifies the address and port on which to listen for connections from mpiexec. Note that the current MSPMS interface supports only IPv4 addresses.
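The port in a SOCKADDR_INET is stored in network (big-endian) byte order. On little-endian Windows hosts, swapping the two bytes of the port, as the sample does with `_byteswap_ushort`, yields that order; `htons` from Winsock is the conventional spelling. The portable sketch below produces the network-order value without assuming the host's endianness; `toNetworkOrder` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Returns the 16-bit value whose in-memory byte layout is big-endian
// (network order), regardless of the host's own endianness.
std::uint16_t toNetworkOrder(std::uint16_t hostPort)
{
    unsigned char bytes[2] = {
        static_cast<unsigned char>(hostPort >> 8),    // high byte first
        static_cast<unsigned char>(hostPort & 0xFF),  // then the low byte
    };
    std::uint16_t result;
    std::memcpy(&result, bytes, sizeof(result));  // read back as a host integer
    return result;
}
```

For the sample's default port 8677 (0x21E5), the bytes in memory are 0x21 followed by 0xE5. On a little-endian host such as x86 or x64 Windows, `toNetworkOrder(8677)` therefore returns 0xE521, the same value that `_byteswap_ushort(8677)` and `htons(8677)` produce.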

The PmiManagerInterface structure configures the launch parameters; in particular, it lets the MSPMS client specify the security context in which its launch callback is invoked. The following launch types are currently supported:

  • PmiLaunchTypeSelf: The launch callback is invoked without any impersonation or user context.  This is used to launch processes under the same credentials as the MSPMS client.
  • PmiLaunchTypeImpersonate: The launch callback is invoked under the impersonated security context of the user that ran mpiexec.
  • PmiLaunchTypeUserSid: The launch callback is invoked with the security identifier (SID) of the user that ran mpiexec.

In our sample code, we choose PmiLaunchTypeImpersonate, which is the recommended type for most common scenarios. PmiLaunchTypeUserSid provides flexibility to support custom security models in which the identity of the caller is acquired through external means (e.g., a database).
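For the PmiLaunchTypeUserSid case, the authorization decision can come from any external policy store. The sketch below uses an in-memory allow-list as a stand-in for such a store. It assumes the SID handed to the callback has already been converted to its string form (on Windows, ConvertSidToStringSidW does this); `SidAllowList` and the sample SID strings are hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical external policy: string-form SIDs of users who may launch
// MPI process managers. A real deployment might load this from a database.
class SidAllowList
{
public:
    explicit SidAllowList(std::unordered_set<std::string> sids)
        : sids_(std::move(sids)) {}

    // True only for SIDs that appear in the allow-list.
    bool mayLaunch(const std::string& userSid) const
    {
        return sids_.count(userSid) != 0;
    }

private:
    std::unordered_set<std::string> sids_;
};
```

Keeping the policy behind a small interface like this lets the launch callback stay the same while the backing store changes.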

    SOCKADDR_INET addr;
    addr.si_family = AF_INET;
    addr.Ipv4.sin_port = _byteswap_ushort( g_Port );
    addr.Ipv4.sin_addr.S_un.S_addr = INADDR_ANY;

    PmiManagerInterface manager;
    manager.Size = sizeof(PmiManagerInterface);
    manager.LaunchType = PmiLaunchTypeImpersonate;
    manager.Launch.Impersonate = SvcCreateManagerProcess;
 
    hr = g_PMIServiceInterface.Listen(
        &addr,
        &manager,
        PM_MANAGER_INTERFACE_V1
        );
 
    g_PMIServiceInterface.Finalize();
    if( FAILED( hr ) )
    {
        ...
    }

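Note that the MSPMS client should call Finalize whether or not the Listen call succeeded; this ensures that the internal data structures created by Initialize are cleaned up. In the sample service, Listen returns once the listener is stopped, which the service control handler requests by calling PostStop when the service receives a stop request.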
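Because Finalize has to run on every path once Initialize has succeeded, C++ code can tie it to a scope guard instead of relying on the call's position in the function. The sketch below shows the idiom. `ScopeExit` is a small hand-written guard, and `fakeListen` and `fakeFinalize` are hypothetical stand-ins for `g_PMIServiceInterface.Listen` and `Finalize`, which this sketch does not call.

```cpp
#include <functional>
#include <utility>

// Minimal scope guard: runs the stored callable when the guard is destroyed,
// on every exit path (normal return, early return, or exception).
class ScopeExit
{
public:
    explicit ScopeExit(std::function<void()> onExit) : onExit_(std::move(onExit)) {}
    ~ScopeExit()
    {
        if (onExit_)
        {
            onExit_();
        }
    }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    std::function<void()> onExit_;
};

// Hypothetical stand-ins for Listen and Finalize; they only record calls.
int g_finalizeCalls = 0;
void fakeFinalize() { ++g_finalizeCalls; }
bool fakeListen(bool succeed) { return succeed; }

// Mirrors the sample's flow, assuming Initialize has already succeeded.
bool runListener(bool listenSucceeds)
{
    ScopeExit finalizeOnExit(fakeFinalize);
    if (!fakeListen(listenSucceeds))
    {
        return false;  // finalizeOnExit still runs fakeFinalize()
    }
    return true;       // and so does the normal path
}
```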

III. Best practices when launching MS-MPI applications by using the MSPMS interface

  • The sample code in this posting does not clean up spawned processes, because that responsibility lies with the resource manager. A good practice is to associate a job object with the context string that is passed as an argument to the launch callback, and to assign each process manager launched for that context to the job. Processes spawned by the process managers then belong to the same job, so terminating the job cleans up every process spawned within it.
  • The launch callback should always authorize the user before launching the process managers.
  • Affinity settings on job objects can be used to control or limit the cores available to MPI jobs.
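As a sketch of the affinity practice, the helper below computes a mask for a contiguous range of cores so that concurrent jobs can be given non-overlapping masks. `coreRangeMask` is a hypothetical helper. On Windows, the resulting value could be stored in the Affinity member of JOBOBJECT_BASIC_LIMIT_INFORMATION, with JOB_OBJECT_LIMIT_AFFINITY set in LimitFlags, and applied with SetInformationJobObject. The sketch assumes a 64-bit build and at most 64 logical processors in a single processor group.

```cpp
#include <cstdint>
#include <stdexcept>

// Builds an affinity mask covering cores [firstCore, firstCore + coreCount).
// This helper only does the bit arithmetic; it does not touch job objects.
std::uint64_t coreRangeMask(unsigned firstCore, unsigned coreCount)
{
    if (coreCount == 0 || firstCore >= 64 || coreCount > 64 - firstCore)
    {
        throw std::out_of_range("core range must fit in a 64-bit mask");
    }
    // Set the low coreCount bits, avoiding the undefined shift by 64.
    std::uint64_t low = (coreCount == 64) ? ~std::uint64_t{0}
                                          : ((std::uint64_t{1} << coreCount) - 1);
    return low << firstCore;
}
```

For example, `coreRangeMask(0, 8)` and `coreRangeMask(8, 8)` give two jobs the disjoint masks 0xFF and 0xFF00.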

Full source code of the sample service:

/*
Copyright (c) 2014  Microsoft Corporation
*/

#include <initguid.h>
#include "mspms.h"
#include <strsafe.h>
#include <limits.h>     // USHRT_MAX
#include <stdlib.h>     // _wtoi
#include <wchar.h>      // wcscmp
 
//
// Function Prototypes
//
VOID WINAPI ServiceMain( DWORD argc, LPWSTR* argv );
VOID WINAPI SvcHandler( DWORD fdwControl );
static DWORD SvcReportState( DWORD State );
static DWORD SvcSetStatus();
static DWORD SvcEnableControls( DWORD Controls );
static DWORD VerifyLaunchAccess( HANDLE hToken );
static HRESULT LoadPms();
static VOID UnloadPms();
static VOID ParseSvcOpt( int argc, LPWSTR* argv );
HRESULT WINAPI SvcCreateManagerProcess( PCSTR app, PSTR args, PCSTR context );
 
 
static SERVICE_STATUS g_ServiceStatus =
{
    SERVICE_WIN32_OWN_PROCESS,  // dwServiceType;
    SERVICE_START_PENDING,      // dwCurrentState;
    0,                          // dwControlsAccepted;
    0,                          // dwWin32ExitCode;
    0,                          // dwServiceSpecificExitCode;
    0,                          // dwCheckPoint;
    0,                          // dwWaitHint;
};
 
 
static SERVICE_STATUS_HANDLE g_StatusHandle = nullptr;
static SERVICE_TABLE_ENTRYW  g_DispatchTable[2] =
{
    {L"Sample MSMPI Launcher", ServiceMain},
    {nullptr, nullptr},
};
 
 
typedef
HRESULT
( WINAPI * PFN_MSMPI_Get_pm_interface )(
    REFGUID              RequestedVersion,
    PmiServiceInterface* Interface
    );
 
 
static HMODULE                      g_MSPMSProvider = nullptr;
static PFN_MSMPI_Get_pm_interface   GetPmInterface = nullptr;
static PmiServiceInterface          g_PMIServiceInterface;
static USHORT                       g_Port = 8677;
static const wchar_t                SERVICE_NAME[] = L"MSMPILauncher";
 
 
int
wmain(
    int argc,
    LPWSTR* argv
    )
{
    int rc = 0;
 
    BOOL fSucc = StartServiceCtrlDispatcherW( g_DispatchTable );
    if( !fSucc )
    {
        rc = GetLastError();
    }
    return rc;
}
 
 
VOID WINAPI
ServiceMain(
    DWORD argc,
    LPWSTR* argv
    )
{
    g_StatusHandle = RegisterServiceCtrlHandlerW(
        g_DispatchTable[0].lpServiceName,
        SvcHandler
        );
    if( g_StatusHandle == nullptr )
    {
        return;
    }
 
    ParseSvcOpt( argc, argv );
 
    HRESULT hr = LoadPms();
    if( FAILED( hr ) )
    {
        goto CleanUp;
    }
 
    g_PMIServiceInterface.Size = sizeof(PmiServiceInterface);
 
    hr = GetPmInterface(
        PM_SERVICE_INTERFACE_V1,
        &g_PMIServiceInterface
        );
 
    if( FAILED( hr ) )
    {
        goto UnLoadAndCleanUp;
    }
 
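    //
    // Declared without an initializer: standard C++ does not allow the
    // gotos above to jump past a declaration with an initializer.
    //
    DWORD err;
    err = SvcReportState( SERVICE_RUNNING );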
    if( err != 0 )
    {
        hr = HRESULT_FROM_WIN32( err );
        goto UnLoadAndCleanUp;
    }
    SvcEnableControls( SERVICE_ACCEPT_STOP );
 
    SOCKADDR_INET addr;
    addr.si_family = AF_INET;
    addr.Ipv4.sin_port = _byteswap_ushort( g_Port );
    addr.Ipv4.sin_addr.S_un.S_addr = INADDR_ANY;
 
    PmiServiceInitData initData;
    initData.Size = sizeof(PmiServiceInitData);
    initData.Name = "msmpi service";
 
    hr = g_PMIServiceInterface.Initialize( &initData );
 
    if( FAILED( hr ) )
    {
        goto UnLoadAndCleanUp;
    }
 
    PmiManagerInterface manager;
    manager.Size = sizeof(PmiManagerInterface);
    manager.LaunchType = PmiLaunchTypeImpersonate;
    manager.Launch.Impersonate = SvcCreateManagerProcess;
 
    hr = g_PMIServiceInterface.Listen(
        &addr,
        &manager,
        PM_MANAGER_INTERFACE_V1
        );
 
    //
    // Finalize is necessary regardless of the status of the Listen
    // call. This ensures proper cleanup of any internal data
    // structures that were created during Initialize.
    //
    g_PMIServiceInterface.Finalize();
 
UnLoadAndCleanUp:
    UnloadPms();
 
CleanUp:
    if( FAILED( hr ) )
    {
        g_ServiceStatus.dwWin32ExitCode = ERROR_SERVICE_SPECIFIC_ERROR;
        g_ServiceStatus.dwServiceSpecificExitCode = hr;
        SvcReportState( SERVICE_STOPPED );
    }
}
 
 
HRESULT
WINAPI
SvcCreateManagerProcess(
    PCSTR app,
    PSTR args,
    PCSTR /* context */
    )
{
    //
    // We're now in impersonation mode, get the access token associated with the user
    //
    HANDLE hThreadToken;
    BOOL fSucc = OpenThreadToken(
        GetCurrentThread(),
        TOKEN_ALL_ACCESS,
        TRUE,
        &hThreadToken
        );
    if( !fSucc )
    {
        return HRESULT_FROM_WIN32( GetLastError() );
    }
 
    //
    // Verify if the user is allowed to launch MPI jobs
    //
    DWORD ret = VerifyLaunchAccess( hThreadToken );
    if( ret != 0 )
    {
        CloseHandle( hThreadToken );
        return HRESULT_FROM_WIN32( ret );
    }
 
    //
    // Create a primary token, because CreateProcessAsUser requires a
    // primary token and does not accept an impersonation token
    //
    HANDLE hPrimaryToken;
    fSucc = DuplicateTokenEx(
        hThreadToken,
        TOKEN_ALL_ACCESS,
        nullptr,
        SecurityImpersonation,
        TokenPrimary,
        &hPrimaryToken
        );
    if( !fSucc )
    {
        CloseHandle( hThreadToken );
        return HRESULT_FROM_WIN32( GetLastError() );
    }
 
    CloseHandle( hThreadToken );
 
    STARTUPINFOA si;
    GetStartupInfoA( &si );
 
    PROCESS_INFORMATION pi;
    //
    // We use the ANSI version because the MSPMS v1 interface passes the
    // manager name and arguments as ANSI strings.
    //
    fSucc = CreateProcessAsUserA(
        hPrimaryToken,
        app,                // Application Name
        args,               // Command Line
        NULL,               // Process Security Attributes,
        NULL,               // Thread Security Attributes,
        TRUE,               // Inherit Parent Handles,
        CREATE_NO_WINDOW,   // Process CreationFlags,
        NULL,               // lpEnvironment,
        NULL,               // lpCurrentDirectory,
        &si,                // lpStartupInfo,
        &pi                 // lpProcessInformation
        );
 
    if( !fSucc )
    {
        CloseHandle( hPrimaryToken );
        return HRESULT_FROM_WIN32( GetLastError() );
    }
 
    CloseHandle( hPrimaryToken );
    CloseHandle( pi.hThread );
    CloseHandle( pi.hProcess );
 
    return S_OK;
}
 
 
static DWORD
VerifyLaunchAccess(
    HANDLE hToken
    )
{
    //
    // This is where we verify the user's eligibility to run MPI jobs
    // For this example, we will use a simple check of whether the
    // user belongs to the local administrator group.
    //
    // One of the best practices for launching is to always authorize
    // the user, even if the authorization check is simple.
    //
    SID_IDENTIFIER_AUTHORITY NtAuthority = SECURITY_NT_AUTHORITY;
    PSID pAdministratorsGroupSid;
    BOOL fSucc = AllocateAndInitializeSid(
        &NtAuthority,
        2,
        SECURITY_BUILTIN_DOMAIN_RID,
        DOMAIN_ALIAS_RID_ADMINS,
        0, 0, 0, 0, 0, 0,
        &pAdministratorsGroupSid
        );
    if( !fSucc )
    {
        return GetLastError();
    }
 
    BOOL isAdmin;
    fSucc = CheckTokenMembership( hToken, pAdministratorsGroupSid, &isAdmin );
    FreeSid( pAdministratorsGroupSid );
    if( !fSucc )
    {
        return GetLastError();
    }
 
    if( isAdmin == TRUE )
    {
        return 0;
    }
 
    return ERROR_ACCOUNT_RESTRICTION;
}
 
 
static DWORD
SvcReportState(
    DWORD State
    )
{
    if( State == SERVICE_STOPPED || State == SERVICE_START_PENDING )
    {
        g_ServiceStatus.dwControlsAccepted = 0;
    }
 
    g_ServiceStatus.dwCurrentState = State;
    g_ServiceStatus.dwCheckPoint = 0;
    g_ServiceStatus.dwWaitHint = 0;
    return SvcSetStatus();
}
 
 
static DWORD
SvcSetStatus()
{
    if( SetServiceStatus( g_StatusHandle, &g_ServiceStatus ) == TRUE )
    {
        return 0;
    }
 
    return GetLastError();
}
 
 
static DWORD
SvcEnableControls(
    DWORD Controls
    )
{
    g_ServiceStatus.dwControlsAccepted |= Controls;
    g_ServiceStatus.dwCheckPoint = 0;
    g_ServiceStatus.dwWaitHint = 0;
    return SvcSetStatus();
}
 
 
VOID WINAPI
SvcHandler(
    DWORD fdwControl
    )
{
    switch( fdwControl )
    {
    case SERVICE_CONTROL_STOP:
        g_ServiceStatus.dwControlsAccepted = 0;
        SvcReportState( SERVICE_STOP_PENDING );
 
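        //
        // Ask the MSPMS provider to stop the listener. The provider is
        // unloaded by ServiceMain after Listen returns; unloading it here
        // could free the DLL while Listen is still running on the
        // ServiceMain thread.
        //
        g_PMIServiceInterface.PostStop();
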
        SvcReportState( SERVICE_STOPPED );
        break;
    default:
        break;
    }
}
 
 
static VOID
ParseSvcOpt(
    int argc,
    LPWSTR* argv
    )
{
    //
    // Use argc because the service controller does not set argv[argc] to NULL.
    //
 
    if( argc < 3 )
    {
        return;
    }
 
    if( argv[1][0] != L'-' && argv[1][0] != L'/' )
    {
        return;
    }
 
    if( wcscmp( &argv[1][1], L"p" ) != 0 &&
        wcscmp( &argv[1][1], L"port" ) != 0 )
    {
        return;
    }
 
    int port = _wtoi( argv[2] );
    if( port <= 0 || port > USHRT_MAX )
    {
        return;
    }
 
    g_Port = static_cast<USHORT>(port);
}
 
 
static HRESULT
LoadPms()
{
    HKEY    hKey;
    WCHAR   path[MAX_PATH];
    DWORD   cbPath = sizeof(path);  // RegGetValueW takes the buffer size in bytes

    LONG err = RegOpenKeyExW(
        HKEY_LOCAL_MACHINE,
        L"Software\\Microsoft\\MPI",
        0,
        KEY_READ,
        &hKey
        );
    if( err != ERROR_SUCCESS )
    {
        return HRESULT_FROM_WIN32( err );
    }

    err = RegGetValueW(
        hKey,
        NULL,
        L"MSPMSProvider",
        RRF_RT_REG_SZ,
        NULL,
        reinterpret_cast<void*>(path),
        &cbPath
        );
    RegCloseKey( hKey );
 
    if( err != ERROR_SUCCESS )
    {
        return HRESULT_FROM_WIN32( err );
    }
 
    g_MSPMSProvider = LoadLibraryW( path );
    if( g_MSPMSProvider == nullptr )
    {
        return HRESULT_FROM_WIN32( GetLastError() );
    }
 
    GetPmInterface = reinterpret_cast<PFN_MSMPI_Get_pm_interface>(
        GetProcAddress( g_MSPMSProvider, "MSMPI_Get_pm_interface" ));
    if( GetPmInterface == nullptr )
    {
        DWORD gle = GetLastError();
        FreeLibrary( g_MSPMSProvider );
        g_MSPMSProvider = nullptr;
        return HRESULT_FROM_WIN32( gle );
    }
 
    return S_OK;
}
 
 
static VOID
UnloadPms()
{
    GetPmInterface = nullptr;
    if( g_MSPMSProvider != nullptr )
    {
        FreeLibrary( g_MSPMSProvider );
        g_MSPMSProvider = nullptr;
    }
}