RPC and Network Load Balancing Technologies

What is RPC?

Many applications today use Remote Procedure Calls (RPC) to handle client and server communications over a network. RPC allows client applications to call functions in local and network-connected server applications using native Windows APIs. Among the benefits of using RPC are:

  • Flexible communications, as defined by an Interface Definition Language (IDL) file, supporting a wide range of protocols including TCP, UDP, and NetBIOS
  • Ability to automate network port selection to avoid conflicts
  • Offloading of authentication, packet integrity (digital signatures), and strong encryption
  • Ability for clients to locate remotely published server services through the RPC Endpoint Mapper

Because of these benefits (as well as many others), RPC is an attractive communications platform for software development and is widely used in client-server communications.

Firewalls and RPC

RPC provides a large number of security benefits, as discussed above; however, it does have an associated cost. Many applications that use RPC leverage a feature called dynamic port allocation. When an application uses dynamic port allocation, it offloads network port selection to the RPC server service that hosts it. As a result, each server may use a different TCP port to host the same service, and white papers specifying the ports that must be opened to allow these servers to communicate across firewalls include a range of thousands of (commonly TCP) ports. In addition, using implicit bindings means exposing any other RPC-hosted services to services hosted in a different and possibly lower security zone. In the case of domain membership, this risks exposing the very core of a network’s authentication mechanism (Active Directory) to compromise through a vulnerability in a service hosted on a domain controller, even when communication with that service is not necessary for domain membership.

How does RPC work?

When a service uses RPC on Windows, it registers with the RPC Endpoint Mapper service. An application can register an endpoint in a large number of ways, many of which are listed at https://msdn.microsoft.com/en-us/library/aa375262(v=vs.80).aspx. For load balancing, the most important aspect of this registration is the difference between explicitly specifying the TCP port that the application uses and allowing the application to leverage the RPC server’s dynamic port allocation capability.

Dynamic Port Allocation

Applications leveraging dynamic port allocation use a function such as RpcServerUseProtseq(), RpcServerUseAllProtseqs(), RpcServerUseAllProtseqsEx(), or RpcServerUseProtseqEx() to let the RPC server dynamically choose a TCP port to host the service on (ref: https://msdn.microsoft.com/en-us/library/aa378682(v=vs.80).aspx). When an application uses a function like this, the RPC server picks a port from a range between 1024 and 65535 by default. This behavior can be modified as outlined in Microsoft KB https://support.microsoft.com/kb/250367, or by changing the TCP dynamic port range as outlined in KB https://support.microsoft.com/default.aspx?scid=kb;EN-US;929851.
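The effect of dynamic allocation can be illustrated outside of RPC with plain sockets. The following is a minimal sketch, not the RPC runtime: it assumes that asking the operating system for port 0 is a fair analogy for letting the RPC server choose a port, and the helper name listen_on_dynamic_port is hypothetical.

```python
import socket


def listen_on_dynamic_port(host="127.0.0.1"):
    """Bind a TCP listener to port 0 so the OS picks a free port.

    This is only an analogy for RPC dynamic port allocation: the
    caller does not choose the port, it learns it after binding.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))           # port 0 = "choose one for me"
    sock.listen()
    port = sock.getsockname()[1]   # the port that was actually assigned
    return sock, port


if __name__ == "__main__":
    # Two instances of the "same service" end up on different ports.
    first, first_port = listen_on_dynamic_port()
    second, second_port = listen_on_dynamic_port()
    print("instance 1 listens on", first_port)
    print("instance 2 listens on", second_port)
    first.close()
    second.close()
```

Because the port is only known after the bind, a client needs some other way to discover it, which is the role the Endpoint Mapper plays for RPC.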

Explicit Port Allocation

If software developers want to explicitly specify the TCP port for the application, they can use a function such as RpcServerUseProtseqEp() or RpcServerUseProtseqEpEx(). After getting an endpoint from one of these functions, the application will usually call RpcEpRegister() to register the new port with the Endpoint Mapper. In many Microsoft applications (such as Active Directory and Exchange) this option is used only if a given registry key is configured (references – Active Directory: https://support.microsoft.com/kb/224196, Exchange 2010: https://social.technet.microsoft.com/wiki/contents/articles/configure-static-rpc-ports-on-an-exchange-2010-client-access-server.aspx). In that case, the software will try to use the given TCP port rather than leveraging the dynamic port allocation abilities of the RPC server.

How Clients Locate an RPC Service

The next logical question is “If the TCP port is randomly assigned, how does a client on the network find out which port to connect to?” Associating these ephemeral ports with the hosted server application is the job of the RPC Endpoint Mapper. The RPC Endpoint Mapper is essentially a phone book containing a list of mapped ports and their associated UUIDs (a UUID is a unique ID specified by the application mapping the connection). An application that wants to connect to a service indexed by the endpoint mapper first connects to the hosting server on TCP port 135, typically after composing a binding with a function such as RpcStringBindingCompose(), and searches the server’s endpoint mapper for the service's UUID (ref: https://msdn.microsoft.com/en-us/library/windows/desktop/aa378481(v=vs.85).aspx). The UUID is an identity chosen by the application developer during development, commonly through an Interface Definition Language (IDL) file (ref: https://pubs.opengroup.org/onlinepubs/9629399/chap4.htm). For an example of an IDL file, search for *.idl in your %ProgramFiles% directory. Once a client finds the location of the service, it closes the connection to TCP 135 and initiates a new connection to the server on the referenced TCP endpoint.
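The phone-book behavior described above can be sketched in a few lines. This is a toy, in-memory model and not the real Endpoint Mapper protocol; the class name EndpointMapper, its methods, the UUID value, and the port number are all hypothetical and chosen only for illustration.

```python
import uuid


class EndpointMapper:
    """Toy stand-in for the RPC Endpoint Mapper: interface UUID -> TCP port."""

    def __init__(self):
        self._endpoints = {}

    def register(self, interface_uuid, port):
        # A server records which port currently hosts the interface.
        self._endpoints[interface_uuid] = port

    def resolve(self, interface_uuid):
        # A client asks which port hosts the interface it wants.
        try:
            return self._endpoints[interface_uuid]
        except KeyError:
            raise LookupError(f"no endpoint registered for {interface_uuid}")


# Hypothetical interface UUID, as a developer might declare in an IDL file.
SERVICE_X = uuid.UUID("12345678-1234-5678-1234-567812345678")

if __name__ == "__main__":
    mapper = EndpointMapper()
    mapper.register(SERVICE_X, 11111)   # server side: dynamic port was 11111
    port = mapper.resolve(SERVICE_X)    # client side: lookup step (TCP 135 in real RPC)
    print("connect to the service on TCP port", port)
```

The client only ever knows the UUID in advance; the port is whatever the server registered at startup, so two servers hosting the same UUID can answer the lookup with different ports.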

How Load Balancers can cause Issues with RPC Traffic

When an application uses a load balancer, multiple instances of the server are housed under the same DNS name or IP address. The problem is that, in a load balanced scenario without persistence, the back-end server that the client is communicating with can change without notice. Once bound, the client software commonly assumes that these services will remain available throughout its session with the load balanced server, and it may make subsequent calls directly to the ephemeral port hosting the service without contacting the RPC Endpoint Mapper. If port allocation for this service is dynamic, it is highly unlikely that multiple servers performing the same role will use the same TCP port for the same RPC service. As such, a load balancer that cannot maintain session persistence is subject to connection failures any time it switches the client’s back-end server.

For this example we will use a network consisting of two servers (“Server A” and “Server B”) that each host a service (“Service X”) and are load balanced by a network load balancing device (“loadbalancer.mycompany.com”).

Test Network Diagram

The client connects to “loadbalancer.mycompany.com” and gets “Server A” on the other side of the load balancer. Due to RPC dynamic port assignment, “Server A” hosts “Service X” on TCP port 11111, while “Server B” hosts “Service X” on TCP port 22222. The client has an application installed that uses “Service X” over RPC and is configured to connect to it using “loadbalancer.mycompany.com”.

The client first uses RpcEpResolveBinding() to find the location of “Service X” for “loadbalancer.mycompany.com”. The load balancer forwards the connection to “Server A”, which responds that it hosts “Service X” on TCP port 11111. The client then calls RpcStringBindingCompose() to connect the software to “Service X” on TCP port 11111 at “loadbalancer.mycompany.com”, and finally RpcBindingFromStringBinding() to obtain a handle to this resource. When the client makes this call, the load balancer fails to persist the client to “Server A” and instead forwards the connection to “Server B”. Since “Server B” does not have this port open, it responds to the TCP SYN packet with a TCP RST-ACK (ref: https://support.microsoft.com/kb/175523). The client retries the connection until it reaches TCP_MAX_RETRANSMISSIONS (ref: https://support.microsoft.com/kb/170359), at which time it determines that the network connection has failed. The client's call to RpcBindingFromStringBinding() then returns the following error: RPC_S_INVALID_NET_ADDR (ref: https://msdn.microsoft.com/en-us/library/windows/desktop/aa378645(v=vs.85).aspx). At this point, the client must assume that the service is unavailable and will invoke its error handling functions as designed.
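The sequence above can be replayed with a small simulation. This is a sketch under simplifying assumptions: the load balancer is modeled as strict round-robin with no persistence, each connection is a single function call, and the names Server, RoundRobinBalancer, and connect_via are hypothetical. The port numbers are the ones used in the example.

```python
import itertools

SERVICE_X = "Service X"


class Server:
    """A back-end server with its own endpoint map and set of open ports."""

    def __init__(self, name, service_port):
        self.name = name
        self.endpoint_map = {SERVICE_X: service_port}
        self.open_ports = {135, service_port}

    def resolve(self, service):
        return self.endpoint_map[service]

    def accepts(self, port):
        return port in self.open_ports


class RoundRobinBalancer:
    """Sends each new connection to the next server; no persistence."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


def connect_via(balancer):
    # Connection 1: endpoint mapper lookup (TCP 135 in real RPC).
    mapper_backend = balancer.pick()
    port = mapper_backend.resolve(SERVICE_X)
    # Connection 2: the actual call to the resolved port.
    service_backend = balancer.pick()
    return mapper_backend.name, service_backend.name, port, service_backend.accepts(port)


if __name__ == "__main__":
    lb = RoundRobinBalancer([Server("Server A", 11111), Server("Server B", 22222)])
    looked_up_on, connected_to, port, ok = connect_via(lb)
    print(f"resolved port {port} on {looked_up_on}, connected to {connected_to}, success={ok}")
```

In this model the lookup lands on "Server A" and the follow-up connection lands on "Server B", which has nothing listening on port 11111, so the connection is refused.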

In this configuration, this condition can have multiple causes (e.g., “loadbalancer.mycompany.com” fails to persist the connection to the originating server, or the service on “Server A” stops or restarts, causing the load balancer to move the connection). In any of these situations the client will repeat the above scenario, resulting in connection issues.

How to Prevent this RPC Connection Issue

To effectively mitigate these issues, one of two things must occur:

  • The RPC service must be configured to use the same explicit RPC port(s) across all servers providing the service under the load balanced name, or
  • The RPC client must be developed to contact the RPC endpoint mapper again whenever it receives an RPC_S_INVALID_NET_ADDR error, and the network load balancing device must be able to reasonably persist connections to the same server (the endpoint mapper call and the call to the RPC-hosted service that immediately follows it must both reach the same server). If this persistence cannot be maintained over a length of time, this solution will be unnecessarily chatty.
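The second option can be sketched as client-side retry logic. This is a simplified model, not Windows RPC code: ConnectionRefusedError stands in for RPC_S_INVALID_NET_ADDR, the balancer is assumed to persist the client to one server until a failover happens, and the names Backend, StickyBalancer, and Client are hypothetical.

```python
class Backend:
    """A server hosting the service on its own dynamically chosen port."""

    def __init__(self, name, port):
        self.name = name
        self.port = port

    def resolve(self):
        return self.port

    def call(self, port):
        if port != self.port:
            raise ConnectionRefusedError(f"{self.name} has nothing listening on {port}")
        return f"reply from {self.name}"


class StickyBalancer:
    """Keeps the client on one backend until fail_over() is called."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._current = 0

    def backend(self):
        return self._backends[self._current]

    def fail_over(self):
        self._current = (self._current + 1) % len(self._backends)


class Client:
    """Caches the resolved port and re-resolves once when a call is refused."""

    def __init__(self, balancer):
        self._balancer = balancer
        self._cached_port = None
        self.lookups = 0   # how many endpoint mapper queries were needed

    def _resolve(self):
        self.lookups += 1
        self._cached_port = self._balancer.backend().resolve()

    def call(self):
        if self._cached_port is None:
            self._resolve()
        try:
            return self._balancer.backend().call(self._cached_port)
        except ConnectionRefusedError:
            # The cached port is stale: ask the endpoint mapper again, retry once.
            self._resolve()
            return self._balancer.backend().call(self._cached_port)


if __name__ == "__main__":
    lb = StickyBalancer([Backend("Server A", 11111), Backend("Server B", 22222)])
    client = Client(lb)
    print(client.call())
    lb.fail_over()          # the client is moved to the other server
    print(client.call(), "after", client.lookups, "lookups")
```

Each change of back-end server costs one extra endpoint mapper query, which is why this approach becomes chatty when persistence is short-lived.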

Since most systems administrators do not have source code access to the software they maintain, the practical option is to configure the RPC server service to use the same TCP port on every server hosted under the load balanced DNS name. For example, if the service hosted by the cluster is Exchange, then all Exchange servers in the cluster must be configured to use the same ports for Exchange services. In addition, the clustered application must be able to handle session transitions between servers to accommodate a failover condition (i.e., if the session is sent from one server to another, it must be able to recover or restart the session without reinitialization).
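A short comparison shows why a shared static port removes the dependency on which server answered the endpoint mapper lookup. This is only an illustrative model: the helper name reachable_backends is hypothetical, and port 50000 is an arbitrary example value, not a recommended setting.

```python
def reachable_backends(resolved_port, backend_ports):
    """Return the backends that would accept a connection on resolved_port.

    backend_ports maps a server name to the port its service listens on.
    """
    return sorted(name for name, port in backend_ports.items() if port == resolved_port)


# Dynamic allocation: each server picked its own port.
dynamic_ports = {"Server A": 11111, "Server B": 22222}

# Static allocation: every server is configured with the same port.
static_ports = {"Server A": 50000, "Server B": 50000}

if __name__ == "__main__":
    # The client resolved the port on Server A in both cases.
    print("dynamic:", reachable_backends(11111, dynamic_ports))
    print("static: ", reachable_backends(50000, static_ports))
```

With dynamic ports only the server that answered the lookup can accept the follow-up connection; with a shared static port any server behind the load balanced name can.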