I happened to be part of an e-mail thread this week with several members of the product group, and we were discussing the need for a publicly routable IP address on the external interface of the A/V Edge server. I wanted to share the information that Alan Shen, a Program Manager who works on this technology, shared with us (this is some of the best material I've seen explaining all of it):
The A/V Edge server enables users outside the corporate network to take part in audio and video sessions, such as a point-to-point call, a conference, leaving a voicemail with Exchange UM, or making a PSTN call. Contoso has deployed the A/V Edge server with two NICs in the perimeter network. The “external” firewall separates the Edge server from the Internet, and the “internal” firewall separates it from the corporate network. For the A/V Edge server to function correctly, the internal firewall must allow traffic to UDP 3478, TCP 443, and TCP 5062 (the A/V authentication port), and the external firewall must allow bi-directional traffic on UDP 3478, TCP 443, UDP 50,000-59,999, and TCP 50,000-59,999. Neither firewall may perform NAT. The external IP address must be publicly routable, and the internal IP address must be routable from within the corporate network.
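The firewall requirements above can be written down as a small checklist. The Python sketch below is illustrative only: the `Rule` type and the `missing` helper are hypothetical names, and it checks only that each required protocol/port pair is allowed, not the direction of the traffic or the absence of NAT.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One 'allow' rule: a protocol and an inclusive port range."""
    protocol: str  # "TCP" or "UDP"
    first_port: int
    last_port: int

    def covers(self, protocol, port):
        return self.protocol == protocol and self.first_port <= port <= self.last_port

MEDIA_RANGE = range(50_000, 60_000)  # 50,000-59,999 inclusive

INTERNAL_REQUIRED = [("UDP", 3478), ("TCP", 443), ("TCP", 5062)]
EXTERNAL_REQUIRED = [("UDP", 3478), ("TCP", 443)] + [
    (proto, port) for proto in ("UDP", "TCP") for port in MEDIA_RANGE
]

def missing(rules, required):
    """Return the (protocol, port) pairs that no rule allows."""
    return [(proto, port) for proto, port in required
            if not any(rule.covers(proto, port) for rule in rules)]

external = [Rule("UDP", 3478, 3478), Rule("TCP", 443, 443),
            Rule("UDP", 50_000, 59_999), Rule("TCP", 50_000, 59_999)]
internal = [Rule("UDP", 3478, 3478), Rule("TCP", 443, 443)]
print(missing(external, EXTERNAL_REQUIRED))  # []
print(missing(internal, INTERNAL_REQUIRED))  # [('TCP', 5062)]
```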
The ports on the external edge tend to get more scrutiny because more of them are open to the Internet. This sidebar first explains why there are so many publicly addressable ports and then how those ports are secured against attack.
Why the A/V Edge has so many ports
Needing UDP ports
For real-time media, UDP handles packet loss better than TCP. When a UDP packet is lost, the transport keeps delivering subsequent packets without delay. When a TCP packet is lost, the transport holds back all subsequent packets, because TCP must deliver a reliable, in-order stream of data. The result is added audio latency while the lost packet is retransmitted and the rest of the TCP stream "catches up".
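A toy model makes this head-of-line blocking concrete. The sketch assumes a packet every 20 ms, a fixed 50 ms one-way delay, and a single lost packet that TCP recovers 200 ms late; these numbers are made up for illustration, and real TCP retransmission timing depends on the stack.

```python
def delivery_times(n, interval_ms, delay_ms, lost, retransmit_ms, reliable):
    """Return {sequence number: time (ms) the application receives the packet}.

    reliable=False models UDP: the lost packet never shows up and nothing waits.
    reliable=True models TCP: the lost packet arrives retransmit_ms late, and
    in-order delivery holds every later packet until it has arrived.
    """
    arrival = {}
    for seq in range(n):
        t = seq * interval_ms + delay_ms
        if seq == lost:
            if not reliable:
                continue  # UDP: the packet is simply gone
            t += retransmit_ms  # TCP: it shows up after the retransmission
        arrival[seq] = t
    if not reliable:
        return arrival
    delivered, ready = {}, 0
    for seq in range(n):
        ready = max(ready, arrival[seq])  # cannot release seq before seq - 1
        delivered[seq] = ready
    return delivered

udp = delivery_times(6, 20, 50, lost=1, retransmit_ms=200, reliable=False)
tcp = delivery_times(6, 20, 50, lost=1, retransmit_ms=200, reliable=True)
print(udp)  # {0: 50, 2: 90, 3: 110, 4: 130, 5: 150}
print(tcp)  # {0: 50, 1: 270, 2: 270, 3: 270, 4: 270, 5: 270}
```

With UDP, packet 2 plays at 90 ms and the gap is concealed; with TCP, packets 2 through 5 all wait until 270 ms.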
Needing TCP ports
Although UDP is the more efficient transport, some clients can reach the Internet only via TCP, typically because of a corporate firewall policy, so OCS also supports TCP media transport when no UDP path is available. At the start of each call or conference, the two endpoints use the IETF's ICE protocol to dynamically choose the best media path available. ICE prefers direct media paths over paths through a media relay, and UDP paths over TCP paths.
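The preference order can be sketched as a sort. This is a simplification: real ICE assigns numeric candidate priorities and runs connectivity checks on candidate pairs, while this Python sketch only ranks hypothetical `CandidatePair` records and assumes that a direct path outranks a UDP path when the two preferences conflict.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidatePair:
    name: str
    relayed: bool   # True if media flows through the A/V Edge media relay
    transport: str  # "UDP" or "TCP"

def preference_key(pair):
    # False sorts before True: direct before relayed, then UDP before TCP.
    return (pair.relayed, pair.transport != "UDP")

def choose_path(pairs, reachable):
    """Return the most preferred pair whose connectivity check succeeded."""
    for pair in sorted(pairs, key=preference_key):
        if pair.name in reachable:
            return pair
    return None

pairs = [
    CandidatePair("relay-tcp", relayed=True, transport="TCP"),
    CandidatePair("relay-udp", relayed=True, transport="UDP"),
    CandidatePair("direct-udp", relayed=False, transport="UDP"),
]
print(choose_path(pairs, {"relay-tcp", "relay-udp", "direct-udp"}).name)  # direct-udp
print(choose_path(pairs, {"relay-tcp"}).name)  # relay-tcp
```

The second call models a client behind a firewall that only lets TCP out: the relayed TCP path is the only one that works, so it is chosen.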
Needing the port range at 50,000
The A/V Edge server is an implementation of the IETF's STUN protocol with TURN relay extensions. The standard requires this port range because it cannot assume that the remote party has access to the same media relay server. Phone calls often cross company boundaries, as in a federated VoIP call in OCS 2007; calls to standalone SIP devices are another case one can envision as VoIP technology continues to evolve. The federated company cannot reach the local company's A/V Edge server on UDP 3478/TCP 443, so the 50,000 port range is what allows media to flow in a federated call. It is a port range rather than a single multiplexed port so that RTP packets can be relayed efficiently: a multiplexed port would require deeper packet inspection and reduce the server's efficiency. As you'll see below, the port range also increases the security of the A/V Edge server.
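The efficiency argument for a port range is that the destination port alone identifies the session. The sketch below models that with a plain dictionary; `RelayPortTable` is a hypothetical name, ports are handed out sequentially only to keep the example short (the security discussion explains that real allocations are random), and the addresses are documentation examples.

```python
class RelayPortTable:
    """Hypothetical relay table with one allocated port per media session."""

    def __init__(self, first=50_000, last=59_999):
        self.free = list(range(first, last + 1))
        self.sessions = {}  # relay port -> (ip, port) to forward packets to

    def allocate(self, forward_to):
        port = self.free.pop(0)  # sequential for simplicity
        self.sessions[port] = forward_to
        return port

    def release(self, port):
        del self.sessions[port]
        self.free.append(port)

    def route(self, dest_port, packet):
        # The destination port alone identifies the session, so the RTP
        # packet is forwarded as-is without inspecting its contents.
        target = self.sessions.get(dest_port)
        return None if target is None else (target, packet)

table = RelayPortTable()
port = table.allocate(("198.51.100.7", 40_000))
print(port, table.route(port, b"rtp"))  # 50000 (('198.51.100.7', 40000), b'rtp')
```

With a single multiplexed port, `route` would instead have to parse something inside each packet to find its session, which is the extra inspection the text refers to.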
Needing a publicly routable IP address on the external interface
The external interface of the A/V Edge server requires a publicly routable IP address for several reasons. First, the A/V Edge server implements the STUN protocol, a mechanism whereby the server reflects back the IP address it saw from a user's home router. That home router address lets the ICE protocol find efficient media paths, and it is also needed to set the correct IP permissions on the A/V Edge server's 50,000 port range. If the A/V Edge external address were behind NAT, the server would return that address instead of the home router's address, leading to less efficient (sometimes broken) media paths and permission problems on the 50,000 port range.

A second reason for a publicly routable IP address is to support UDP load balancing. For real-time audio/video traffic, UDP is the preferred protocol for carrying RTP packets. However, UDP is stateless, so some load balancers distribute UDP packets to servers without any context for the current session. To mitigate this, the A/V Edge server returns its external IP address on the first UDP packet of a media session, and OC or the Meeting Console client sends subsequent UDP traffic directly to that address instead of through the load balancer. For this mechanism to work, the external IP address must be publicly routable.

Note that a publicly routable IP address on the external edge does not preclude a company from using a firewall. To the contrary, Microsoft recommends that all externally facing servers be protected by a firewall…provided that the firewall does not NAT the IP address.
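The STUN reflection described above amounts to the server writing the source address it observed into its response. OCS 2007 implemented an early draft of STUN/TURN, so its wire format may differ; this sketch follows the later RFC 5389 XOR-MAPPED-ADDRESS encoding (IPv4 only), in which the port and address are XORed with a fixed magic cookie.

```python
import ipaddress
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389

def encode_xor_mapped_address(ip, port):
    """Encode the IPv4 source address the server observed as the value of an
    RFC 5389 XOR-MAPPED-ADDRESS attribute (reserved, family, X-Port, X-Address)."""
    addr = int(ipaddress.IPv4Address(ip))
    return struct.pack("!BBHI", 0, 0x01, port ^ (MAGIC_COOKIE >> 16), addr ^ MAGIC_COOKIE)

def decode_xor_mapped_address(value):
    _, family, x_port, x_addr = struct.unpack("!BBHI", value)
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    return str(ipaddress.IPv4Address(x_addr ^ MAGIC_COOKIE)), x_port ^ (MAGIC_COOKIE >> 16)

# The server saw a request arriving from the home router's public address and port.
value = encode_xor_mapped_address("198.51.100.7", 51_234)
print(decode_xor_mapped_address(value))  # ('198.51.100.7', 51234)
```

The server can only report what it sees on the wire, which is why anything rewriting addresses on the path between client and server changes the answer.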
Needing a routable IP address on the internal interface
The A/V Edge server applies the same load-balancer workaround on its internal side: it returns its internal IP address on the first UDP packet of a media session, and OC or the Meeting Console client sends subsequent UDP traffic directly to that address instead of through the load balancer. That is why the internal IP address must be routable from the corporate network. Specifically, it must be routable by client endpoints (OC/Meeting Console) as well as server endpoints (Mediation Server/AVMCU/Exchange UM), because OCS 2007 supports media both point-to-point and through a conference.
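The load-balancer workaround, and why it needs a routable address, can be modeled in a few lines. In this hypothetical sketch the client's reachability is approximated by a list of networks it can route to; the class name, the load balancer VIP, and all addresses are illustrative assumptions.

```python
import ipaddress

class MediaSession:
    """Hypothetical model of where a client sends its UDP media packets."""

    def __init__(self, vip, routable_networks):
        self.destination = vip  # the first packet goes to the load balancer VIP
        self.routable = [ipaddress.ip_network(n) for n in routable_networks]

    def on_first_response(self, server_ip):
        """Switch to the address the A/V Edge server reported, if reachable."""
        if any(ipaddress.ip_address(server_ip) in net for net in self.routable):
            self.destination = server_ip  # later packets bypass the load balancer
            return True
        return False  # bypass impossible; the mechanism breaks down

# A corporate client that can route to 10.0.0.0/8 only.
ok = MediaSession("10.0.0.100", ["10.0.0.0/8"])
print(ok.on_first_response("10.20.0.15"), ok.destination)  # True 10.20.0.15
stuck = MediaSession("10.0.0.100", ["10.0.0.0/8"])
print(stuck.on_first_response("192.168.50.15"), stuck.destination)  # False 10.0.0.100
```

The second case is an internal edge address the corporate network has no route to: the client cannot follow the server's hint, which is exactly the failure the routability requirement prevents.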
Understanding the technology is not enough, though. Like most corporations, Contoso has an IT department that includes both emerging-technology engineers and network security engineers, and the technology described above will be deployed only if it passes a security review. The following section discusses its security, first summarizing the mechanisms in place and then describing each in more detail.
Security of A/V Edge Server Auth Port TCP5062 (internal edge only)
OCS front end servers must provide a validly signed certificate whose subject name matches the FQDN of that server. (The OCS front end server performs the same check against the A/V Edge Server’s certificate.)
The OCS front end server FQDN must be on a trusted list of the A/V Edge Server. (The OCS front end server performs the same check against the A/V Edge Server FQDN.)
All SIP signaling is protected with 128-bit TLS encryption.
Security of UDP3478/TCP443 (internal and external edges)
Port allocation is protected by 128-bit digest “challenge” authentication, using a computer-generated password that rotates every 8 hours.
A sequence number and random nonce are used to deter replay attacks.
Media relay messages on UDP3478/TCP443 are protected with a 128-bit HMAC signature.
Security of UDP/TCP 50,000-59,999 (external edge only)
Ports are allocated randomly within that range per call. An attacker needs to predict which port is active and complete an attack before the call ends.
Incoming traffic is filtered according to the IP addresses of the other endpoint’s candidates. Even if an attacker finds a port in use, it must also spoof the correct IP address.
These two mechanisms actually make the port range more secure. If all traffic were multiplexed through one port, that port would have to accept traffic from the IP addresses of all remote endpoints.
Security of end-to-end media
Media packets are protected with end-to-end SRTP, preventing any eavesdropping or packet injection.
The key used to encrypt and decrypt the media stream is passed over the TLS secured signaling channel.
Details of Security
Security of A/V Edge Server Auth Port TCP5062 (internal edge only)
When a user signs in to OC or joins a meeting, the client first acquires a username/password token from the media relay by sending a SIP SERVICE message over the TLS-secured signaling channel. The last leg of this signaling path is a TCP connection from the user’s OCS front end server to the A/V authentication port of the A/V Edge server, and this connection is accepted only on the A/V Edge server’s internal-facing IP address. Before the SIP SERVICE request is accepted, a TLS connection must be set up in which each side validates that: 1) the other server provides a certificate signed by a trusted authority, 2) the certificate’s subject name matches the FQDN of that server, and 3) that server’s FQDN matches one of the servers on a local trusted server list. (In fact, all servers in the OCS system perform this series of checks before allowing any communication to or from another OCS server.) If all three checks pass, the TLS connection is established and the SIP SERVICE request is carried to the A/V Edge server, which responds with a 200 OK containing the computer-generated username/password token.
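The three checks can be sketched as one predicate. In practice a TLS library performs checks 1 and 2 during the handshake; here a hypothetical `PeerCertificate` record stands in for the validated certificate, and the server names are made-up examples. Comparing names case-insensitively is a choice made because DNS names are case-insensitive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeerCertificate:
    """Simplified stand-in for the certificate a peer presents during TLS."""
    subject_name: str
    chain_trusted: bool  # outcome of validating the issuer chain against trusted CAs

def accept_peer(cert, peer_fqdn, trusted_servers):
    """Apply the three checks; DNS names compare case-insensitively."""
    fqdn = peer_fqdn.lower()
    return (
        cert.chain_trusted                                # 1) trusted signer
        and cert.subject_name.lower() == fqdn             # 2) name matches FQDN
        and fqdn in {s.lower() for s in trusted_servers}  # 3) on trusted list
    )

trusted = ["fe01.contoso.com"]
print(accept_peer(PeerCertificate("fe01.contoso.com", True), "fe01.contoso.com", trusted))  # True
print(accept_peer(PeerCertificate("fe01.contoso.com", True), "fe02.contoso.com", trusted))  # False
```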
Security of UDP3478 and TCP443 (internal and external edges)
The A/V Edge server is an enterprise-managed resource, so restricting access to authorized users matters for both security and resource reasons. Communication on the UDP3478 and TCP443 ports is allowed only for clients that belong to the corporation managing that A/V Edge server. A client uses these two ports to allocate UDP and TCP ports within the 50,000 port range for the remote party to connect to. To actually allocate the ports, the client performs digest authentication against the A/V Edge server using the computer-generated username/password obtained via the SIP SERVICE request. The client sends an initial allocate request, and the A/V Edge server answers with a nonce challenge. The client then sends a second allocate request containing the username and an HMAC hash of the username and nonce, and a sequence number mechanism prevents replay attacks. The server calculates the expected HMAC from its own knowledge of the username and password. If the HMAC values match, the allocation is carried out; otherwise, the packet is dropped. The same HMAC mechanism is applied to subsequent messages within the call session. The username/password is valid for at most 8 hours, after which the client acquires a new username/password for subsequent calls.
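The allocate exchange can be sketched as a challenge-response check. The text doesn't give the wire format, so this is not the actual OCS message layout; it assumes HMAC-SHA256 as the digest, includes a 4-byte sequence number in the HMAC input (so a replayed request can't simply be renumbered), and uses hypothetical names `client_mac` and `AllocateVerifier`.

```python
import hashlib
import hmac
import secrets

def client_mac(username, password, nonce, seq):
    """HMAC over username, server nonce, and sequence number, keyed by the password."""
    message = username.encode() + nonce + seq.to_bytes(4, "big")
    return hmac.new(password.encode(), message, hashlib.sha256).digest()

class AllocateVerifier:
    """Hypothetical server side of the allocate challenge."""

    def __init__(self, credentials):
        self.credentials = credentials  # username -> computer-generated password
        self.nonces = {}                # username -> outstanding nonce
        self.last_seq = {}              # username -> highest accepted sequence number

    def challenge(self, username):
        nonce = secrets.token_bytes(16)
        self.nonces[username] = nonce
        return nonce

    def verify(self, username, seq, mac):
        password = self.credentials.get(username)
        nonce = self.nonces.get(username)
        if password is None or nonce is None:
            return False
        if seq <= self.last_seq.get(username, -1):
            return False  # replayed (or stale) request
        expected = client_mac(username, password, nonce, seq)
        if not hmac.compare_digest(expected, mac):
            return False  # HMAC mismatch: drop the packet
        self.last_seq[username] = seq
        return True

server = AllocateVerifier({"user1": "generated-secret"})
nonce = server.challenge("user1")
mac = client_mac("user1", "generated-secret", nonce, seq=1)
print(server.verify("user1", 1, mac))  # True
print(server.verify("user1", 1, mac))  # False: replay
```

The 8-hour credential lifetime is not modeled; a fuller sketch would store an expiry time next to each password and reject requests once it has passed.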
Security of UDP/TCP 50,000-59,999 (external edge only)
The question arises, “Are 10,000 ports less secure than a couple of well-known ports?” One might think so, but the answer is no. From an attacker’s standpoint, each of those 10,000 ports behaves exactly the same, so the more pertinent question is: “How secure is each of those 10,000 ports?” First, allocations in this range are chosen randomly, so at any given time many of these ports aren’t even listening for packets. (Contrast that with a well-known port that an attacker can focus on.) Second, each port filters traffic so that only packets originating from the remote endpoint’s IP address are accepted. That IP address is communicated over the TLS-secured signaling channel, and the A/V Edge server drops packets from any other IP address. In this situation, having a range of ports actually improves security. Because a random port is allocated for each call, an attacker is forced to 1) deduce an active port, 2) break the TLS signaling channel, and 3) spoof the remote user’s IP address…all within the span of a single call. Can this port range be reduced? Yes, but doing so limits A/V Edge scale under peak conditions and does not increase security. A reduced port range should allow no fewer than 6 UDP/TCP ports per user under peak load. Can the port range be eliminated altogether for companies that don’t require audio/video federation? Unfortunately, that scenario has not been tested and is currently an unsupported configuration.
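The two defenses for the 50,000 range, random allocation and per-port IP filtering, can be sketched together. This Python model is hypothetical: the class and function names are made up for illustration, a real relay also tracks allocation lifetimes and TCP state, and the 6-ports-per-user guideline is applied literally, which is an assumption about how that figure is counted.

```python
import ipaddress
import random

PORT_RANGE = range(50_000, 60_000)

class ExternalEdgePorts:
    """Hypothetical model of random per-call allocation with IP filtering."""

    def __init__(self, rng=None):
        self.rng = rng or random.SystemRandom()
        self.permissions = {}  # allocated port -> permitted remote IP addresses

    def allocate(self, remote_candidate_ips):
        free = [p for p in PORT_RANGE if p not in self.permissions]
        port = self.rng.choice(free)
        self.permissions[port] = {ipaddress.ip_address(ip) for ip in remote_candidate_ips}
        return port

    def accept(self, port, source_ip):
        allowed = self.permissions.get(port)
        if allowed is None:
            return False  # nothing allocated on this port
        return ipaddress.ip_address(source_ip) in allowed  # other sources are dropped

def min_ports_for_peak(peak_users, ports_per_user=6):
    """Guideline from the text: no fewer than 6 UDP/TCP ports per peak user."""
    return peak_users * ports_per_user

edge = ExternalEdgePorts()
port = edge.allocate(["198.51.100.7"])  # learned over the TLS signaling channel
print(edge.accept(port, "198.51.100.7"), edge.accept(port, "203.0.113.9"))  # True False
print(min_ports_for_peak(1_000))  # 6000
```

An attacker must land on the one allocated port and also send from the permitted address; either miss is silently dropped.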
Security of end-to-end media
OCS clients signal to the server using 128-bit TLS encryption, validating that the server certificate has a matching FQDN and is signed by a trusted authority, the same mechanism e-commerce sites use. To secure the media channel, OCS uses the IETF’s SRTP protocol: a 128-bit key is exchanged over the secure signaling channel, and the two endpoints use it to encrypt and decrypt the media stream with 128-bit AES. Even if an attacker performs a “man in the middle” attack on the media path, eavesdropping and false packet injection are not possible.
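SRTP's protection against injected packets comes from a per-packet authentication tag. The sketch below shows only that part, following RFC 3711's default HMAC-SHA1 transform with an 80-bit tag computed over the packet plus the 32-bit rollover counter. It omits AES encryption (not in Python's standard library) and SRTP key derivation, and assumes the authentication key has already been derived from the key exchanged over TLS; whether OCS uses this exact transform is an assumption.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit tag, as in the HMAC-SHA1-80 transform

def auth_tag(auth_key, packet, roc):
    # RFC 3711: HMAC-SHA1 over the authenticated portion (header plus encrypted
    # payload) followed by the 32-bit rollover counter, truncated to TAG_LEN bytes.
    return hmac.new(auth_key, packet + roc.to_bytes(4, "big"), hashlib.sha1).digest()[:TAG_LEN]

def protect(auth_key, packet, roc=0):
    return packet + auth_tag(auth_key, packet, roc)

def unprotect(auth_key, protected, roc=0):
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    if not hmac.compare_digest(tag, auth_tag(auth_key, packet, roc)):
        return None  # modified or injected packet: discard it
    return packet

key = bytes(20)  # placeholder; a real key is derived per session
sent = protect(key, b"\x80\x00\x00\x01encrypted-payload")
forged = sent[:5] + b"X" + sent[6:]
print(unprotect(key, sent) is not None, unprotect(key, forged))  # True None
```

A man in the middle who alters even one byte, or forges a packet without the session key, produces a tag mismatch and the receiver discards the packet.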