Outbound Routing Fault Tolerance in Lync Server 2010

Enterprise Voice fault tolerance (as well as load balancing) can be achieved in several ways in Lync Server 2010. The starting point is deploying an Enterprise Edition pool with multiple Front End servers, either with the Mediation service collocated or with a standalone Mediation Server pool containing multiple Mediation servers. Apart from providing fault tolerance through this redundant configuration at the pool level, configuring DNS load balancing on the standalone Mediation Server pool also load balances outbound calls.

However, if site resiliency is required, even a configuration with multiple servers in the Enterprise Edition or Mediation pools in the central site is not sufficient. In this case a multiple data center topology has to be deployed. More information about central site voice resiliency, the recommended approach to central site resiliency, can be found in the official Lync Server 2010 documentation under Planning for Central Site Voice Resiliency.

Apart from these fault tolerance options, which are available out of the box (but nevertheless require correct planning), Enterprise Voice admins can further enhance voice fault tolerance by configuring outbound call routing components and settings in a redundant fashion. Let’s assume that Contoso has deployed a multiple data center topology with an Enterprise Edition pool and a standalone Mediation Server pool in each data center:

Figure 1

Outbound call routing configuration steps in Lync Server 2010 differ somewhat from those in OCS 2007 and OCS 2007 R2. While not much has changed in configuring Normalization rules, Voice Policies and PSTN Usages, the configuration of Voice Routes has changed significantly. The biggest change is that you no longer associate Routes with Mediation servers, but rather directly with gateway peers (which can be an IP/PSTN gateway, a SIP trunk provider’s Session Border Controller, or an IP PBX).

However, prior to configuring Routes, it is necessary to associate Mediation servers with gateway peers (IP/PSTN Gateway, SBC, IP PBX) when defining your topology in Topology Builder.

What impact does this configuration have on your outbound routing?

The answer is pretty straightforward: by associating multiple gateway peers with the same route, you’re adding another layer of fault tolerance to your outbound routing. This is possible because routing now determines which gateway to use for a call, and the Mediation Server associated with that gateway handles the call. Although you could achieve a similar configuration in previous OCS versions (by configuring multiple Mediation servers in the same route), the level of granularity in configuring fault tolerance on outbound routing was significantly lower. In his article, my colleague Jens touches on some significant changes introduced in Lync Server 2010 (no more 1:1 relationship between Mediation server and gateway, as well as load balancing of calls when a route contains multiple gateways).
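To make the idea concrete, here is a minimal Python sketch of the selection logic described above. It is only an illustration, not the algorithm Lync Server actually runs: the function `pick_path` and the names `gateway-a`, `gateway-b`, `medpool-a` and `medpool-b` are hypothetical, and the sketch simply tries the gateways in list order, so it does not model the load balancing among healthy gateways.

```python
def pick_path(route_gateways, gateway_to_pool, down=frozenset()):
    """Return (mediation_pool, gateway) for the first usable gateway on a route.

    A gateway is usable only if both the gateway itself and the Mediation
    pool associated with it are up. Returns None when no gateway qualifies.
    """
    for gateway in route_gateways:
        pool = gateway_to_pool[gateway]
        if gateway not in down and pool not in down:
            return pool, gateway
    return None


# Hypothetical route with two gateway peers, each tied to its own Mediation pool.
ROUTE_GATEWAYS = ["gateway-a", "gateway-b"]
GATEWAY_TO_POOL = {"gateway-a": "medpool-a", "gateway-b": "medpool-b"}

# Everything up: the first gateway on the route is used.
print(pick_path(ROUTE_GATEWAYS, GATEWAY_TO_POOL))
# First gateway down: the call moves to the second gateway and its pool.
print(pick_path(ROUTE_GATEWAYS, GATEWAY_TO_POOL, down={"gateway-a"}))
# A pool down on one side and a gateway down on the other: no path is left.
print(pick_path(ROUTE_GATEWAYS, GATEWAY_TO_POOL, down={"medpool-a", "gateway-b"}))
```

The last call returns no path because, in this simplified model, each gateway is reachable through exactly one Mediation pool.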

Let’s see how it works using the following examples:

Example A: Sunny day, all components working

 

Figure 2

Let’s suppose that Marino is hosted on EE Pool A in Data center A. He places a call to a PSTN number, and the CroatiaNationalCalls route is used to carry the call. Apart from containing a PSTN Usage that matches one in Marino’s Voice Policy, the CroatiaNationalCalls voice route is associated with both Gateway peer A and Gateway peer B. The red dashed line shows the signaling path, while the green line shows the media path.

Figure 3

A couple of (shortened) SIP traces show that Mediation Pool A (Mediation pool medpool.w14lab.com and Mediation server w14ms01.w14lab.com) is used to handle the call:

Direction: outgoing
Peer: medpool.w14lab.com:5070
Message-Type: request
Start-Line: INVITE sip:+385351234567@172.31.255.240:5070;user=phone;maddr=medpool.w14lab.com SIP/2.0
From: "Marino Filipovic"<sip:marinof@contoso.com>;tag=fb6c460929;epid=29a361e88d
To: <sip:+385351234567@contoso.com;user=phone>
CSeq: 1 INVITE
Call-ID: f2f17de2c5ed6a1c82e01b0fb1ccf9eb
Record-Route: <sip:pool01.w14lab.com:5061;transport=tls;ms-fe=w14fe01.w14lab.com;opaque=state:T;lr>;tag=F9D9739CC4664BAAA1D425A3E6D2F5D4
Via: SIP/2.0/TLS 172.30.255.102:52633;branch=z9hG4bK78DA00AB.F7405DF5D7C1D3DA;branched=TRUE
Via: SIP/2.0/TLS 172.30.255.171:59990;ms-received-port=59990;ms-received-cid=401E00
ms-application-via: SIP;ms-urc-rs-from;ms-server=w14fe01.w14lab.com;ms-pool=pool01.w14lab.com;ms-application=ad894dc3-55e0-44bf-a07e-3c073aaa4a57
P-Asserted-Identity: "Marino Filipovic"<sip:marinof@contoso.com>,<tel:+38514802002>
ms-application-via: w14be01.w14lab.com_rtc;ms-server=w14fe01.w14lab.com;ms-pool=pool01.w14lab.com;ms-application=51FB453D-5B9F-45df-83B4-ADD1F7E604A8
Contact: <sip:marinof@contoso.com;opaque=user:epid:c0-4HR_d61GaEzW7fhYCgAAA;gruu>
User-Agent: CPE/3.5.6907.0 OCPhone/3.5.6907.0 (Office Communicator Phone 2007 R2)

Direction: incoming
Peer: medpool.w14lab.com:5070
Start-Line: SIP/2.0 183 Session Progress
From: "Marino Filipovic"<sip:marinof@contoso.com>;tag=fb6c460929;epid=29a361e88d
To: <sip:+385351234567@contoso.com;user=phone>;tag=ec42fc28ac;epid=95500EB922
CSeq: 1 INVITE
Call-ID: f2f17de2c5ed6a1c82e01b0fb1ccf9eb
VIA: SIP/2.0/TLS 172.30.255.102:52633;branch=z9hG4bK78DA00AB.F7405DF5D7C1D3DA;branched=TRUE,SIP/2.0/TLS 172.30.255.171:59990;ms-received-port=59990;ms-received-cid=401E00
RECORD-ROUTE: <sip:pool01.w14lab.com:5061;transport=tls;ms-fe=w14fe01.w14lab.com;opaque=state:T;lr>;tag=F9D9739CC4664BAAA1D425A3E6D2F5D4
CONTACT: <sip:w14ms01.w14lab.com@contoso.com;gruu;opaque=srvr:MediationServer:CxCmXfo8Z1K91JOq_lHpMgAA>;isGateway
SERVER: RTCC/4.0.0.0 MediationServer

Glossary:

pool01.w14lab.com = EE Pool A

medpool.w14lab.com = Mediation Pool A

w14ms01.w14lab.com = Mediation Server in Med. Pool A

w14fe01.w14lab.com = FE server in EE Pool A

172.31.255.240 = Gateway peer A
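When comparing traces like the one above, the Start-Line of the INVITE is the quickest place to see which gateway peer and Mediation pool were chosen: going by the glossary, the host part of the request URI is the gateway peer and the `maddr` parameter is the Mediation pool. The following Python sketch pulls those fields out of a Start-Line. The function name `parse_invite_start_line` is hypothetical and the parsing is deliberately simple, covering only the URI shape seen in this trace rather than the full SIP URI grammar.

```python
def parse_invite_start_line(start_line):
    """Split an INVITE Start-Line into number, gateway host, port and maddr."""
    method, uri, _version = start_line.split()
    if method != "INVITE" or not uri.startswith("sip:"):
        raise ValueError("not a SIP INVITE start line")

    # "user@host:port" comes first, followed by ";name=value" parameters.
    user_host, *params = uri[len("sip:"):].split(";")
    number, hostport = user_host.split("@")
    host, _, port = hostport.partition(":")
    param_map = dict(p.split("=", 1) for p in params if "=" in p)

    return {
        "number": number,
        "gateway": host,
        "port": int(port) if port else None,
        "maddr": param_map.get("maddr"),
    }


# Start-Line copied from the outgoing INVITE in Example A.
EXAMPLE_A = (
    "INVITE sip:+385351234567@172.31.255.240:5070;"
    "user=phone;maddr=medpool.w14lab.com SIP/2.0"
)
print(parse_invite_start_line(EXAMPLE_A))
```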

 

Example B: Mediation Pool A and/or Gateway peer A down

Figure 4

 

If the whole Mediation Pool A (or Gateway peer A) is down, both signaling and media take a different path, as shown below and recorded in the subsequent SIP trace:

Figure 5

Direction: outgoing
Peer: medpool02.w14lab.com:5070
Message-Type: request
Start-Line: INVITE sip:+385351234567@172.31.255.241:5070;user=phone;maddr=medpool02.w14lab.com SIP/2.0
From: "Marino Filipovic"<sip:marinof@contoso.com>;tag=ec7fb59bd1;epid=29a361e88d
To: <sip:+385351234567@contoso.com;user=phone>
CSeq: 1 INVITE
Call-ID: a308fa7249b5dc9c142cd0df87dfe669
Record-Route: <sip:pool01.w14lab.com:5061;transport=tls;ms-fe=w14fe01.w14lab.com;opaque=state:T;lr>;tag=F9D9739CC4664BAAA1D425A3E6D2F5D4
Via: SIP/2.0/TLS 172.30.255.102:52581;branch=z9hG4bK0A7C7979.7D6F128CBD51436D;branched=TRUE
Via: SIP/2.0/TLS 172.30.255.171:59990;ms-received-port=59990;ms-received-cid=401E00
ms-application-via: SIP;ms-urc-rs-from;ms-server=w14fe01.w14lab.com;ms-pool=pool01.w14lab.com;ms-application=ad894dc3-55e0-44bf-a07e-3c073aaa4a57
P-Asserted-Identity: "Marino Filipovic"<sip:marinof@contoso.com>,<tel:+38514802002>
ms-application-via: w14be01.w14lab.com_rtc;ms-server=w14fe01.w14lab.com;ms-pool=pool01.w14lab.com;ms-application=51FB453D-5B9F-45df-83B4-ADD1F7E604A8
Contact: <sip:marinof@contoso.com;opaque=user:epid:c0-4HR_d61GaEzW7fhYCgAAA;gruu>
User-Agent: CPE/3.5.6907.0 OCPhone/3.5.6907.0 (Office Communicator Phone 2007 R2)

Direction: incoming
Peer: medpool02.w14lab.com:5070
Message-Type: response
Start-Line: SIP/2.0 183 Session Progress
From: "Marino Filipovic"<sip:marinof@contoso.com>;tag=ec7fb59bd1;epid=29a361e88d
To: <sip:+385351234567@contoso.com;user=phone>;tag=76451dc3ba;epid=2756E7038A
CSeq: 1 INVITE
Call-ID: a308fa7249b5dc9c142cd0df87dfe669
VIA: SIP/2.0/TLS 172.30.255.102:52581;branch=z9hG4bK0A7C7979.7D6F128CBD51436D;branched=TRUE,SIP/2.0/TLS 172.30.255.171:59990;ms-received-port=59990;ms-received-cid=401E00
RECORD-ROUTE: <sip:pool01.w14lab.com:5061;transport=tls;ms-fe=w14fe01.w14lab.com;opaque=state:T;lr>;tag=F9D9739CC4664BAAA1D425A3E6D2F5D4
CONTACT: <sip:w14ms11.w14lab.com@contoso.com;gruu;opaque=srvr:MediationServer:28bAlNPmzFOn7fSHzYQ4VwAA>;isGateway
SERVER: RTCC/4.0.0.0 MediationServer

Glossary:

pool01.w14lab.com = EE Pool A

medpool02.w14lab.com = Mediation Pool B

w14ms11.w14lab.com = Mediation Server in Med. Pool B

w14fe01.w14lab.com = FE server in EE Pool A

172.31.255.241 = Gateway peer B

 

Prerequisites

- Correct configuration of Outbound Routing components

1. Associate Mediation Pool A with Gateway peer A and Mediation Pool B with Gateway peer B

 

2. Create a pool-level trunk with the following options

- Enable REFER method support (if REFER is supported on gateway peer side)

- Enable Centralized Media Processing if the media termination has the same IP address as the signaling termination

- Configure appropriate Media Bypass option (media bypass was not used here for simplicity)

- Create translation rules if necessary

 

3. Create and configure relevant Voice Routes

- Add appropriate PSTN Usages to Voice Routes

- Associate Routes with the relevant gateway peers (in our example, the CroatiaNationalCalls route is associated with both Gateway peer A and Gateway peer B)

 

- A redundant network link exists between Data center A and Data center B

 

Summary

Outbound Routing in Lync Server 2010 provides a lot of granularity in configuring fault-tolerant routes. Apart from the two options shown above, there’s another one I haven’t discussed in detail: the scenario in which, for example, the Mediation pool in one location is down and, at the same time, the Gateway peer in the other location is down:

Figure 6

If you also associate Mediation Pool A with Gateway peer B and Mediation Pool B with Gateway peer A (in addition to the steps mentioned in Prerequisites), the call will go through even in the above scenario:

Figure 7
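The effect of the extra associations can be checked with a small Python sketch that treats every (Mediation pool, gateway peer) association as one possible path and asks whether any path survives a given set of failures. This is a simplified model for illustration only: the helper names `call_survives` and `fatal_double_failures` are hypothetical, and the sketch ignores everything except which components are up or down.

```python
from itertools import combinations

COMPONENTS = [
    "Mediation Pool A", "Mediation Pool B",
    "Gateway peer A", "Gateway peer B",
]

# Associations from Prerequisites: each pool is tied to its local gateway only.
ONE_TO_ONE = {
    ("Mediation Pool A", "Gateway peer A"),
    ("Mediation Pool B", "Gateway peer B"),
}
# The same associations plus the two cross-site ones described above.
CROSSED = ONE_TO_ONE | {
    ("Mediation Pool A", "Gateway peer B"),
    ("Mediation Pool B", "Gateway peer A"),
}


def call_survives(paths, down):
    """True if at least one (pool, gateway) path has both ends up."""
    return any(pool not in down and gw not in down for pool, gw in paths)


def fatal_double_failures(paths):
    """List every pair of simultaneously failed components that blocks all paths."""
    return [
        pair for pair in combinations(COMPONENTS, 2)
        if not call_survives(paths, set(pair))
    ]


# The scenario discussed above: pool down in one site, gateway down in the other.
scenario = {"Mediation Pool A", "Gateway peer B"}
print(call_survives(ONE_TO_ONE, scenario))
print(call_survives(CROSSED, scenario))
print(fatal_double_failures(CROSSED))
```

In this model, the only double failures left after adding the cross-site associations are both Mediation pools down or both gateway peers down, which no route configuration can work around.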

Although the topology presented in these examples describes rather complex scenarios with multiple data centers, a similar level of fault tolerance can be achieved with much simpler topologies. For example, in a single data center with two Standard Edition servers associated with two gateway peers, correct configuration of trunks and routes gives you a level of fault tolerance similar to that described in this article, apart from site resiliency, obviously.

It is worth mentioning that this is not the only way of configuring fault tolerance on outbound routing (for example, a failover route is another alternative in some scenarios), and it’s up to administrators to pick the configuration that best suits their particular requirements.