Another Reason to Include a Director in Your Lync Server 2010 Deployment

Update 1/1/11 - Updated post with more information on what happens with an Enterprise Edition pool and how EndpointConfiguration.cache adds a wrinkle to the whole process.

In Lync Server 2010 the Director is now a dedicated role. Because there's an actual Director role now, you don't have to worry about users accidentally getting homed on the Director, like you did in OCS 2007 R2. But the real reason to take another look at the Director role in Lync Server 2010 has to do with the new resiliency option, specifically the ability to associate a backup registrar pool. The Director plays an important role in informing clients what their primary and backup registrars are.

So the natural question that comes up is "Is a Director required for the backup registrar ability to work?" The answer is technically no, it isn't required. The Director isn't providing any special functionality. Just like in OCS 2007 R2, the Front End Servers have the same Registrar service/functionality as the Director. If a user registers against a registrar that isn't their primary registrar, a 301 Redirect will be returned that includes the user's primary and backup registrars. Both a Front End Server and a Director can do this. There is another option described below, but it has some limitations of its own. Using a Director is the recommended and simplest way to take advantage of this functionality.

The trace below shows the response from the Director to the client during client sign in: 

12/20/2010|11:56:24.001 728:A3C INFO :: Data Received - 172.16.8.8:5061 (To Local Address: 172.16.8.7:50327) 756 bytes:
12/20/2010|11:56:24.001 728:A3C INFO :: SIP/2.0 301 Redirect request to Home Server
Authentication-Info: TLS-DSK qop="auth", opaque="E44B1AA4", srand="50E900F0", snum="1", rspauth="23003a3babd1b5a8b3b02c34372b07e66a8b0bfa", targetname="BETA-LS14-DIR.beta.deitterick.com", realm="SIP Communications Service", version=4
From: <sip:acooper@beta.deitterick.com>;tag=4787346104;epid=c68d71323c
To: <sip:acooper@beta.deitterick.com>;tag=FBF2A5131E02B962310A8078B52D77C3
Call-ID: a59c3223b23b4d398d4c77b95769e1b1
CSeq: 4 REGISTER
Via: SIP/2.0/TLS 172.16.8.7:50327;ms-received-port=50327;ms-received-cid=100
Contact: <sip:beta-ls14-se1.beta.deitterick.com:5061;transport=TLS>;q=0.7
Contact: <sip:beta-ls14-se2.beta.deitterick.com:5061;transport=TLS>;q=0.3
Expires: 2592000
Content-Length: 0

12/20/2010|11:56:24.001 728:A3C INFO :: End of Data Received - 172.16.8.8:5061 (To Local Address: 172.16.8.7:50327) 756 bytes

The two important lines from the redirect are these:

Contact: <sip:beta-ls14-se1.beta.deitterick.com:5061;transport=TLS>;q=0.7
Contact: <sip:beta-ls14-se2.beta.deitterick.com:5061;transport=TLS>;q=0.3

The Director passes back to the client both the user's primary and backup registrars. The q= value tells you which is which: the higher value, q=0.7, marks the primary registrar (beta-ls14-se1.beta.deitterick.com), and the lower value, q=0.3, marks the backup registrar (beta-ls14-se2.beta.deitterick.com).
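If you want to pull the primary and backup registrars out of a trace without reading it by eye, something like the following works. This is only a sketch in Python: the regular expression and the function name registrars_by_preference are my own illustration, not anything the Lync client exposes, and it assumes the Contact headers look like the two lines shown above.

```python
import re

# The two Contact headers copied from the 301 Redirect in the trace.
REDIRECT_CONTACTS = [
    "Contact: <sip:beta-ls14-se1.beta.deitterick.com:5061;transport=TLS>;q=0.7",
    "Contact: <sip:beta-ls14-se2.beta.deitterick.com:5061;transport=TLS>;q=0.3",
]

# Illustrative pattern (an assumption about the header layout): captures the
# host inside <sip:...> and the q value that follows the closing bracket.
CONTACT_RE = re.compile(
    r"^Contact:\s*<sip:([^:;>]+)[^>]*>\s*;\s*q=([0-9.]+)", re.IGNORECASE
)


def registrars_by_preference(header_lines):
    """Return registrar FQDNs ordered from highest q value to lowest."""
    parsed = []
    for line in header_lines:
        match = CONTACT_RE.match(line.strip())
        if match:
            parsed.append((float(match.group(2)), match.group(1)))
    # Highest q first: the primary registrar, then the backup.
    parsed.sort(key=lambda item: item[0], reverse=True)
    return [fqdn for _, fqdn in parsed]


if __name__ == "__main__":
    primary, backup = registrars_by_preference(REDIRECT_CONTACTS)
    print("Primary registrar:", primary)
    print("Backup registrar: ", backup)
```

Sorting on q rather than relying on header order means the result stays the same even if the two Contact lines arrive in the opposite order.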

If we take the Director out of the environment and set the SRV record to point to beta-ls14-se1.beta.deitterick.com, any user that is homed on that pool won't be notified of their backup registrar. A trace from a user homed on that server shows that no 301 Redirect is returned to the client.

This means that the client doesn't know about its backup registrar. If the server/pool that the user is homed on is unavailable, the user will be unable to register with the backup registrar.

There are two ways to resolve this issue. The first is to include a Director in your design. The SRV record for automatic client logon will point to the Director, and users will be returned their primary and backup registrars. The second option is to use multiple SRV records with different priorities. Using this method you can specify both the primary and backup registrar in DNS for automatic client logon, so that if the primary registrar is unavailable the client still has a way to contact the backup registrar.

The only issue with the second option is that it doesn't scale very well. For a small environment with two pools this would work fine, but for a large environment with multiple pools in multiple data centers it may not be the most efficient option, and a Director might make more sense.
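To make the multiple SRV record option a bit more concrete, here is a rough Python sketch of how a client could walk SRV records in priority order. Everything specific in it is an assumption on my part: I'm assuming the records sit under the usual internal automatic sign-in name (_sipinternaltls._tcp.beta.deitterick.com), the priority values 10 and 20 are made up, and the is_reachable callback stands in for a real connection attempt. It also ignores SRV weights, which only matter when several records share a priority.

```python
from typing import Callable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int  # lower value is tried first
    weight: int
    port: int
    target: str


# Assumed records for _sipinternaltls._tcp.beta.deitterick.com.
# The priorities (10 and 20) are illustrative, not taken from a real zone.
SRV_RECORDS = [
    SrvRecord(20, 0, 5061, "beta-ls14-se2.beta.deitterick.com"),
    SrvRecord(10, 0, 5061, "beta-ls14-se1.beta.deitterick.com"),
]


def first_reachable(
    records: List[SrvRecord], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the target of the lowest-priority-value record that answers."""
    for record in sorted(records, key=lambda r: r.priority):
        if is_reachable(record.target):
            return record.target
    return None


if __name__ == "__main__":
    # Simulate the primary registrar being down.
    down = {"beta-ls14-se1.beta.deitterick.com"}
    print(first_reachable(SRV_RECORDS, lambda fqdn: fqdn not in down))
```

With two pools this is one pair of records. With many pools in many data centers, a single SRV name can't express a different primary/backup pair for every user, which is where the scaling problem comes from.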

Enterprise Edition Pool

An Enterprise Edition pool behaves pretty much the same as a Standard Edition Server, with one difference: there are multiple registrars in the pool that the user can register with. In Lync Server 2010 a user homed on an Enterprise Edition pool has one Front End Server defined as their primary registrar. The Registrar service has been split out into its own component, and there is no longer a shared registrar database. Because of this, all of a user's endpoints need to register against the same registrar server in the user's home pool. What that all means is that, depending on the Front End Server you try to register with, you may or may not get a 301 Redirect to your home server. If you try to register with the registrar that is defined as your primary registrar, you won't be redirected. If you try to register with any of the other Front End Servers in the pool, you'll get the 301 Redirect. And as I mentioned above, the 301 Redirect is where the client is informed of the user's backup registrar.

So another question that comes up is "How do I figure out which Front End Server in the pool is defined as a user's primary registrar?" The answer is pretty simple...PowerShell.

You can use the PowerShell cmdlet Get-CsUserPoolInfo to display a user's primary and backup registrars as well as the order of Front End Servers that the user will register against.

Get-CsUserPoolInfo -Identity lcarter@beta.deitterick.com

PrimaryPoolFqdn                     : ls14pool.beta.deitterick.com
BackupPoolFqdn                      : beta-ls14-se2.beta.deitterick.com
UserServicesPoolFqdn                : ls14pool.beta.deitterick.com
PrimaryPoolMachinesInPreferredOrder : {1:5-2, 1:5-1}
BackupPoolMachinesInPreferredOrder  : {1:3-1}

As you can see, the user's primary pool is ls14pool.beta.deitterick.com and their backup pool is beta-ls14-se2.beta.deitterick.com. For their primary pool they will register against server 1:5-2 first, then 1:5-1. That's great, but how do you match those identifiers up with the actual Front End Server FQDNs? The answer to that is more PowerShell, of course!

If you pipe the command above to the Select-Object cmdlet, you can expand the properties of PrimaryPoolMachinesInPreferredOrder. I also piped all of that to Format-List to pull out the important pieces from the output:

Get-CsUserPoolInfo -Identity lcarter@beta.deitterick.com | Select-Object -ExpandProperty PrimaryPoolMachinesInPreferredOrder | Format-List MachineId,Fqdn

MachineId : 1:5-2
Fqdn      : beta-ls14-ee2.beta.deitterick.com

MachineId : 1:5-1
Fqdn      : beta-ls14-ee1.beta.deitterick.com

So from this you can see that beta-ls14-ee2.beta.deitterick.com is the user's primary registrar.

Now that we know which Front End Server in the pool the user will be redirected to, we know that if they try to register with beta-ls14-ee2.beta.deitterick.com, no 301 Redirect will be returned. If they try to register with beta-ls14-ee1.beta.deitterick.com, a 301 Redirect will be returned, and it will include the backup registrar.
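The rule boils down to a single comparison, which I've sketched here in Python. This is a simplified model and not the Registrar's real code: the register function and the SipResponse type are hypothetical, the authentication exchange is left out, and a successful registration is shown as a plain 200. The q values of 0.7 and 0.3 are the ones seen in the traces.

```python
from typing import NamedTuple, Tuple


class SipResponse(NamedTuple):
    status: int
    contacts: Tuple[Tuple[str, float], ...]  # (registrar FQDN, q value)


def register(contacted_fqdn: str, primary_fqdn: str, backup_fqdn: str) -> SipResponse:
    """Model what a registrar returns for a REGISTER from a given user."""
    if contacted_fqdn.lower() == primary_fqdn.lower():
        # The user's primary registrar accepts the registration. There is
        # no redirect, so the client is not told about the backup registrar.
        return SipResponse(200, ())
    # Any other registrar redirects the client to the home server and
    # includes both the primary and the backup registrar.
    return SipResponse(301, ((primary_fqdn, 0.7), (backup_fqdn, 0.3)))


if __name__ == "__main__":
    primary = "beta-ls14-ee2.beta.deitterick.com"
    backup = "beta-ls14-se2.beta.deitterick.com"
    for server in ("beta-ls14-ee2.beta.deitterick.com",
                   "beta-ls14-ee1.beta.deitterick.com"):
        print(server, "->", register(server, primary, backup))
```

The case to watch is the 200 branch: the registration works, but the response carries no Contact list, so the client has learned nothing about its backup registrar.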

In this trace, the user contacts their primary registrar (172.16.8.10), so no 301 Redirect is returned:

12/31/2010|13:18:17.346 630:358 INFO :: Sending Packet - 172.16.8.10:5061 (From Local Address: 172.16.8.7:50430) 795 bytes:
12/31/2010|13:18:17.346 630:358 INFO :: REGISTER sip:beta.deitterick.com SIP/2.0
Via: SIP/2.0/TLS 172.16.8.7:50430
Max-Forwards: 70
From: <sip:lcarter@beta.deitterick.com>;tag=e5fc295f13;epid=159a38c2a2
To: <sip:lcarter@beta.deitterick.com>
Call-ID: 50490a0fd7534119ad400d5ca46e20aa
CSeq: 1 REGISTER
Contact: <sip:172.16.8.7:50430;transport=tls;ms-opaque=d4b2948966>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:C816ACB8-5459-5B19-ADB3-2A9F0A6974A7>"
User-Agent: UCCAPI/4.0.7577.0 OC/4.0.7577.0 (Microsoft Lync 2010)
Supported: gruu-10, adhoclist, msrtc-event-categories
Supported: ms-forking
Supported: ms-cluster-failover
Supported: ms-userservices-state-notification
ms-keep-alive: UAC;hop-hop=yes
Event: registration
Content-Length: 0

12/31/2010|13:18:17.346 630:358 INFO :: End of Sending Packet - 172.16.8.10:5061 (From Local Address: 172.16.8.7:50430) 795 bytes

In this trace the user contacts the other server in the pool (172.16.8.9) and a 301 Redirect is returned.  You can see in the response that it also includes the primary and backup registrars:

01/01/2011|13:09:59.182 B34:A44 INFO :: Sending Packet - 172.16.8.9:5061 (From Local Address: 172.16.8.7:50779) 795 bytes:
01/01/2011|13:09:59.182 B34:A44 INFO :: REGISTER sip:beta.deitterick.com SIP/2.0
Via: SIP/2.0/TLS 172.16.8.7:50779
Max-Forwards: 70
From: <sip:lcarter@beta.deitterick.com>;tag=e820708c06;epid=159a38c2a2
To: <sip:lcarter@beta.deitterick.com>
Call-ID: 5de48871972847a0b16669398687d8dd
CSeq: 1 REGISTER
Contact: <sip:172.16.8.7:50779;transport=tls;ms-opaque=edd211a910>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:C816ACB8-5459-5B19-ADB3-2A9F0A6974A7>"
User-Agent: UCCAPI/4.0.7577.0 OC/4.0.7577.0 (Microsoft Lync 2010)
Supported: gruu-10, adhoclist, msrtc-event-categories
Supported: ms-forking
Supported: ms-cluster-failover
Supported: ms-userservices-state-notification
ms-keep-alive: UAC;hop-hop=yes
Event: registration
Content-Length: 0

01/01/2011|13:09:59.197 B34:A44 INFO :: End of Sending Packet - 172.16.8.9:5061 (From Local Address: 172.16.8.7:50779) 795 bytes

01/01/2011|13:09:59.260 B34:A44 INFO :: Data Received - 172.16.8.9:5061 (To Local Address: 172.16.8.7:50779) 758 bytes:
01/01/2011|13:09:59.260 B34:A44 INFO :: SIP/2.0 301 Redirect request to Home Server
Authentication-Info: TLS-DSK qop="auth", opaque="212BCC58", srand="A46A87C1", snum="1", rspauth="9a517940f6f912c5fd30a4ab48f95c4bb256adf7", targetname="BETA-LS14-EE1.beta.deitterick.com", realm="SIP Communications Service", version=4
From: <sip:lcarter@beta.deitterick.com>;tag=e820708c06;epid=159a38c2a2
To: <sip:lcarter@beta.deitterick.com>;tag=9E9546E818951C08C2E9CA2E852F3D2E
Call-ID: 5de48871972847a0b16669398687d8dd
CSeq: 4 REGISTER
Via: SIP/2.0/TLS 172.16.8.7:50779;ms-received-port=50779;ms-received-cid=9B000
Contact: <sip:beta-ls14-ee2.beta.deitterick.com:5061;transport=TLS>;q=0.7
Contact: <sip:beta-ls14-se2.beta.deitterick.com:5061;transport=TLS>;q=0.3
Expires: 2592000
Content-Length: 0

01/01/2011|13:09:59.260 B34:A44 INFO :: End of Data Received - 172.16.8.9:5061 (To Local Address: 172.16.8.7:50779) 758 bytes

At this point the client knows about the user's backup registrar. If the client lost connectivity to the entire pool/data center, it would be able to contact the backup registrar.

So now for the wrinkle...EndpointConfiguration.cache. Upon the first successful logon, the client writes the user's primary registrar to the EndpointConfiguration.cache file. All subsequent logons will use this file to determine which server to send the initial REGISTER to. That means that even if you have a Director configured in the environment, it won't be used after the first logon. This also means that the client will connect directly to its primary registrar, so no 301 Redirect will be returned to the client. Also, the backup registrar is not cached on the client. It needs to be provided to the client at each logon.

What happens if the user's primary registrar is down or goes down? In this case the client can't use the EndpointConfiguration.cache file, so it will fall back to automatic or manual configuration, depending on which you have configured in your environment. At that point the client would connect either to a Director, and be redirected to the next Front End Server in the user's PrimaryPoolMachinesInPreferredOrder list, or to the pool, and possibly be redirected to the next Front End Server in that list. I say possibly because there's no way of knowing which Front End Server the client will try to connect to. In my lab environment I'm using DNS load balancing. The IP addresses for both Front End Servers will be returned to the client, and then the client will pick one and attempt to connect. It's possible that the IP the client picks belongs to the Front End Server that is now acting as the user's registrar, in which case no 301 Redirect is returned. Without a Director, there is no way to guarantee that the client will always get the user's backup registrar returned to it.
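The different logon paths are easier to compare side by side, so here is a small Python model of them. Treat it as a thought experiment under my own assumptions: the sign_in and active_registrar functions are hypothetical, "the next Front End Server in the list" is modelled as the first reachable entry in PrimaryPoolMachinesInPreferredOrder, and a full pool outage is deliberately left out. The FQDNs are the ones from my lab.

```python
from typing import Callable, List, Optional, Tuple

# PrimaryPoolMachinesInPreferredOrder for lcarter, as returned by Get-CsUserPoolInfo.
POOL_ORDER = [
    "beta-ls14-ee2.beta.deitterick.com",
    "beta-ls14-ee1.beta.deitterick.com",
]
DIRECTOR = "beta-ls14-dir.beta.deitterick.com"


def active_registrar(pool_order: List[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """First reachable Front End Server in the user's preferred order."""
    for fqdn in pool_order:
        if is_up(fqdn):
            return fqdn
    return None


def sign_in(first_contact: str, pool_order: List[str],
            is_up: Callable[[str], bool]) -> Tuple[str, bool]:
    """Return (registrar used, whether the client learned its backup registrar)."""
    if not is_up(first_contact):
        raise ValueError("first contact must be reachable")
    registrar = active_registrar(pool_order, is_up)
    if registrar is None:
        raise ValueError("a whole-pool outage is not modelled in this sketch")
    # A 301 Redirect (and with it the backup registrar) is only returned
    # when the server contacted first is not the user's registrar.
    redirected = first_contact != registrar
    return registrar, redirected


if __name__ == "__main__":
    all_up = lambda fqdn: True
    ee2_down = lambda fqdn: fqdn != POOL_ORDER[0]
    print("cached primary:     ", sign_in(POOL_ORDER[0], POOL_ORDER, all_up))
    print("via Director:       ", sign_in(DIRECTOR, POOL_ORDER, all_up))
    print("ee2 down, Director: ", sign_in(DIRECTOR, POOL_ORDER, ee2_down))
    print("ee2 down, DNS -> ee1:", sign_in(POOL_ORDER[1], POOL_ORDER, ee2_down))
```

In this model the only paths that hand the client its backup registrar are the ones that go through a server other than the user's current registrar, and the Director is the only entry point that is never a user's registrar.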

So...what does this all mean? When you are planning your Lync Server 2010 deployment and intend to define a backup registrar for your users, make sure you understand how the backup registrar will be returned to the clients, and make sure you can guarantee that all clients will be able to connect to their backup registrar, whether by using a Director, multiple SRV records, or both. By doing this you can make sure that in the event of a failover you get the resiliency you are looking for.