New in Lync 2013: High Availability


Limitations of Lync Server 2010

– In Lync Server 2010, all changes to user state, user contacts, meetings, and so on had to be written to the Back End SQL Server. This caused high I/O and forced us to limit the number of Front End Servers per Lync pool
– The Metropolitan Site Resiliency solution was complex, difficult to implement, and had strict network requirements between the sites (RTT < 20 ms, bandwidth > 1 Gbps, a stretched Layer 2 broadcast domain for the pool subnet). An external solution was required for synchronous data replication between the sites
– There was no failover between Lync 2010 pools; in a disaster affecting one pool, customers had to force-move users to another pool
– Voice resiliency was available with an SBA or SBS, but presence and IM would still fail
 
New in Lync Server 2013 High Availability
– Removed the SQL Back End bottleneck
– Improved DR capabilities
– Reduced infrastructure requirements
These improvements were introduced by the Brick Model.
The Brick Model means that all services and data are available on the Front End Servers. The data of each FE is copied synchronously to two additional FEs so that no data is lost if an FE fails. The Back End SQL Server now serves as backup (blob) storage rather than the real-time data store.
 
Brick Model
The Brick Model helped remove the limit of 10 FEs per pool and the bottleneck of the SQL Back End. With support for SQL Mirroring, database availability was improved with less expensive solutions.
In Lync Server 2013, the Front End pool architecture has changed, and these changes affect how you should plan and maintain your Front End pools. We recommend that all Enterprise Edition Front End pools
include at least three Front End Servers. The pool architecture now uses a distributed systems model, with each user's data kept on three Front End Servers in the pool.
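The one-master/two-replicas placement described above can be sketched in a few lines. This is an illustration only: the names and the hashing scheme are my own assumptions, not Lync's actual algorithm (real placement is handled by Windows Fabric routing groups).

```python
import hashlib

def assign_servers(user_sip: str, front_ends: list[str]) -> dict:
    """Illustrative only: pick a primary and two replica Front End
    Servers for a user by hashing the user's SIP URI. Lync's real
    placement is done by Windows Fabric routing groups; this just
    shows the one-master/two-replicas idea."""
    if len(front_ends) < 3:
        raise ValueError("at least three Front End Servers are recommended")
    # Deterministic starting point derived from the user identity.
    h = int(hashlib.sha256(user_sip.encode()).hexdigest(), 16)
    i = h % len(front_ends)
    # The primary plus the next two servers in the ring act as replicas.
    return {
        "primary": front_ends[i],
        "replicas": [front_ends[(i + 1) % len(front_ends)],
                     front_ends[(i + 2) % len(front_ends)]],
    }

pool = ["fe1.contoso.com", "fe2.contoso.com", "fe3.contoso.com"]
placement = assign_servers("sip:alice@contoso.com", pool)
print(placement["primary"], placement["replicas"])
```

Because the mapping is deterministic, every component that hashes the same SIP URI agrees on which server is the master without a central lookup, which mirrors why administrators never need to track master assignments by hand.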
Lync 2013 Pool Planning
– With the new design, the Back End database is no longer the real-time data store in a pool. Information about a particular user is kept on three Front End Servers in the pool. For each user, one Front End Server acts as the master for that user's information, and two other Front End Servers serve as replicas. If a Front End Server goes down, another Front End Server that served as a replica is automatically promoted to master
– This happens behind the scenes, and administrators do not need to know which Front End Servers are the masters for which users. This distribution of data storage improves performance and scalability within the pool, and eliminates the single point of failure of a single Back End Server
– The Back End Server serves as backup storage for user and conference data, and is also the primary storage for other databases such as the Response Group database
– If you do not want to deploy three Enterprise Edition Front End Servers but still want high availability and disaster recovery, we recommend that you use Lync Server Standard Edition and create two pools with a paired backup relationship. This provides the best high availability and disaster recovery solution with only two servers
– If you are going to deploy two-node pools, follow this guidance:
  – If one of the two Front End Servers goes down, you should try to bring the failed server back up as soon as you can
  – If you need to upgrade one of the two servers, bring it back online as soon as the upgrade is finished
  – If you need to bring both servers down at the same time, do the following when restarting the servers:
       A- Restart both Front End Servers at the same time
       B- If the two servers cannot be restarted at the same time, bring them back up in the reverse order in which they went down
       C- If you cannot bring them back up in that order, then use the following cmdlet before bringing the pool back up:
            – Reset-CsPoolRegistrarState -PoolFQDN {PoolFQDN} -ResetType QuorumLossRecovery 
              A QuorumLossRecovery reset is typically used when the number of active Front End Servers in a pool falls below quorum (that is, when fewer than 85% of the Front End Servers in the pool are currently active).
             – Reset-CsPoolRegistrarState -PoolFQDN {PoolFQDN} -ResetType FullReset
              The FullReset option is typically used after topology changes or if unrecoverable errors occur when a pool is started
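The quorum-loss condition described above is just a comparison of active servers against the pool size. As a back-of-the-envelope check (the 85% figure comes from the text above; real pools use Windows Fabric voter logic, so treat this as a sketch, not a support rule):

```python
def below_quorum(total_fes: int, active_fes: int, threshold: float = 0.85) -> bool:
    """Sketch of the quorum-loss condition described above: quorum is
    lost when fewer than 85% of the pool's Front End Servers are active.
    Illustrative only; actual quorum is decided by Windows Fabric voters."""
    return active_fes < threshold * total_fes

# A 6-server pool with 4 active servers: 4 < 0.85 * 6 = 5.1, so quorum is lost
# and a QuorumLossRecovery reset would be the documented remedy.
print(below_quorum(6, 4))
```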
Windows Fabric
– Determines where users are homed (group assignment)
– Returns user location for routing
– Defines voter nodes and quorum
– Synchronous data replication
Synchronous Replication: When a user changes their persistent data, it is committed to the user's Backup Servers (up to two backups). For example, if a user's primary server is S1 and a backup server is S3, S1 does not return a success response until the data is committed to S3
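The "no success until the replicas commit" contract above can be modeled with a toy in-memory store. The class and function names here are invented for illustration; this is a sketch of the synchronous-replication idea, not Lync's implementation.

```python
class FrontEnd:
    """Toy in-memory store standing in for a Front End Server."""
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, str] = {}

    def commit(self, key: str, value: str) -> bool:
        self.store[key] = value
        return True

def write_user_data(primary: FrontEnd, replicas: list[FrontEnd],
                    key: str, value: str) -> bool:
    """Report success only after the primary AND every replica have
    committed the change -- the synchronous-replication contract
    described above. Sketch only."""
    ok = primary.commit(key, value)
    for replica in replicas:
        ok = replica.commit(key, value) and ok
    return ok

s1, s2, s3 = FrontEnd("S1"), FrontEnd("S2"), FrontEnd("S3")
print(write_user_data(s1, [s2, s3], "alice:presence", "Available"))
```

Because the write is acknowledged only once every copy exists, a replica already holds the latest committed data the moment it is promoted to master, which is what makes the automatic failover lossless.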
Synchronous Replication Features
– Persistent user data
  – Synchronous replication to two additional FEs
  – Client sees success only if data is written to the replicas correctly
– Lazy replication to the shared blob store
– Ensures that data is available immediately for failover

 

– Persistent User Data
  X- Synchronous replication to two additional FEs (backup replicas)
  X- Presence, contacts/groups, user voice settings, conferences
  X- Lazy replication used to commit data to the shared blob store (SQL Back End)
  X- Deep chunking is used to reduce replication deltas
– Transient User Data
  X- Not replicated across Front End Servers
  X- Presence changes due to user activity, including:
      – Calendar
      – Phone calls
      – Inactivity
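The persistent/transient split above boils down to a classification rule: only persistent user data takes the synchronous-replication path. The category names below are placeholders drawn from the lists above, not identifiers from the product.

```python
# Hypothetical category names based on the lists above.
PERSISTENT = {"presence", "contacts_groups", "voice_settings", "conferences"}
TRANSIENT = {"calendar_presence", "call_presence", "inactivity_presence"}

def is_replicated(data_kind: str) -> bool:
    """Sketch of the rule above: persistent user data is synchronously
    replicated to the backup FEs; activity-driven presence changes
    stay local to the user's current Front End Server."""
    if data_kind in PERSISTENT:
        return True
    if data_kind in TRANSIENT:
        return False
    raise ValueError(f"unknown data kind: {data_kind}")

print(is_replicated("contacts_groups"), is_replicated("inactivity_presence"))
```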
    

Comments (5)

  1. Thanks for such a wonderful blog

  2. mohamed hassan says:

    amazing blog !!

  3. Eduardo says:

    Hi thanks for this, one question, how long does it take for a new master to take over if the previous master fails?
    I’ve been making some tests and it takes over 2 minutes for this to kick in and the users can chat again, am I missing something or is this expected? Thanks

  4. Yassine Zeroual says:

    Thank you so much for this informative blog.
    Using Lync Server 2013 and SQL mirroring for the database Back End as well as Distributed File System (DFS) for the file share provides a multi-data-center high availability approach.

  5. Thanks Tamer for this extremely useful article.