Highly available archiving beyond just a cluster (OCS 2007)

This is a guest post from John Lamb of Modality Systems. John is an MVP, and in a recent conversation about architecting a solution to the lack of support for Archive Server clustering, he mentioned that he had completed one. In John’s words:

“We designed a very large EE Pool with 3-tier OCS Archiving and this is how we worked around the lack of support for Archive Server clustering:

We deployed Archiving Servers 1:1 with each OCS FE. 

This is massive overkill in terms of hardware required, but it was the only way to achieve the goals for this design. If we lose an archiving server, it shuts down the corresponding FE. This means we had to design the pool with N+2 FEs, where N is the number of FEs required to support the user population. This way we could lose 1x FE and 1x Archiving Server (which would shut down the corresponding FE) simultaneously and still continue to run at full capacity with archiving.”
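The N+2 arithmetic in John's description can be checked with a short sketch. The function names and the value N = 4 are hypothetical, chosen only for illustration; the sketch assumes the 1:1 FE-to-Archiving-Server pairing and critical archiving described above, and it takes the worst case in which a failed archiving server belongs to an otherwise healthy FE.

```python
def pool_size(required_fes: int, spare: int = 2) -> int:
    """FEs to deploy (and, at 1:1, archiving servers) for an N + spare design."""
    return required_fes + spare


def surviving_front_ends(total_fes: int, failed_fes: int, failed_archivers: int) -> int:
    """FEs still in service when each FE has its own critical archiving server.

    Worst case: every failed archiving server is paired with a healthy FE,
    so each archiving failure removes one more FE from service.
    """
    lost = min(total_fes, failed_fes + failed_archivers)
    return total_fes - lost


required = 4  # hypothetical N: FEs needed for the user population
deployed = pool_size(required)
remaining = surviving_front_ends(deployed, failed_fes=1, failed_archivers=1)
print(deployed, remaining, remaining >= required)  # prints: 6 4 True
```

With only N+1 FEs, the same double failure would leave N-1 FEs in service, which is why the design needs the second spare.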

Here is the document he shared with me. The customer chose the third option.

OCS 2007 Archiving and CDR Decision Points

John Lamb, OCS Architecture 13-November-2007

The following provides background information and lists the major decision points to consider for the planning of the Archiving and CDR Service for the OCS 2007 deployment.

Background Information

The Office Communications Server (OCS) 2007 product includes functionality that enables Archiving and “Call Detail Records” (CDR). This functionality is provided as a software-based service.

· Archiving – provides a mechanism for archiving the content of Instant Messaging conversations only

· Call Detail Records – provides a mechanism for capturing usage data for IM and Web Conferencing sessions. Web conference meeting content is not captured.

The Archiving and CDR Service is included in the base license of OCS 2007 Standard Edition/Enterprise Edition and does not create an additional licensing cost. However, additional physical server infrastructure must be deployed to support the scalability and availability requirements for a deployment of this size.

Decision Points

1. IM Archiving and/or Call Detail Records

Determine if the requirement is to capture:

a) Only IM content

b) Only CDRs

c) Both IM content and CDRs

NOTE: CDRs will include call data for both IM and Web Conferencing sessions. It is not possible to capture CDRs for one session type without the other.

2. IM Archiving: Internal and/or External IM Communication

Determine if the requirement is to:

a) Archive only IM conversations between internal users

b) Archive only IM conversations between internal users and Federated partners

c) Archive all IM conversations

3. IM Archiving: Global vs. per user policy setting

Determine if IM archiving should be:

a) Enabled globally for all users, and then disabled on a per user basis

b) Disabled globally for all users, and then enabled on a per user basis

NOTE: If a user is set for “Do not archive” (either through a global or per user policy), that user’s conversations will never be archived, even if they engage in a conversation with a user who is set to “Archive”.
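The rule in this note can be sketched as follows. This is an illustrative model only, not how OCS implements the policy: the function names and the use of `None` to mean "no per user setting, follow the global policy" are assumptions, and the internal/external scope from decision point 2 is not modelled.

```python
from typing import Mapping, Optional, Sequence


def effective_archiving(global_default: bool, per_user: Optional[bool]) -> bool:
    """A per user setting, when present, overrides the global policy."""
    return global_default if per_user is None else per_user


def conversation_is_archived(
    participants: Sequence[str],
    global_default: bool,
    overrides: Mapping[str, Optional[bool]],
) -> bool:
    """Archived only if no participant is effectively set to "Do not archive"."""
    return all(
        effective_archiving(global_default, overrides.get(user))
        for user in participants
    )


# Hypothetical users: archiving enabled globally, disabled for bob.
overrides = {"bob": False}
print(conversation_is_archived(["alice", "carol"], True, overrides))  # prints: True
print(conversation_is_archived(["alice", "bob"], True, overrides))    # prints: False
```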

4. Criticality of Archiving and CDR

The Archiving and CDR Service can be configured as “critical”, which means that OCS will shut down if archiving fails.

Determine if:

a) Archiving and CDR service is critical

b) Archiving and CDR service is not critical (OCS will continue to operate even if Archiving and CDR fail. IM archive content and CDRs will be lost)

5. Archiving and CDR Architecture Requirements

Due to the size and scale of the OCS deployment (up to 55k concurrent users), only 2-tier archiving architectures will be considered.

Option 1: 2-Tier archiving with a single Archiving and CDR Server

This option would include a single point of failure in the Archiving Server tier and would only be suitable if the archiving service is configured to be non-critical.

Note: More analysis is required to determine if a single Archiving Server can meet the scalability requirements.

Option 2: 2-Tier archiving with two Archiving and CDR Servers

This assumes the archiving service is configured as critical.

This option would remove the single point of failure in the Archiving Server tier, but each archiving server would serve a group of two FE servers, so half of the pool (2x FE servers) would be unavailable if a single archiving server failed. Furthermore, if an FE from “group 1” and the archiving server from “group 2” failed at the same time, only 1 FE server would be available, which would be inadequate to support the global user population.

Option 3: 2-Tier archiving with dedicated Archiving and CDR Servers

This assumes the archiving service is configured as critical.

This option provides a dedicated archiving server for each front-end server, so an archiving server failure takes down only its own front-end server. This restores the overall resilience of the system to the level it had before archiving was introduced.
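The difference between Options 2 and 3 under the same double failure can be sketched with a small model. The server names, the four-FE pool for Option 3, and the function name are assumptions made for illustration; Option 2 follows the two groups of two FEs described above.

```python
from typing import Dict, Set


def available_front_ends(
    fe_to_archiver: Dict[str, str],
    failed_fes: Set[str],
    failed_archivers: Set[str],
    critical: bool = True,
) -> Set[str]:
    """FEs still in service; with critical archiving an FE stops when its archiver fails."""
    available = set()
    for fe, archiver in fe_to_archiver.items():
        if fe in failed_fes:
            continue  # the FE itself is down
        if critical and archiver in failed_archivers:
            continue  # critical archiving shuts this FE down
        available.add(fe)
    return available


# Option 2: two archiving servers, each serving a group of two FEs.
option2 = {"FE1": "ARCH1", "FE2": "ARCH1", "FE3": "ARCH2", "FE4": "ARCH2"}
# Option 3: one dedicated archiving server per FE (four FEs assumed here).
option3 = {f"FE{i}": f"ARCH{i}" for i in range(1, 5)}

# Same double failure in both topologies: one FE and one archiving server.
print(sorted(available_front_ends(option2, {"FE1"}, {"ARCH2"})))  # prints: ['FE2']
print(sorted(available_front_ends(option3, {"FE1"}, {"ARCH2"})))  # prints: ['FE3', 'FE4']
```

Passing `critical=False` models the non-critical configuration from decision point 4, where an archiving failure leaves the FEs running but archive content is lost.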

Option 4: 2-Tier archiving with dedicated Archiving and CDR Servers, and 2 SQL Clusters

This assumes the archiving service is configured as critical.

This option is the same as Option 3, except with an additional SQL Cluster. More analysis is required to determine if 1x or 2x SQL clusters will be needed to meet the scalability/performance requirements of the system.

TomL LCSKid