SharePoint 2010 Architecture and Disaster Recovery – when to use UPRE and when to log ship

The SharePoint 2010 "Database types and descriptions" TechNet article was recently updated (March 5th, 2013) to allow log shipping of the Profile and Social databases for the User Profile Service. This change aligns SharePoint 2010 with the equivalent TechNet article for SharePoint 2013. To understand the change, it's important to understand the tight correlation between the Managed Metadata Service (MMS), the User Profile Service (UPS) and the Content Databases, and how the User Profile Replication Engine (UPRE) works. Our goal architecture will be a Hot Standby Farm as discussed here and here. In the Hot Standby environment, we're log shipping the Content and MMS databases with matching timings (the Backup, Copy and Restore Agent Jobs all happen at the same time).

 

Obligatory warning…here comes the deep dive!

Several fields in a user's profile leverage the MMS service. Out of the box, these include the Past Projects, Skills and Schools fields:

 

These are free-text fields and are used for a better search and social experience. Leveraging UPS and MMS, users can find everyone with a particular skill set, or fellow alumni of a school. You can see all the terms that have been created by going to Term Store Management:

 

In the database, we can see the term associations across the MMS and UPS Profile databases:

Note: Do not run these queries on your production environment; they are only here to show how closely related the IDs and Terms are across the databases. Even with NOLOCK, they can affect the farm and put you in an unsupported state. I crashed my MMS Service while getting these screenshots because of these queries.

 

We can see the same tight association when we use an MMS column in a list in the content databases:


 

And in the database:

Note: Do not run these queries on your production environment; they are only here to show how closely related the IDs and Terms are across the databases. Even with NOLOCK, they can affect the farm and put you in an unsupported state. I crashed my MMS Service while getting these screenshots because of these queries.

 

The Managed Metadata Service application uses the GUID and TermID fields to provide all MMS functionality. The column value is added for the end user to view without the need of another join on the list item. Keeping the Managed Metadata Service's Database, the Content Database and the User Profile's Profile Database in sync is required for proper functionality of the environment.
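One way to picture the sync requirement is as a containment check: every term ID referenced by a profile or a list item has to exist in the term store on the same farm. The sketch below is only a toy model built on in-memory sets. The function name and the sample IDs are hypothetical, and it deliberately does not query any SharePoint database.

```python
def find_orphaned_terms(term_store_ids, referenced_ids):
    """Return term IDs that are referenced but missing from the term store."""
    return sorted(set(referenced_ids) - set(term_store_ids))


# Hypothetical snapshot of a secondary farm where the Profile database
# is ahead of the MMS database.
mms_terms = {"term-001", "term-002"}
profile_refs = {"term-001", "term-003"}   # e.g. Skills or Schools values
content_refs = {"term-002"}               # e.g. an MMS column in a list

orphans = find_orphaned_terms(mms_terms, profile_refs | content_refs)
print(orphans)
```

Any non-empty result stands for a term that a profile or list item points at but the term store cannot resolve, which is the out-of-sync condition this section is warning about.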

If the databases get out of sync, you'll get mixed results when using any MMS-driven fields. Some of the symptoms you'll see: the MMS fields become read-only (the field relies on the stored column value to display to the end user instead of using the ID), and all MMS-driven functionality ceases (workflows, content type logic, etc.). Search will start acting up when crawling users, social tagging/notes will fail intermittently, and other issues will manifest.

The real problem is that you won't know this is happening until you try to use the secondary farm! Think about that for a moment: your primary farm is offline, the secondary farm is brought online, and large components of the farm are inoperable. THAT will not be a fun day.

<< Obligatory warning over >>

 

The UPRE Gap: (Spoiler – UPRE is unreliable if the MMS Database is Read/Only)

The key to a viable standby farm is keeping the Profile database, the Managed Metadata database and the Content databases in sync. Until recently, the only way to accomplish a standby farm was to log ship the Content and MMS databases but use UPRE for the Profiles. This kept the Content and MMS databases in sync, but the Profiles (UPS) were in a constant state of flux. Consider these scenarios:

  1. Scenario: An MMS log is applied during a UPRE run
    1. Outcome: All updates for that run fail because UPRE cannot access the MMS database.
  2. Scenario: A user adds a new MMS term to their profile and UPRE runs before the MMS log ship
    1. Outcome: UPRE tries to add the MMS term at the destination farm and fails because the database is read-only (because of log shipping).
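The second scenario can be mimicked with a toy model. This is not how UPRE is implemented; the `TermStore` class and `replicate_profile` function are hypothetical stand-ins that only illustrate why a write against a read-only term store leaves a profile partially replicated.

```python
class TermStore:
    """Toy stand-in for the destination farm's MMS database."""

    def __init__(self, terms, read_only):
        self.terms = set(terms)
        self.read_only = read_only

    def add(self, term):
        if self.read_only:
            raise PermissionError("term store is read-only (log shipping standby)")
        self.terms.add(term)


def replicate_profile(profile_terms, store):
    """Return (replicated, failed) term lists for one profile."""
    replicated, failed = [], []
    for term in profile_terms:
        if term not in store.terms:
            try:
                store.add(term)          # new term: needs a write to the store
            except PermissionError:
                failed.append(term)
                continue
        replicated.append(term)
    return replicated, failed


standby = TermStore({"SQL Server"}, read_only=True)
print(replicate_profile(["SQL Server", "PowerShell"], standby))
```

In this model a term that already exists at the destination replicates fine, while a brand-new term fails until the next MMS log restore delivers it.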

Traditionally, the answer has included some kind of timing solution and ignoring the UPRE errors that occur in the second scenario. But this isn't a true hot standby: a large portion of your farm (profiles) is in a state of flux, and you can't be sure of its current state. If the hot standby has an RTO/RPO of less than an hour, but you've implemented a timing solution (for example, UPRE runs every 24 hours and MMS log updates are paused), then you've already missed your RTO/RPO SLAs by your own design!
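The arithmetic behind that last point can be sketched as follows. This is a simplified model, assuming the worst-case age of the secondary copy is roughly the sync interval plus the time one sync takes; the function names and the 15-minute figure are illustrative assumptions, not measured values.

```python
from datetime import timedelta


def worst_case_staleness(sync_interval, sync_duration=timedelta(0)):
    """Oldest the secondary copy can be just before the next sync completes."""
    return sync_interval + sync_duration


def meets_rpo(sync_interval, rpo, sync_duration=timedelta(0)):
    """True when the worst-case staleness stays within the RPO."""
    return worst_case_staleness(sync_interval, sync_duration) <= rpo


rpo = timedelta(hours=1)
print(meets_rpo(timedelta(hours=24), rpo))     # UPRE every 24 hours
print(meets_rpo(timedelta(minutes=15), rpo))   # hypothetical 15-minute cycle
```

With a one-hour RPO, the 24-hour UPRE cycle fails the check before any outage has even occurred, while a cycle that is short relative to the RPO passes.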

 

How to choose when to UPRE and when to Log Ship:

The decision to UPRE or to Log Ship the Profile and Social databases is influenced by several factors: the volume of social data, your RPO/RTO, and the type of Disaster Recovery you're leveraging (Hot/Warm/Cold). To guide the decision, we have to ask several questions:

  1. How are we using the secondary farm?
    1. Is it a Hot Standby? Or another active farm that needs Profile information?
  2. (Assuming it's a Hot Standby) What are the RPO and RTO for the secondary farm?
  3. (Assuming it's a Hot Standby) Is there a difference in timing between UPRE and Log Shipping?
    1. Can both UPRE and Log Shipping complete within the RPO and RTO window?
  4. How critical are User Profiles? Are they deemed critical enough to have the same uptime and availability as the content databases?
  5. What is the state of the secondary farm's MMS Database (read-write or read-only)?

Answering these questions will quickly favor UPRE or Log Shipping. Consider these two scenarios:

Scenario A:

Suppose you have two regional farms deployed, one in London and one in Seattle. Both farms are serving content to users, but Profiles from the Seattle Farm are needed at the London Farm. The answers to our questions are as follows:

  1. How are we using the secondary farm? Another active Farm that needs Profile information
  2. (Assuming it's a Hot Standby) What are the RPO and RTO for the secondary farm? N/A – it's not a Hot Standby
  3. (Assuming it's a Hot Standby) Is there a difference in timing between UPRE and Log Shipping? N/A – it's not a Hot Standby
    1. Can both UPRE and Log Shipping complete within the RPO and RTO window? N/A – it's not a Hot Standby
  4. How critical are User Profiles? Are they deemed critical enough to have the same uptime and availability as the content databases? No
  5. What is the state of the secondary farm's MMS Database (read-write or read-only)? Read-Write

Since the London Farm's MMS database is read-write and London is an active farm that requires profile information from the Seattle Farm, UPRE would be our choice. UPRE would be able to add terms to the MMS database and would not be exposed to the disconnects caused by a log restore of the MMS database.

Note: this is NOT a DR/Hot-Standby solution. This is merely two farms that hold separate information, but the London Farm needs the Seattle Farm's profiles for a business need.

 

 

Scenario B:

Consider the same two-farm regional deployment, with a farm in London and a farm in Seattle. All users access the Seattle Farm; the London Farm is used if the Seattle Farm is unavailable due to maintenance or downtime. The answers to our questions are as follows:

  1. How are we using the secondary farm? Hot-Standby
  2. (Assuming it's a Hot Standby) What are the RPO and RTO for the secondary farm? 45 minutes and 30 minutes
  3. (Assuming it's a Hot Standby) Is there a difference in timing between UPRE and Log Shipping? No difference in timing between UPRE and Log Shipping
    1. Can both UPRE and Log Shipping complete within the RPO and RTO window? Yes
  4. How critical are User Profiles? Are they deemed critical enough to have the same uptime and availability as the content databases? Yes
  5. What is the state of the secondary farm's MMS Database (read-write or read-only)? Read-Only

Since the London Farm's MMS database is read-only and the farm is a Hot Standby of the Seattle Farm, Log Shipping the Profile and Social databases would be our choice. Log Shipping gives us a consistent point in time on the secondary farm without being affected by the database disconnects that would plague UPRE in this same scenario.

Note: this is a DR/Hot-Standby solution. Seattle Farm is log shipping all the content, MMS and UPS databases to the London farm.

Note: it's critical that your databases (Content, MMS, Profile and Social) are all in sync in this scenario. Make sure the timing on your SQL Agent Jobs all match!
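A quick way to sanity-check that requirement is to group the databases by their job timings and look for outliers. The sketch below assumes you have already collected the Backup, Copy and Restore intervals yourself; the database names and the minute values are hypothetical, and nothing here talks to SQL Server.

```python
def mismatched_schedules(schedules):
    """Group databases by their (backup, copy, restore) schedule.

    Returns an empty dict when every database shares one schedule,
    otherwise a dict mapping each distinct schedule to its databases.
    """
    groups = {}
    for db, schedule in schedules.items():
        groups.setdefault(schedule, []).append(db)
    return {} if len(groups) <= 1 else groups


# Hypothetical names; intervals in minutes as (backup, copy, restore).
schedules = {
    "WSS_Content": (15, 15, 15),
    "Managed_Metadata": (15, 15, 15),
    "Profile_DB": (15, 15, 15),
    "Social_DB": (15, 15, 30),
}

result = mismatched_schedules(schedules)
for schedule, dbs in result.items():
    print(schedule, dbs)
```

Here the Social database restores on a different cadence from the rest, so it would drift away from the Content, MMS and Profile databases on the secondary farm.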

 

Is there ever a time to use UPRE instead of Log Shipping in a Hot Standby?

Absolutely! Here are a few scenarios where UPRE could be chosen instead of Log Shipping:

  • UPRE gives us the flexibility to pick and choose specific profile fields to synchronize, whereas Log Shipping is all or nothing. If there is a business or security need to reduce the properties to a smaller subset, then UPRE would be the better decision.
  • If we are space constrained on the secondary farm, UPRE would be the better decision; log shipping requires more space on the secondary farm to hold the transaction log backups.
  • If the 3 SQL Agent Jobs associated with Log Shipping exceed the RTO/RPO SLAs, UPRE would be the better decision.
    • This can happen if you are heavily using SharePoint's social features.
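Pulling the two scenarios and the exceptions above together, the decision can be roughed out as a small function. This is a sketch of the reasoning in this post, not official guidance: the `FarmAnswers` fields and the return strings are hypothetical names, and it only encodes the answers that the post ties to a clear outcome.

```python
from dataclasses import dataclass


@dataclass
class FarmAnswers:
    hot_standby: bool                   # False = another active farm
    mms_read_only: bool                 # state of the secondary farm's MMS database
    need_property_subset: bool = False  # only some profile fields may be synced
    space_constrained: bool = False     # no room for transaction log backups
    log_shipping_fits_sla: bool = True  # the 3 Agent Jobs finish within RPO/RTO


def choose_profile_sync(a):
    """Suggest how to move Profile and Social data to the secondary farm."""
    if a.hot_standby:
        # Exceptions where UPRE is still chosen for a Hot Standby.
        # The read-only MMS gap described earlier still applies to UPRE here.
        if a.need_property_subset or a.space_constrained or not a.log_shipping_fits_sla:
            return "UPRE"
        return "Log Shipping"
    if not a.mms_read_only:
        return "UPRE"                   # Scenario A: active farm, read-write MMS
    return "Unclear (not covered here)"


scenario_a = FarmAnswers(hot_standby=False, mms_read_only=False)
scenario_b = FarmAnswers(hot_standby=True, mms_read_only=True)
print(choose_profile_sync(scenario_a))
print(choose_profile_sync(scenario_b))
```

The exceptions are checked before the default Hot Standby answer so that a hard constraint (field subset, disk space, job duration) wins over the general preference for Log Shipping.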

 

Summary:

With the option to Log Ship the Profile and Social databases, we have greater flexibility to meet the business's need for a reliable and consistent user experience. Knowing when to Log Ship and when to UPRE, along with the pros and cons of each, is critical to ensuring your architecture works as expected.