SharePoint Server 2016/2019 Zero-Downtime Patching demystified

Disclaimer: this article covers only selected aspects of the upgrade process. For a complete list of upgrade approaches, including all details, please check the following articles:

Patching SharePoint servers has always been a challenge. A major pain point for administrators is that installing SharePoint fixes causes downtime for the SharePoint farm. For SharePoint Server 2016 it was announced that we will have zero-downtime patching. This article outlines the reasons for downtime with previous versions of SharePoint and the changes implemented in SharePoint Server 2016 that make zero-downtime patching possible, including its constraints.

Historically this downtime can occur in two different steps:

  1. Installing the binaries
  2. Running the SharePoint configuration wizard.

Installing the binaries

While the actual binaries are installed, SharePoint-specific Windows services and IIS websites have to be stopped and restarted to ensure that the updated files (e.g. DLLs) are loaded.

The server-specific downtime usually lasts from the start of the hotfix executable until the hotfix installation is done.
With the huge hotfix files we are used to from SP2013 this can take several hours. Most of this time is spent extracting the exe and cab files and then applying the various installer MSP files included in the hotfix package.

Russ Maxwell has created a script which helps to significantly reduce the amount of time required to install the actual fixes, but you will still experience downtime on the specific server during the installation of the fix.

To overcome this issue we recommend having high availability per role in your farm (at least two servers per role) to ensure that you can patch one server while the other one is online and continues to serve requests. That means during patching you can remove one server from load balancing, patch it and add it back to load balancing. Then repeat the same steps for the other server(s).

While the patches are applied using these steps, your farm will have servers with two different patch levels. That is fine, as SharePoint runs in backward-compatible mode if the database schema is older than the patch level of the servers in the farm.

So using redundant servers, you can achieve zero-downtime during the hotfix installation even with SharePoint 2013.

After the patch has been installed on all servers, you need to run the SharePoint configuration wizard.
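
To see which servers and databases are still waiting for this step, the farm can be queried from the SharePoint Management Shell. The following is a minimal sketch, not an official procedure: Get-PendingUpgrade is a hypothetical helper name, and the sketch assumes that the NeedsUpgrade property on the objects returned by Get-SPServer and Get-SPDatabase reflects the pending upgrade state in your farm.

```powershell
# Assumes a SharePoint Management Shell. In a plain PowerShell window, load the snap-in first:
# Add-PSSnapin Microsoft.SharePoint.PowerShell

# Hypothetical helper: lists farm servers and databases that still report a pending upgrade.
function Get-PendingUpgrade {
    # Servers whose local upgrade actions have not been completed yet.
    $servers = @(Get-SPServer | Where-Object { $_.NeedsUpgrade } |
        ForEach-Object { [pscustomobject]@{ Type = 'Server'; Name = $_.Address } })

    # Databases whose schema is older than the installed binaries.
    $databases = @(Get-SPDatabase | Where-Object { $_.NeedsUpgrade } |
        ForEach-Object { [pscustomobject]@{ Type = 'Database'; Name = $_.Name } })

    $servers + $databases
}
```

Calling Get-PendingUpgrade | Format-Table after the binaries are installed should list the items that the configuration wizard still has to process. An empty result would suggest that nothing is pending.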

Running the SharePoint configuration wizard

The SharePoint configuration wizard needs to be run on all machines in the farm to complete the hotfix installation.

You can use either the command line version (psconfig.exe) or the UI version (psconfigui.exe). I have discussed benefits and caveats of using the different methods in a separate article.

It is recommended to first run the configuration wizard on one of the app servers in the farm to perform the database upgrade. After this step is completed the configuration wizard can be started on all the other servers in the farm to perform the server specific upgrade operations (copy dlls, install features, apply security settings, …) on each of these servers (again you should remove the servers where psconfig is running from load balancing as services are restarted).
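
For the command line version, a build-to-build upgrade call commonly looks like the sketch below. Treat it as an illustration under assumptions rather than a definitive procedure: the path assumes a default installation of SharePoint Server 2016/2019 (hive folder 16, SharePoint 2013 uses 15), and the variable names are only illustrative.

```powershell
# Arguments for a build-to-build (b2b) upgrade, one token per array element.
$psconfigArgs = @(
    '-cmd', 'upgrade', '-inplace', 'b2b', '-wait',
    '-cmd', 'applicationcontent', '-install',
    '-cmd', 'installfeatures',
    '-cmd', 'secureresources',
    '-cmd', 'services', '-install'
)

# Assumption: default install path; "16" is the hive folder of SharePoint Server 2016/2019.
$psconfig = "$env:CommonProgramFiles\Microsoft Shared\Web Server Extensions\16\BIN\psconfig.exe"

if (Test-Path -LiteralPath $psconfig) {
    # Run from an elevated prompt on a server that has been removed from load balancing.
    & $psconfig $psconfigArgs
    Write-Host "psconfig finished with exit code $LASTEXITCODE"
} else {
    Write-Warning "psconfig.exe not found at '$psconfig'"
}
```

Keeping the arguments in an array makes PowerShell pass each token to psconfig.exe as a separate argument, which avoids quoting surprises.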

With SharePoint 2013 and earlier, the database upgrade step causes downtime on all servers in the farm, as psconfig updates databases which are used by all servers in the SharePoint farm. Depending on the specific updates applied to the database, stored procedures, views, triggers and constraints could be dropped and recreated, and other database content could be updated. SQL queries issued by the SharePoint servers during this upgrade can fail, for example if a stored procedure is called while the upgrade job is dropping it to replace it with an updated version. They can also fail if a fix contains two changes to two different stored procedures that depend on each other, but only one stored procedure has been updated when the request comes in.

These limitations could also cause failed upgrades or excessive slowdowns in the upgrade process because of resource contention and locking. For these reasons accessing SharePoint content while the database is upgraded is unsupported and untested.

A partial workaround for this exists in SharePoint 2013 for updating the content database: After installing the patch and before running the SharePoint configuration wizard customers can run

Upgrade-SPContentDatabase -UseSnapshot …

Upgrade-SPContentDatabase can perform the same upgrade for a content database as the SharePoint configuration wizard. The benefit of using this cmdlet is that you can run it in parallel in several PowerShell windows against different content databases. That can significantly reduce the database upgrade time after installing a hotfix, whereas the SharePoint configuration wizard upgrades the content databases sequentially, one after the other.
Another benefit is the -UseSnapshot parameter listed above.

This parameter will create a snapshot of the specified database and then perform all upgrade operations that apply to the database. Existing connections to the content database from the different SharePoint servers in the farm will be set to use the snapshot for the duration of the upgrade, and then switched back after successful completion of upgrade. Be aware that this parameter can only be used with versions of SQL Server that support creation and use of snapshots, for example, SQL Server Enterprise edition.

As all regular SharePoint operations are executed against the snapshot while the content database is upgraded, the problems listed above will not occur. The caveat here is that the snapshot is read-only. That means that although it prevents the problems related to dropping stored procedures while they are in use, it also prevents all write operations to the content database for the duration of the upgrade.
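
The parallel upgrade described above can also be scripted instead of opening several PowerShell windows by hand. The sketch below swaps the windows for background jobs (Start-Job), which is my substitution and not part of the original guidance. Start-ParallelContentDatabaseUpgrade is a hypothetical helper name, and the sketch assumes that every content database reporting NeedsUpgrade should be upgraded and that the SQL Server can handle all upgrades at once.

```powershell
# Hypothetical helper: starts one background job per content database that needs an upgrade.
# Assumes it is called from a SharePoint Management Shell (snap-in already loaded).
function Start-ParallelContentDatabaseUpgrade {
    param(
        # Pass -UseSnapshot only if your SQL Server edition supports database snapshots.
        [switch]$UseSnapshot
    )

    $pending = Get-SPContentDatabase | Where-Object { $_.NeedsUpgrade }

    foreach ($db in $pending) {
        $jobArgs = @($db.Name, $UseSnapshot.IsPresent)

        Start-Job -Name "Upgrade-$($db.Name)" -ArgumentList $jobArgs -ScriptBlock {
            param($dbName, $useSnapshot)

            # Each job runs in its own process, so the snap-in has to be loaded again.
            Add-PSSnapin Microsoft.SharePoint.PowerShell
            Upgrade-SPContentDatabase -Identity $dbName -UseSnapshot:$useSnapshot -Confirm:$false
        }
    }
}

# Usage: start all jobs, wait for them to finish, then collect their output.
# Start-ParallelContentDatabaseUpgrade -UseSnapshot | Wait-Job | Receive-Job
```

The sketch does not throttle the number of parallel jobs. On farms with many content databases it is probably safer to start them in smaller batches.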

After upgrading the SharePoint content databases you still have to run the SharePoint configuration wizard to upgrade all the other SharePoint databases – and this can also cause downtime to services accessing these databases during the upgrade.

So it is not possible to achieve zero-downtime for read-write operations during the hotfix installation with SharePoint 2013.

Changes in SharePoint Server 2016

With SharePoint Server 2016 the number of MSP files has been significantly reduced, so the time to patch the servers is much shorter. Apart from that, you still have to ensure that you have more than one server per role (high availability) to guarantee zero-downtime patching, as the Windows services have to be restarted during patching. MinRole is not required for this! All the details on how to technically apply the patches can be found in the following TechNet article:

The biggest improvements over SP2013 are on the upgrade side. Several improvements have been made to ensure that upgrading the SharePoint databases does not lead to downtime for the end user. These include changes to how the updates are applied, but also restrictions on what changes are allowed in a hotfix. For example, all stored procedures have to be backward compatible to ensure that if one stored procedure is updated with a hotfix it can still be called from older stored procedures which are updated in a later step of the update cycle. We also update the stored procedures without dropping them to prevent outages if a stored procedure is called while it is in the limbo state between drop and recreation.

These changes have long been tested in SharePoint Online, where upgrades are performed for all SharePoint Online servers every couple of weeks while the service is live, without any read-only windows for customers.

These improvements apply to content databases as well as to other SharePoint databases that need to be upgraded. You can still use Upgrade-SPContentDatabase to speed up the database upgrade step, but using the -UseSnapshot parameter will not bring a benefit when trying to minimize the downtime. It can actually be counter-productive as it leads to a read-only content database.

With the improvements implemented and using redundant servers for each role it is possible to achieve zero-downtime during the hotfix installation with SharePoint Server 2016.

35 Comments


  1. Hi Stefan,
    Super! Finally this feature has been clearly explained. Many thanks for that.
    Cheers,
    Marco

  2. Thank you for the great insights!
    But do you still have to do all steps listed in the TechNet article “Install a software update for SharePoint Server 2016” for servers with search components?
    It was updated end of April but contains “Osearch15” instead of “Osearch16″…
    I would have guessed with all the improvements in SharePoint 2016 there would also be some improvements regarding search?
    Thank you
    Andy

    1. That is still accurate. The version number is most likely a copy&paste error.

  3. Oh and a second question:
    The TechNet article “Software updates overview for SharePoint Server 2016” is from February and list 2 types of “Software update strategies”:
    1. Install the update and do not postpone the upgrade phase.
    2. Install the update and postpone the upgrade phase.
    (see: https://technet.microsoft.com/en-us/library/ff806329%28v=office.16%29.aspx#updatestrategy)
    I mean if you have redundant servers why don’t you just upgrade the first app server and then each front-end (while taking out of load balancing)?
    If you can access content from databases as an end user while upgrading databases why should i defer the upgrade phase in a redundant environment?
    Thank you again,
    Andy

    1. Hi Andy,
      there is no technical requirement to postpone but some customers prefer to do it that way as they are used to do it that way with 2013.
      It is still possible to postpone it. It is the decision of the customer which method he prefers.
      Cheers,
      Stefan

  4. Hi Stefan,
    in regards to the Zero-Downtime Patching, you know there are some limitations for the Distributed Cache Service. The corresponding English TechNet site states it correctly in the “MinRole Server Deployments” section (https://technet.microsoft.com/en-us/library/mt346114(v=office.16).aspx). However, the German, Russian and Spanish versions (from my tests) still reference a quorum role for the Distributed Cache Service.
    From my experience and different TechNet articles (e.g. https://technet.microsoft.com/en-us/library/jj219572.aspx and https://technet.microsoft.com/en-us/library/jj219613.aspx?f=255&MSPPError=-2147217396) the statement about a nine-server MinRole deployment cannot be right. Eight should be right with the mentioned Distributed Cache limitations in the English article. Could you please shed some light on this (or report it internally to get it fixed)?
    Thank you

  5. Hi Stefan,
    Is zero downtime patching supported in ‘Single server farm’ mode? If the DB upgrade does not cause any outage, is zero downtime patching recommended for single server farm scenarios?
    Thanks,
    Karthik

    1. Hi Karthik,
      zero downtime patching requires high-availability of each server role.
      A single server farm cannot have high availability. Ergo no zero downtime patching on single server farm.
      Cheers,
      Stefan

  6. Hi Stefan, great article, thanks for that… I have one question though: what does Microsoft mean by highly available? Should every role have a minimum of two servers, or three? High availability in a Skype for Business environment means at least three servers for the app fabric, so I am a bit confused about the SharePoint part. Do we need two servers for each role or three? I hope you can shed some light on this matter.
    Thanks in advance,
    Rgds,
    Red

    1. Hi Redjesh,
      yes – at least two servers per role is required.
      Cheers,
      Stefan

  7. Hi Stefan,
    I am still confused… Assume that I have TWO WFEs/APPs (load balanced) in SharePoint 2013. Does this structure support zero downtime patching?
    Many thanks,
    David

    1. Hi David,
      SharePoint 2013 does NOT support zero downtime patching.
      Zero downtime patching can only be achieved with SharePoint Server 2016.
      Cheers,
      Stefan

  8. Hi Stefan, your posts are much appreciated. I have a question regarding the “Language Dependent” second update file, e.g. from the TechNet page: “2.Run the wssloc2016-kb2920690-fullfile-x64-glb.exe file (that is, wssmui.msp).” There is a “Note” saying: “You may need to extract the wssmui.msp file for each language installed on the farm”.
    My question is: do I need to extract and run only the languages relevant to my farm at this point?

    1. Hi Dan,
      you should not have to extract anything.
      Can you send me the link to the technet article where you read this?
      Thanks,
      Stefan

      1. Hi Stefan, I was also a little confused about the language statement in this article:
        https://technet.microsoft.com/en-us/library/ff806338(v=office.16).aspx
        The article states:
        To install the update
        Run the sts2016-kb3115088-fullfile-x64-glb.exe file (that is, sts.msp).
        Run the wssloc2016-kb2920690-fullfile-x64-glb.exe file (that is, wssmui.msp).
        Note:
        You may need to extract the wssmui.msp file for each language installed on the farm.
        The word language is never mentioned again. Is the article trying to state that we need wssmui.msp for each additional language pack we have installed?
        Thanks!

        1. Hi Bob,
          this paragraph confuses me as well 😉
          Not sure what the writer tried to say here.
          You only have to install this exe once – thats all. It will automatically apply the fix to all languages installed on your system.
          Cheers,
          Stefan


  9. Hi Stefan,

    Wanted some clarification on ZDP. I have SP 2019 with 2 WFE’s and SP sites that have VIP (2 WFE’s behind it) so during patching how does ZDP work in this case? So i need to remove one of the WFE server from VIP at LB end during ZDP while patching WFE server?

    Thanks,

    Florencio

    1. Hi Florencio,
      yes, you need to ensure that no requests hit the server that is being patched.
      Cheers,
      Stefan

  10. Hi Stefan,

    Thank you for your reply. Apologies for replying so late. But one more question: in the same scenario, how is it different and why wouldn’t it work in SharePoint 2013? I know ZDP was introduced from SP 2016 onwards, but if we remove members from the LB pool, why would it not work in SP 2013? That’s a question that my LB team asked, to which I didn’t have an answer. We have now migrated from SharePoint 2013 to SharePoint 2019 and this month we are testing ZDP for the first time. Appreciate your inputs.

    Thanks,

    Florencio.

    1. Hi Florencio,
      zero downtime patching consists of two steps: installing the patches and upgrading the databases. The first step can be done in SP2013 without downtime, but not the second one. The reason is that in SharePoint 2013 the database upgrade is a disruptive step. While the database is upgraded, stored procedures and table-valued functions are dropped and afterwards replaced with a new version. Any access to the databases during the upgrade could lead to all types of issues including exceptions, incomplete operations, database inconsistencies, …
      For that reason it is unsupported to access the databases while the database upgrade is running.
      In SharePoint 2016 and 2019 the database upgrade has been reworked so that this cannot happen and stored procedure calls are guaranteed to work even while a database upgrade is performed.
      Cheers,
      Stefan

  11. Hi Stefan,

    We have a SharePoint 2016 farm that has only one SharePoint server and one database server. When we install monthly updates, we also install the MUI/language patch if one is available.

    I can follow the steps shown here: https://docs.microsoft.com/en-us/sharepoint/upgrade-and-update/install-a-software-update#install-a-software-update-on-servers-that-host-search-components. But I’m not sure when would be a good spot to slot in the MUI/language patch. I believe the Config Wizard needs to be run for the MUI/language patch as well? So would the following steps be a good sequence to follow?

    Suspend SSA using PowerShell.
    Stop those three search related services.
    Install the patch (sts2016…exe) (This generally prompts for a reboot and the services stopped in step 2 will auto start after reboot).
    Verify search components become active. (Get-SPEnterpriseSearchStatus -SearchApplication $ssa | where {$_.State -ne "Active"} | fl)
    Resume SSA using PowerShell.
    Verify SSA can crawl new content.
    Install MUI/language patch.
    Upgrade content DBs (Get-SPContentDatabase | Upgrade-SPContentDatabase).
    Run Config Wizard UI.

    Do you know if there are more appropriate steps than the above?

    Thank you very much for your contributions to the community all these years!

    1. Hi Conax,
      you should install both patches together – do not do any other steps in between.
      If a reboot is required you can do this after both patches are installed.
      Running PSConfig at the very end is correct.
      Cheers,
      Stefan

  12. Hi Stefan,

    We have a SP 2019 high availability environment. We are trying to implement zero downtime patching on this environment. If users enter the data during zero downtime patching and if something goes wrong with the patching, we may have to restore backups and recover the farm. So in this scenario how can we make sure the data entered during patching is not lost. Is there any realistic way we can achieve this?

    Thank you,
    Varun

    1. Hi Varun,
      I don’t think that there is a simple way to achieve this.
      The most important part here is to evaluate each fix in a test environment against all business critical features, to avoid having to roll back the production environment if there is a problem in a fix.
      Cheers,
      Stefan

  13. Hi Stefan,

    Thanks for the great article!

    I have a question about upgrading from 2016 to 2019. You mention the improvement “upgrading the SharePoint databases does not lead to a downtime for the end user”.

    Does this mean that I can be running the upgrade-spcontentdatabase on a content DB while users are still able to access it?

    In other words, would it be possible to:
    1. mount-spcontentdatabase with the -skipsiteupgrade in order to mount the 2016 DB quickly in the new 2019 Farm and have it available to users
    2. Run upgrade-spcontentdatabase (which can take several days in our case) and still be available to the users in read/write?

    Thank you!
    Matthew

    1. Hi Matthew,
      unfortunately for version-to-version upgrade this will not work.
      Zero-downtime-patching is for build-to-build upgrade. That means it applies when installing patches for (e.g.) SharePoint Server 2016 or 2019.
      For Version-to-Version upgrade you need to perform the upgrade to be supported.
      Cheers,
      Stefan

      1. Thanks for your reply!

  14. We are using SP2016 with SQL Standard edition (which does not support snapshots). May I confirm that ZDP is not possible because the SharePoint service will be interrupted during Upgrade-SPContentDatabase?

    1. Zero downtime patching cannot be achieved using snapshots, because snapshots are read-only. The zero downtime strategy allows patching and running the config wizard (or Upgrade-SPContentDatabase) against the live database without snapshots.

      1. So while Upgrade-SPContentDatabase is running there is no interruption of service? (i.e. end users can still read and write)

        1. Yes. That’s where the name “zero downtime” comes from.

  15. Sir, I thought that the Zero downtime patching included first disabling User Profile Sync then re-enabling it after all patching and PSCONFIG is done. Did I dream that or has that guidance changed? Same for disabling Search crawls and Distributed Cache.

    Thanks

    1. The thing you talked about is for MINIMAL downtime (because you disabled the search application and DC). This article is about ZERO downtime (users can still search and My Sites keep working during the upgrade).
