Potential for database corruption as a result of installing Exchange 2007 SP3 RU3


Update 3/31/2011: The updated Exchange 2007 SP3 RU3 has been released. See Announcing the Re-release of Exchange 2007 Service Pack 3 Update Rollup 3 (V2).
3/30/2011: We posted a status update for this issue. See Exchange 2007/2010 Rollup 3 Status Update.

Over the weekend, the Exchange Product Group was made aware of an issue which may lead to database corruption if you are running Exchange 2007 Service Pack 3 with Update Rollup 3 (Exchange 2007 SP3 RU3). Specifically, the issue was introduced in Exchange 2007 SP3 RU3 by a change in how the database is grown during transaction log replay when new data is written to the database file and there are no available free pages to be consumed.

This issue is of specific concern in two scenarios: 1) when transaction log replay is performed by the Replication Service as part of ensuring the passive database copy is up-to-date and/or 2) when a database is not cleanly shut down and recovery occurs.

While only a small number of customers have been affected to date, we believe the risk is significant enough that we recommend all customers uninstall Exchange 2007 SP3 RU3 on all Mailbox servers and transport servers. Uninstalling the rollup will revert the system to the previously installed version. We have also removed the Exchange 2007 SP3 RU3 download from the Microsoft Download Center and from Microsoft Update until we are able to produce a new version of the rollup.

We are actively working on this issue and, based on test results, plan to release an updated version of Exchange 2007 SP3 RU3 to the Download Center later this week. In addition, we are conducting an internal review of our processes to determine how to prevent issues such as this in the future.

When this issue occurs, events similar to the following are logged in the Application event log of the Mailbox server. Regardless of whether you see these types of events, you should review the recovery instructions and begin that process. If you are uncomfortable performing any of these steps, please contact Microsoft Support for assistance.

  • Event ID: 454
    Event Type: Error
    Event Source: ESE
    Event Category: Logging/Recovery
    Description: Microsoft.Exchange.Cluster.ReplayService (12716) Recovery E20 SG1\DB1: Database recovery/restore failed with unexpected error -4001.
  • Event ID: 2095
    Event Type: Error
    Event Source: MSExchangeRepl
    Event Category: Service
    Description: Log file D:\logs\SG1\E200006AFAE.log in SG1\DB1 could not be replayed. Re-seeding the passive node is now required. Use the Update-StorageGroupCopy cmdlet in the Exchange Management Shell to perform a re-seed operation
  • Event ID: 2097
    Event Type: Error
    Event Source: MSExchangeRepl
    Event Category: Service
    Description: The Microsoft Exchange Replication Service encountered an unexpected Extensible Storage Engine (ESE) exception in storage group ‘SG1\DB1’. The ESE exception is a read was issued to a location beyond EOF (writes will expand the file) (-4001) ().
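
As a rough triage aid, the event signatures listed above can be checked against exported Application log records. The following Python sketch assumes the records have already been exported and parsed into dictionaries with the hypothetical keys `event_id` and `source`; it is not an Exchange or Windows API, only an illustration of matching on the Event ID and Event Source pairs shown above.

```python
# Event ID / Event Source pairs taken from the list above.
KNOWN_SIGNATURES = {
    (454, "ESE"),
    (2095, "MSExchangeRepl"),
    (2097, "MSExchangeRepl"),
}


def find_matching_events(records):
    """Return the records whose (event_id, source) pair matches a listed signature.

    `records` is an iterable of dicts with the hypothetical keys
    'event_id' (int or numeric string) and 'source' (str),
    for example parsed from an exported event log.
    """
    matches = []
    for record in records:
        signature = (int(record["event_id"]), record["source"])
        if signature in KNOWN_SIGNATURES:
            matches.append(record)
    return matches


if __name__ == "__main__":
    sample = [
        {"event_id": 454, "source": "ESE"},
        {"event_id": 9999, "source": "ExampleSource"},
    ]
    for event in find_matching_events(sample):
        print(event["event_id"], event["source"])
```

A match only suggests that a server may be affected, since these Event IDs can be logged with other error codes. An empty result does not mean the rollback can be skipped.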

In addition, in environments utilizing Continuous Replication, comparison of the database file between the active and passive nodes will indicate that the database file has decreased in size.

Regardless of whether you are experiencing this issue, we strongly recommend taking the actions below to ensure that you do not experience any data loss or outage associated with this issue.

For example:

  • If you have deployed your Mailbox servers utilizing Cluster Continuous Replication (CCR), failure of the active copies may affect your service SLA, as you may have no viable passive copies to activate. Hardware failures may leave you without a means to recover up to the point of failure, and thus you may experience data loss.
  • If you have deployed your Mailbox servers utilizing Single Copy Clusters (SCC), switchovers or failovers may result in this issue as there is only one copy of the database and recovery is performed during switchovers and failovers.

For environments leveraging CCR and/or Standby Continuous Replication (SCR)

If you see the listed events in your environment, you must take the following steps to restore your high-availability configuration:

  1. Roll back the CCR Mailbox server hosting the passive database copies and any SCR target Mailbox servers to the previously installed version (e.g., Exchange 2007 SP3 RU2) by uninstalling RU3.
  2. Re-seed all affected database copies on the CCR Mailbox server and any SCR target Mailbox servers hosting the passive database copies.
  3. Verify that the database copy status is healthy for all passive copies.
  4. Perform a switchover and roll back the remaining CCR Mailbox server to the previously installed version (e.g., Exchange 2007 SP3 RU2).

If you are not seeing these events in your continuous replication enabled environment, we recommend the following steps:

  1. Roll back the CCR Mailbox server hosting the passive database copies and any SCR target Mailbox servers to the previously installed version (e.g., Exchange 2007 SP3 RU2) by uninstalling RU3.
  2. Perform a switchover and roll back the remaining CCR Mailbox server to the previously installed version (e.g., Exchange 2007 SP3 RU2).

For environments leveraging Single Copy Clusters (SCC)

  1. Roll back the passive nodes within the SCC environment to the previously installed version (e.g., Exchange 2007 SP3 RU2) by uninstalling RU3.
  2. Perform a switchover and roll back the remaining SCC Mailbox server nodes to the previously installed version (e.g., Exchange 2007 SP3 RU2).
  3. If you have any databases that will not mount as a result of the above issue, you can restore the damaged databases from a last known good backup.

For environments leveraging standalone Mailbox (or Public Folder) servers

  1. Roll back the standalone Mailbox servers to the previously installed version (e.g., Exchange 2007 SP3 RU2) by uninstalling RU3.
  2. If you have any databases that will not mount as a result of the above issue, you can restore the damaged databases from a last known good backup.

For Hub Transport and Edge Transport servers

  1. Roll back the standalone transport servers to the previously installed version (e.g., Exchange 2007 SP3 RU2) by uninstalling RU3.
  2. If any transport servers have mail.que databases that currently do not mount as a result of the above issue, you can recover them by following the steps in Working with the Queue Database on Transport Servers.
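
The per-environment procedures above can be summarized as a small lookup. This is only a sketch for planning or change-control notes: the topology labels (`ccr_scr`, `scc`, `standalone`, `transport`) and the function name are hypothetical, and the step text is a condensed paraphrase of the instructions in this post, not a replacement for them.

```python
def rollback_steps(topology, events_seen=False):
    """Return the ordered steps from this post for one deployment type.

    `topology` is one of the hypothetical labels 'ccr_scr', 'scc',
    'standalone', or 'transport'. `events_seen` only matters for 'ccr_scr'.
    """
    uninstall_passive = "Uninstall RU3 on the CCR passive node and any SCR targets"
    switch_and_finish = "Perform a switchover and uninstall RU3 on the remaining CCR node"
    restore = "If a database will not mount, restore it from a last known good backup"

    if topology == "ccr_scr":
        if events_seen:
            return [
                uninstall_passive,
                "Re-seed all affected database copies",
                "Verify the copy status is healthy for all passive copies",
                switch_and_finish,
            ]
        return [uninstall_passive, switch_and_finish]
    if topology == "scc":
        return [
            "Uninstall RU3 on the passive SCC nodes",
            "Perform a switchover and uninstall RU3 on the remaining SCC nodes",
            restore,
        ]
    if topology == "standalone":
        return ["Uninstall RU3 on the standalone Mailbox server", restore]
    if topology == "transport":
        return [
            "Uninstall RU3 on the transport server",
            "If mail.que will not mount, follow the queue database recovery steps",
        ]
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    steps = rollback_steps("ccr_scr", events_seen=True)
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {step}")
```

The ordering within each list matters: in the clustered cases the passive side is rolled back first, so that the switchover lands on a node that no longer has RU3 installed.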

Kevin Allison
GM Exchange Customer Experience

Comments (28)
  1. pete says:

    You have to be kidding me; i just updated to SP3 RU3 this morning, and am now experiencing this very issue. I only have one DB that failed and will not reseed out of 40 DB’s. After I uninstall RU3 should I reseed all DB’s or just the one with the issue? Is there an easy way to reseed all DB’s?

  2. Pete – If only a single database has been affected, then you should only need to reseed that one database after uninstalling SP3 RU3.

    Ross

  3. susan says:

    Did this get offered up on MU/WSUS on the 22nd?  Just seeing how broadly this may have gone out.

  4. @Susan: E2007 SP3 RU3 was originally scheduled to go on MU today, but we pulled it before its release.

  5. Richard Sobey says:

    Does this issue at all affect Exchange 2010 SP1 RU3?

  6. Sven J says:

    Hi,

    If I'm running a single mailbox server with SP3 RU3 and everything is fine except some other RU3 problems, what should I do? Roll back, or leave it and wait for RU4, or what?

    rgds

    Sven

  7. Bharat Suneja [MSFT] says:

    @Richard Sobey: This issue does not affect Exchange 2010. As mentioned in the post, it's specifically something in Exchange 2007 SP3 RU3.

    @Janx444: If you're running Exchange 2007 SP3 RU3, even on a single/standalone Mailbox server, we recommend that you roll back to the previously installed version by uninstalling RU3.

  8. Sven J says:

    Hi  Bharat Suneja,

    OK. Rollback, but should I do some additional steps with DB's?

    As I mentioned, I don't see any errors in my event log.

    Just a simple RU3 uninstallation and that's all?

    rgds

    Sven

  9. I repeat myself: Quality Control

    This isn't doing the product image any good guys.

  10. Tuur says:

    What about UM? Do I need to do a rollback on UM or can I leave that one solely on RU3 with the rest on RU2?

  11. Dave says:

    I am so upset about this.  I JUST went to RU3 this past weekend on my entire Exchange 2007 environment.  8 hours of upgrading!!!!!!!!!!

    This has got to be the most FAIL programming I have seen from Microsoft in quite some time.  First 2010 RU3 backout and now 2007.  Who is "not" TESTING this code?!?!?

  12. Janx444 – Dismount the databases and uninstall the rollup.  

  13. TJM says:

    @ Ross – Just need to confirm that within a CCR environment dismounting the databases is NOT a necessary step to uninstalling RU3 – I installed the Update two weeks ago – not seeing the problem but need to get change control in to get the work done right away.  I've read the instructions just want to be clear that there is nothing missing.

  14. Arjen says:

    Hi Ross,

    We use SCR to replicate logs from 16 databases (each around 50GB), from MAILBOX1 to MAILBOX2.

    MAILBOX2 doesn't have active databases on it. Only logs / databases from SCR replication (SCR target).

    Recently I only updated MAILBOX2 to SP3 RU3. I don't have the specified events in the log.

    Test-ReplicationHealth and Get-StorageGroupCopyStatus don't show any errors.

    I have removed RU3. Do I have to do anything else?

    Regards,

  15. pete says:

    Richard, Exchange 2010 SP1 RU3 was pulled as well due to an issue with Blackberry's sending duplicate messages:

    blogs.technet.com/…/exchange-2010-sp1-rollup-3-and-blackberrys-sending-duplicate-messages.aspx

  16. John McNamee says:

    I'm surprised it took this long for this recommendation to come out. I worked with MS for two days, on March 23 and 24, regarding this issue. I had to uninstall the patch and then reseed 26 databases. Aren't patches tested anymore?

  17. PAMM says:

    We get an error like this in OWA Premium and no error in OWA Light after applying rollup 3: Error: Outlook Web Access was not able to process this request. If I remove rollup 3, OWA breaks down, so I have to reapply it. Can rollup 3 fix this? Damn, didn't Microsoft test this?

  18. TJM – You can follow the procedure identified above, which is to uninstall the RU on the passive node, perform a switchover, and then uninstall the RU on the other node.

    Arjen – Nope, you are all set.

  19. Timothy says:

    One of the lucky "affected customers" – SCC – had to restore 8 DB's from backup.

  20. Vince says:

    What about the out of control database growth with Exchange 2010 SP1?  How is Microsoft addressing that issue?  It is a shame to see such a great product like Exchange take an image beating like this.  Makes me want to enroll in some Lotus Notes classes!!!  

  21. Paul T says:

    I installed 2007 Sp3 and followed it with RU3 on 3/20/11.  I had LCR enabled for 3 storage groups- I've disabled LCR for now.  I'm not seeing any of the 3 event errors above in the application event viewer.  My info stores total about 70gb and I only have 1 exchange server with hub transport/cas and mailbox roles.  

    If I uninstall RU3, should I then install RU2 since previously only SP3 was installed?  Do I have a higher risk of db corruption from RU3 uninstall than if I were to just leave it installed and wait for the re-release of RU3?  Finally, when RU3 is released again would installation of it require uninstall of the problem RU3 first or will it just over install?

  22. Chris says:

    Ya'll act like ya'll are perfect and can't make mistakes, damn at least they come up with solutions and offer help to fix issues. You can't duplicate everything.

  23. Paul – you can safely uninstall SP3 RU3 and go back to SP3; uninstall is simply a binary replacement operation and does not touch the database files.  You do not have to install SP3 RU2.

    The re-released version will have its version number incremented, which will enable it to be installed over existing SP3 RU3 installations.

  24. Cameronk says:

    I only installed RU3 on my 2 CAS/Hub Transport servers.  Is it really necessary to go through the recovery of the mail.que databases if I'm not experiencing any problems?  I'm hoping that simply uninstalling RU3 would suffice.

  25. Cameron, you only need to do recovery operations on the mail.que database if the database will not mount due to the above issue.  If you are experiencing no issue, then simply uninstall the rollup and you will be fine.  I'll update the steps to make that clearer.  Thanks for the feedback.

  26. GeneralLee says:

    I have installed this update on 2 CAS servers, which are not mailbox or transport servers. Should I still uninstall RU3, or is there no harm in leaving it installed until the next RU3 is released?

  27. General Lee – there is no harm in leaving it on CAS-only machines, as CAS does not utilize the Extensible Storage Engine (ESE). However, we have now released the updated version of RU3, so you should install that version and delete the older MSP file.

    Ross

  28. - Anoyed says:

    I just updated to SP3 and it crashed my computer. Do you guys have an explanation for this?
