Offline Defrag And DAG Databases, Oh My!


Even though some very old KB articles, which now refer to unsupported products, state that taking databases offline to run periodic offline defragmentation with ESEUTIL is not recommended, some folks in the field still want to do this.

Previously, when there was only a single copy of a database, running offline defragmentation would cause minimal impact apart from the time required for the defragmentation itself, which could be several hours or longer depending on database size and disk throughput. This changes when we consider having multiple copies of a database in a Database Availability Group (DAG).

So you may be wondering how best to defragment Exchange 2010 databases that are in a DAG, as people often look at the white space in a database and seek to reclaim it immediately.

In short, this is not a good idea for several reasons:

  • Defragmenting DAG databases leads to more work
  • Mailboxes are offline while the defragmentation completes
  • This is generally a short-sighted view, as white space will be re-used

Please note that we are discussing offline defragmentation via ESEUTIL /D, not the online maintenance routines that run 24x7 in newer versions of Exchange and within online maintenance windows in previous versions.
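
As a rough sketch of where that white space figure comes from, the Exchange Management Shell can report it per database. This assumes Exchange 2010 or later, and DB01 is simply the database name used in the example later in this post.

```powershell
# Report the file size and the white space available for new mailbox data in DB01.
# -Status is needed, otherwise the size properties come back empty.
Get-MailboxDatabase -Identity DB01 -Status |
    Format-List Name, DatabaseSize, AvailableNewMailboxSpace
```

The space reported here is what Exchange will re-use as mailboxes grow, which is why reclaiming it at the file level rarely pays off.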

Background

What happens when an Exchange database is defragmented using ESEUTIL /D? The defragmentation process copies the valid pages of ESE data out of the old database file into a new database. The white space is left behind in the old file, as it contains no data. You will note that I specifically said new database: it has a different GUID from the original database. Creating a database with the same name but a different GUID means that Exchange sees them as different databases, not as multiple copies of the same database.

 

Since the databases are no longer copies of one another, this will result in errors. Errors that may be seen include, but are not limited to:

  • An Active Manager operation failed. Error Operation failed with message: MapiExceptionJetErrorAttachedDatabaseMismatch: Unable to mount database. (hr=0x80004005, ec=-1216)
  • The Exchange store database <databasename> copy on this server appears to be inconsistent with the active database copy or is corrupted. For more details about the failure, consult the Event log on the server
  • Event ID 494:  Database recovery failed with error -1216 because it encountered references to a database, 'database path', which is no longer present
  • Event ID 454: Information Store (PID) <databasename>: Database recovery/restore failed with unexpected error –1216
  • Event ID 9519: The following error occurred while starting database <databasename>: 0xfffffb40. Failed to configure MDB.

 

Let’s look at an example of the impact caused by running offline defrag against a database that is replicated in a DAG.

Defragmenting Exchange 2010 DAG Database

We shall defragment the database DB01. Our starting configuration has two copies of this database and all is currently running well.

Exchange 2010 DAG Database Starting Point

 

So let’s dismount DB01, and then validate that the two mailbox servers have the same GUID for DB01.  We are using ESEUTIL /MH to dump out the header from the database.

On the first mailbox server we see a Rand of 2733649. The GUID discussed here is the ‘Rand’ value on the ‘DB Signature’ line. Be sure to look at the correct signature, as there is one signature for the logs and another for the database. It is expected that the Rand in these two lines will be different.

Exchange 2010 Database GUID = 2733649

 

On the second mailbox server we see the same Rand of 2733649. You can see the server name in the title bar of the PowerShell window.

Exchange 2010 Database GUID Same On Second Database Copy = 2733649

We have shown that the same database is present on both servers, i.e. both copies have the same Rand of 2733649.
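
A minimal sketch of the header check run on each server is shown below. The file path is hypothetical, so substitute the real path to the .edb file, and the database must be dismounted first as described above.

```powershell
# Hypothetical path to the database file; adjust to match your environment.
$edb = 'D:\Databases\DB01\DB01.edb'

# Dump the database header and keep only the signature lines.
# Compare the Rand on the 'DB Signature' line between servers,
# not the Rand on the 'Log Signature' line.
eseutil /mh $edb | Select-String -Pattern 'Signature'
```

Running the same two lines on each server and comparing the DB Signature output is enough to tell whether the files are still copies of the same database.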

Let’s now defragment DB01 on the first server, then see what happens...

Exchange 2010 Offline Defragmenting DAG Database

Then let’s check the Rand to see if the old value of 2733649 is still present:

 

Exchange 2010 Database GUID = 143007541

Nope, it’s not. The Rand is now 143007541, which shows that this is a different database. Same name, but a different database.

Trying to activate the database copy on another server will create a sea of red in the application event log.  You will receive the errors listed above, and the most descriptive is Event ID 4807:

Active Manager Operation Failed Due To Offline Defrag

 

Recovering From Defragmenting DAG Database

At this point since the databases are no longer copies of one another we will have to re-seed the copy of the database.  Depending upon database size, disk throughput and network capacity this can take an extended period of time.  Let’s use PowerShell to re-seed the database copy:

Update-MailboxDatabaseCopy -DeleteExistingFiles -Identity DB01\Consea-MB2

 

Exchange 2010 Re-Seeding Database Copy Using PowerShell

This will have to be repeated for every copy of the database in question. If there are multiple copies across a WAN link, it is a good idea to manually specify the seeding source using the -SourceServer switch. That way one copy can be seeded over the WAN and the other copies can then use it as a local source, thereby minimising WAN traffic and reducing seeding time.

Note that there are multiple options worth checking out with Update-MailboxDatabaseCopy.  They include options to explicitly choose a network, encryption and compression.  Chances are if you used Exchange 2010 RTM then you are quite adroit at using the –CatalogOnly switch!
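
As a sketch of the WAN scenario described above, assume a remote site holding two passive copies on servers named DR-MB1 and DR-MB2. These names are hypothetical and are not part of the lab used in this post. The first copy is seeded across the WAN from the active copy, and the second is then seeded from its local neighbour.

```powershell
# DR-MB1 and DR-MB2 are hypothetical server names in the remote site.

# Copies should be suspended before they are reseeded.
Suspend-MailboxDatabaseCopy -Identity 'DB01\DR-MB1'
Suspend-MailboxDatabaseCopy -Identity 'DB01\DR-MB2'

# Seed the first remote copy over the WAN.
# With no -SourceServer specified, the active copy is used as the source.
Update-MailboxDatabaseCopy -Identity 'DB01\DR-MB1' -DeleteExistingFiles

# Once that copy is healthy, seed the second remote copy from it
# instead of crossing the WAN a second time.
Update-MailboxDatabaseCopy -Identity 'DB01\DR-MB2' -SourceServer 'DR-MB1' -DeleteExistingFiles
```

The second seed should only be started after the first has finished, since the local source needs to be a healthy copy.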

 

When the seeding task completes, we can check that the database copies are OK:

Checking Database Copy Status In Exchange 2010 PowerShell
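
For reference, a check along these lines can be sketched as follows. The property selection is just one reasonable choice rather than the exact command behind the screenshot.

```powershell
# List every copy of DB01 with its health and queue lengths.
Get-MailboxDatabaseCopyStatus -Identity DB01 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```

After a successful reseed the active copy should report Mounted and the passive copy Healthy, with both queue lengths at or near zero.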

Checking the Rand on the updated copy of the database, we can see that it has been updated and now has the same Rand which was generated by the defrag, 143007541. 

After Re-Seed Database Copy Has Updated Database GUID

 

Having to take a database offline for hours to defragment it, and then manually reseed all of its copies, is pretty painful. Is there a better way to do this?

There certainly is!

A New Hope

Since Exchange 2010 introduced the online mailbox move feature, it is now pretty seamless to move mailboxes to a new mailbox database and, when the old database is empty, simply delete it! This process can be made even better with the SuspendWhenReadyToComplete parameter. As an example:

New-MoveRequest -Identity 'User-21' -TargetDatabase DB01 -SuspendWhenReadyToComplete

This copies the vast majority of the mailbox content and then pauses. The administrator then manually resumes the move request using Resume-MoveRequest. This means we can copy mailbox content through the day with no user impact, and after hours the suspended move can be rapidly completed. This has to be one of my favourite Exchange 2010/2013 features!
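
To evacuate a whole database this way, the same idea can be applied in bulk. The sketch below assumes hypothetical database names DB-Old and DB-New; the cmdlets and parameters are the standard move request ones.

```powershell
# DB-Old and DB-New are hypothetical database names.

# During the day: queue a move for every mailbox in the old database.
# Each move copies the bulk of the content and then waits as AutoSuspended.
Get-Mailbox -Database 'DB-Old' -ResultSize Unlimited |
    New-MoveRequest -TargetDatabase 'DB-New' -SuspendWhenReadyToComplete

# Check progress of the queued moves.
Get-MoveRequest -TargetDatabase 'DB-New' | Get-MoveRequestStatistics |
    Format-Table DisplayName, Status, PercentComplete -AutoSize

# After hours: complete every move that is waiting to be finished.
Get-MoveRequest -MoveStatus AutoSuspended | Resume-MoveRequest
```

Note that a plain Get-Mailbox does not return arbitration mailboxes, so those would need to be handled separately (Get-Mailbox -Arbitration) before the old database can be removed.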

The same logic can also be applied to a mailbox database that must be evacuated for other reasons. This may be necessary if file system AV has scanned the database, as the database will then be in an unknown and thus unsupported state.

 

Note that the Mailbox Replication Service (MRS) is throttled, and if you wish to apply a little accelerando to the move process then you will need to take a look at the throttling configuration.
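
As a starting point, the MRS limits can be inspected read-only. This sketch assumes the settings live in MSExchangeMailboxReplication.exe.config under the Bin folder of the Exchange installation on the server hosting MRS, and that the relevant setting names begin with MaxActiveMoves or MaxTotalMoves. Verify both assumptions against the documentation for your Exchange version before changing anything.

```powershell
# Assumed location of the MRS configuration file on the server hosting MRS.
$config = Join-Path $env:ExchangeInstallPath 'Bin\MSExchangeMailboxReplication.exe.config'

# Show the concurrent-move limits without modifying the file.
Select-String -Path $config -Pattern 'MaxActiveMoves|MaxTotalMoves'
```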

 

Cheers,

Rhoderick


Comments (25)

  1. anonymouscommenter says:

    Rhod, great article as always. In a follow-up article I would love to see comments as to how long a db defrag can take and why trying to reclaim white space is often a fool's errand. Cheers and take care…sc

  2. Howdy Sean!

    Very true, I can't recall when I last did an offline defrag to reclaim space.  Last time would have been many, many moons ago!  

    Say hello to Mr  Thiessen for me please 🙂

    Cheers,

    Rhoderick

  3. Charles Derber says:

    Got a small query here regarding seeding…

    Consider 2(P) + 1(DR) copies of avg DB size 500GB.

    Copy/Replay status seems to be normal, i.e. 0. Suddenly something goes wrong, such as a log file going missing, ending up in a disk space issue (P).

    Action plan – you now can neither activate the copy nor truncate the logs.

    Dismounted the DB (P active), moved the logs because it was healthy, and mounted it.

    Q – with the 500GB *.edb file still on the passive copy node, can we use it for incremental updating of the DB instead of a complete DB seed between nodes in the P & DR sites?

    I tried the -SeedingPostponed / -Force switches but no luck. Does that really work, or is there no other option but a complete reseed?

  4. Hi Charles,

    Missed this over the long holiday weekend.    

    Did you do what I have above and dump out the databases headers with ESEUTIL /MH ?  

    What I expect you to find is that on the active instance the database has been updated to use the new log stream, but on the passive the copy of the database file is still looking for the original log stream GUID. Since it does not match, the passive DB will not attach to the log stream.

    Does that match your observed behaviour ?

    Cheers,

    Rhoderick

  5. Charles Derber says:

    So you are right about the new log stream, as it will not attach, but is there any way we could use the same old passive DB for incremental seeding instead of asking the system to delete the existing files and seed?

  6. Don't think so but, let me have a think about that please.  Things are a bit hectic at the mo, and I'll reply when I get a chance.

    Cheers,

    Rhoderick

  7. Charles Derber says:

    Hmm, I understand – I am not able to figure out any workaround but will hope something comes up in future, as seeding via WAN to DR may not be feasible for DB sizes from 500GB to 2TB.

  8. anonymouscommenter says:

    Hi, I had a database with several hundred users, moved them to another database but the original db with no users is still 195GB after 2 months, so how do I reduce the size of this if you do not recommend a defrag?

  9. Nick – just remove the database. Create another DB if needed.

    Why would you want to defrag it?

    Cheers,
    Rhoderick

  10. anonymouscommenter says:

    Landing on your blog searching for this answer: If you move a mailbox database, this is a pure file copy? No database defrag (maybe on NTFS level :)) is done with a move of the database?

  11. Correct – the only way to defrag the Exchange database to reclaim whitespace is to use eseutil when the DB is offline.

    Moving the file around does not do that.

    Cheers,
    Rhoderick

  12. anonymouscommenter says:

    Rhoderick,
    I would also like to second your statement that the –SuspendWhenReadyToComplete switch used on Local Moves could be used for the transitions between Exchange versions also.
    I had built an automated script that runs each night after hours that will ‘Complete Move Requests’ so I could start a group of moves during the day, and allow the system to complete them after nobody was using the system any longer. (This also allows for the hour+ that MS states AD needs to update before the users belonging to the mailboxes are able to utilize their moved mailboxes again.)

    This saved me countless hours of my own personal time during the migration, and with small maintenance tasks since then, as the Move Request resume does not require manual intervention (beyond setting up a scheduled task to run daily after all client use of the system is finished and before backup jobs begin).

    This is the command I used, just in case anyone else finds it helpful.
    Get-MoveRequest -MoveStatus autosuspended | Resume-MoveRequest

  13. That is certainly one of my favourite Exchange 2010 features Altren – saves so much time for us and minimises end user impact.

    Cheers,
    Rhoderick

  14. anonymouscommenter says:

    Rhoderick, Thanks for the beautiful article. Will get back to this discussion as i am practicing in lab with different options.

  15. anonymouscommenter says:

    Hi Rhoderick, thanks for sharing this. Just some clarification. Where do we do the offline defrag, is it in active (SERV001A) or passive copy (SERV002B)?

  16. Hi Aubrey, easiest way would be active then reseed all of the passive copies.

    Either way offline defrag is a pain, and move mailbox allows you to evacuate the DB while minimizing downtime.

    Cheers,
    Rhoderick

  17. anonymouscommenter says:

    Hi Rhoderick, thanks for the clarification. We have already done the mailbox move process but we wanted to use the available whitespace. Is there any way we can use the whitespace aside from the offline defrag? Thanks

  18. anonymouscommenter says:

    In addition, is there also a backup plan in case the offline defrag failed?

  19. anonymouscommenter says:

    Does this procedure for defragmenting a DAG database also apply to Exchange 2013?

    Aubrey – you can back up using standard backup procedures and restore using that, or make a copy of the database in a clean shutdown state.

    Use which whitespace? The whitespace on the source? If that database is now empty, just remove it and create a new one.

    Tonny – if you really want to do this, yes eseutil is still there but why?

    Cheers,
    Rhoderick

    1. Isaac says:

      Hello.. I have a question.
      Will the copy database remain active while I defragment the original database so the users are still online?
      In that case, could I lose data if I rewrite the copy database?
      Thanks for the answer

      1. Hi,

        Not something you want to try. Just move the mailboxes to a new DB, and delete the original. In your scenario your DB copies are now invalidated due to changing the DB GUID, and yes what about the email in that database that was mounted whilst you were doing the defrag?

        Move mailbox is the way to go here, especially since we have the online move process nowadays.

        Cheers,
        Rhoderick

  21. Nagendra says:

    Awesome write up

    1. maui says:

      Hi Rhoderick, we already moved the active mailboxes to new DBs. But we still want to be able to use the already disabled (disconnected) mailboxes when needed, so I don’t think we can just delete the old DBs. Or do you have an idea to work around this?

      1. Those will be removed from the store in 30 days by default – have you increased this to a higher value?

        Cheers,
        Rhoderick