Debugging: Incremental Mailbox Sync Move Requests Cause 'StopAll' to be called on the Assistants Infrastructure Which Causes CalendarNotificationAssistant to Stop Processing Text Message (SMS) Notifications

Not too long ago, I debugged an issue with the Calendar Notification Assistant (CNA), wherein the Assistant would stop processing text message (SMS) notifications for events. The affected end-users noted that if they enabled text message (SMS) notifications in Outlook Web Access (OWA), the notifications would work for around 24 hours and then they would fall back into reproduction of the issue, in which the notifications stop altogether.

When I went looking in PowerShell, I could verify (via the Admin Audit Log) that the commands to set the configuration for SMS were ran and that it was enabled:

 Search-AdminAuditLog -Cmdlets Set-TextMessagingAccount | Where{$_.ObjectModified -contains '<objectId>'} | FL


RunspaceId         : c7683058-b25c-4557-818d-546cf5f178c6
ObjectModified     : <objectId>
CmdletName         : Set-TextMessagingAccount
CmdletParameters   : {CountryRegionId, MobileOperatorId, NotificationPhoneNumber, Identity}
ModifiedProperties : {}
Caller             : <callerObjectId>
ExternalAccess     :
Succeeded          : True
Error              : None
RunDate            : 12/22/2015 1:15:17 PM
OriginatingServer  : <serverName>
Identity           : <identity>
IsValid            : True
ObjectState        : New

We could also see the messages being sent via the Message Tracking Log (MTL):

 $MsgTrk | Sort TimeStamp | FT EventId,Source,Sender,Recipients,MessageSubject,TimeStamp -AutoSize

EventId Source      Sender                       Recipients                                  MessageSubject   Timestamp
------- ------      ------                       ----------                                  --------------    ---------
RECEIVE STOREDRIVER <senderEmailAddress>     {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 13:18:34
SEND    AGENT       <senderEmailAddress>   {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 13:18:34
RECEIVE STOREDRIVER <senderEmailAddress>   {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 13:19:46
SEND    AGENT       <senderEmailAddress>   {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 13:19:46
RECEIVE STOREDRIVER <senderEmailAddress>   {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 16:15:02
SEND    AGENT       <senderEmailAddress>   {IMCEAMOBILE-+2B10008675309@domain.com}                     12/17/2015 16:15:02

Using this, I could tell the originator of the message: The Mailbox Assistants. This was done by the seeing that the client and the server were the same:

 $MsgTrk[0] | FL


RunspaceId              : 13ad4581-25d9-4136-8ed1-a284b7e979db
Timestamp               : 12/21/2015 15:40:31
ClientIp                : <ipAddress>
ClientHostname          : <serverName>
ServerIp                : <ipAddress>
ServerHostname          : <serverName>
SourceContext           : 08D30340C2F43943
ConnectorId             :
Source                  : STOREDRIVER
EventId                 : RECEIVE
InternalMessageId       : 2429739
MessageId               : <messageId>
Recipients              : {IMCEAMOBILE-+2B10008675309@domain.com}
RecipientStatus         : {To}
TotalBytes              : 6386
RecipientCount          : 1
RelatedRecipientAddress :
Reference               :
MessageSubject          :
Sender                  : <emailAddress>
ReturnPath              : <emailAddress>
MessageInfo             : 04I:
MessageLatency          :
MessageLatencyType      : None
EventData               : {[MailboxDatabaseGuid, 347a971c-e8f5-45de-8e01-870ed4dc6901], [ItemEntryId, 00-00-00-00-1E-95-32-44-0A-EA-D6-47-A5-98-9B-F0-5F-89-B9-09-07-00-BE-9B-24-68-00-23-66-49-95-41-BB-56-C6-02-E3-D0-00-00-00-03-71-5C-00-00-75-FB-A6-9E-77-CF-BE-43-A8-2A-B9-7F-C1-34-9A-01-00-00-1C-C-91-D1-00-00]}

Once I knew this, I checked the Database Event Watermarks for any backlog:

 <databaseName>\<serverName> CalendarNotificationAssistant             977234167 989106153    11871986

I now had enough data to assert that a dump of the process was necessary in order to see what's going on.

At the time of the dump, despite the fact that the user was no longer receiving text message (SMS) notifications, the process was processing (for lack of a better turn-of-phrase at the moment) a two week old message in thread 21 in the stack:

 0:021> !mdt 00000000036e1088
00000000036e1088 (Microsoft.Exchange.MailboxAssistants.Assistants.CalendarNotification.CalendarEvent)
    <CalendarItemIdentity>k__BackingField:00000000036e1208 (Microsoft.Exchange.Data.Storage.StoreObjectId)
    <CalendarItemOccurrenceIdentity>k__BackingField:00000000036e1048 (Microsoft.Exchange.Data.Storage.OccurrenceStoreObjectId)
    <CalendarItemType>k__BackingField:0x1 (Microsoft.Exchange.Data.Storage.CalendarItemType)
    <NormalizedSubject>k__BackingField:00000000036e0880 (System.String) Length=5, String="Lunch"
    <Location>k__BackingField:00000000036e0860 (System.String) 
    <ReminderIsSet>k__BackingField:true (System.Boolean)
    <ReminderTime>k__BackingField:(Microsoft.Exchange.ExchangeSystem.ExDateTime) VALTYPE (MT=000007ff002c5cb0, ADDR=00000000036e10d0)
    <ReminderMinutesBeforeStart>k__BackingField:0xf (System.Int32)
    <StartTime>k__BackingField:(Microsoft.Exchange.ExchangeSystem.ExDateTime) VALTYPE (MT=000007ff002c5cb0, ADDR=00000000036e10f0)
    <EndTime>k__BackingField:(Microsoft.Exchange.ExchangeSystem.ExDateTime) VALTYPE (MT=000007ff002c5cb0, ADDR=00000000036e1110)
    <FreeBusyStatus>k__BackingField:0x2 (Microsoft.Exchange.Data.Storage.BusyType)
    <ChangeHighlight>k__BackingField:0x0 (Microsoft.Exchange.Data.Storage.ChangeHighlightProperties)
    <AppointmentState>k__BackingField:0x0 (Microsoft.Exchange.Data.Storage.AppointmentStateFlags)
    <CreationRequestTime>k__BackingField:(Microsoft.Exchange.ExchangeSystem.ExDateTime) VALTYPE (MT=000007ff002c5cb0, ADDR=00000000036e1130)
    <ResponseType>k__BackingField:0x0 (Microsoft.Exchange.Data.Storage.ResponseType)
    <OldStartTime>k__BackingField:(System.Nullable`1) VALTYPE (MT=000007fef6207cc8, ADDR=00000000036e1150)
    <OldEndTime>k__BackingField:(System.Nullable`1) VALTYPE (MT=000007fef6207cc8, ADDR=00000000036e1178)

We can see the date of the item from the backing field:

 0:021> !mdt 000007ff002c5cb0 00000000036e10d0
(Microsoft.Exchange.ExchangeSystem.ExDateTime) VALTYPE (MT=000007ff002c5cb0, ADDR=00000000036e10d0)
    timeZone:00000000014bb040 (Microsoft.Exchange.ExchangeSystem.ExTimeZone)
    universalTime:(System.DateTime) 2015/12/23 16:15:00.000 VALTYPE (MT=000007fef62482d0, ADDR=00000000036e10d8)
    localTime:(System.Nullable`1) VALTYPE (MT=000007fef6207cc8, ADDR=00000000036e10e0)

And we can see that it's processing the item for the same user that lodged the complaint:

 0:021> !mdt 0000000003761ad0
0000000003761ad0 (System.String) Length=129, String="<sameUserFromAbove>"

Finally, after perusing the dump for an hour or more, I found what I believed to be introducing the issue:

 <ContextInfo>k__BackingField:00000000032fe908 (System.String) Length=7, String="StopAll"

Given this, it was time to go to the devs and figure out what was going on. After some back-and-forth and a conference call with both the Store and Calendaring devs, we arrived at what was causing the issue.

First, it's important to understand that an incremental sync was created for the user but had not - as of the time of debugging - been completed:

 Get-MoveRequestStatistics 957ca0a9-f84f-425e-a263-5b468e90d574 | FL *increm*


EffectiveIncrementalSyncInterval  : 1.00:00:00
ConfiguredIncrementalSyncInterval : 1.00:00:00

Since it syncs once a day, this aligns with the end-user's statement regarding reconfiguration but we needed proof.

When we went and looked at the Mailbox Database Events for the mailbox in question, we saw the 'MailboxMoveStarted' event fire:

 $eventz | FT Counter,ItemType,EventName,ObjectClass,Mailbox,CreateTime -AutoSize

  Counter ItemType   EventName          ObjectClass Mailbox CreateTime
  ------- --------   ---------          ----------- ------- ----------
974015558 MAPI_STORE MailboxMoveStarted                     1/25/2016 21:32:01
974683553 MAPI_STORE MailboxMoveStarted                     1/26/2016 21:41:10
975414298 MAPI_STORE MailboxMoveStarted                     1/27/2016 21:44:00
976136834 MAPI_STORE MailboxMoveStarted                     1/28/2016 21:52:10
976842768 MAPI_STORE MailboxMoveStarted                     1/29/2016 22:02:30
977056359 MAPI_STORE MailboxMoveStarted                     1/30/2016 22:03:45
977195259 MAPI_STORE MailboxMoveStarted                     1/31/2016 22:15:01

What's important to note as a side-comment/take-away is that the 24-hour interval doesn't initiate from the same start time. This is because the counter for the 24-hour interval starts when the last sync stopped.

When we matched this against a trace of the Assistant, we saw the following:

 "2016/01/13-03:06:05.279","9444","2480","Assistants","EventDispatcher","Dispatcher for <userDisplayName> (957ca0a9-f84f-425e-a263-5b468e90d574) and Calendar Notification Assistant: Event handled successfully: Counter: 0x3B331E14; MailboxID: 957ca0a9-f84f-425e-a263-5b468e90d574; Mask: MailboxMoveStarted; Flags: Source; ExtendedFlags: None;Object Class: ; Created Time: 1/13/2016 11:06:04 AM;Item Type: MAPI_STOREItem EntryId: Parent entryId: Old Item entryId: Old parent entryId: SID: S-1-5-21-276227223-3107228690-1043224028-159164Client Type: MigrationDocument ID: 0" 
"85868820"        020B591B          Debug 2016/01/13-02:01:49.777 9444     2480            MailboxAssistants.Assistants.CalendarNotification Assistant           rmd----- caller: StopAll, subj: <subject>, usr: <objectId>c1, calItemId: RgAAAAAelTJECurWR6WYm/BfibkJBwC+myRoACNmSZVBu1bGAuPQAAAAA3FaAAB1+6aed8++Q6gquX/BNJoBAAAcyVmOAAAP, calItemOccId: , event_t: 01/13/2016 16:15:00, rmd_t: 01/13/2016 19:45:00, s_t: 01/13/2016 20:00:00, e_t: 01/13/2016 21:30:00
"8326743"         006B5AB0          Debug 2016/01/13-03:06:05.277 9444     2480            MailboxAssistants.Assistants.CalendarNotification UserSettings      Processing Disabling event for mailbox <objectId> on database 347a971c-e8f5-45de-8e01-870ed4dc6901

To elaborate on this bug and explain it further: Recall back in Exchange 2007 (E12) that mailbox moves caused the mailbox to be locked. When this occurred, it would've been necessary for any code that attempts to open the message store to stop processing the mailbox until the move is complete. Fast-forward a few years (nigh to a decade) and incremental sync move requests are a common, every day occurrence. When the incremental sync starts, there is no delineation between it and a regular move request, so a database event fires and this signals the assistant to stop processing the mailbox. When the text message (SMS) notification is reconfigured, it signals the Assistant to start processing on the mailbox. This explains the behavior seen by the end-users.

For those of you concerned about hitting this bug in the wild, we've only see it happen once and we're using that instance to drive a change in the Assistant logic in all versions of Exchange (assuming all have the same bug in the logic).