Interesting problem when creating a variation hierarchy on an already deleted site collection

My colleague Cedric Naudy recently analyzed an interesting problem: variation hierarchy creation failed for all site collections in a specific content database, while hierarchy creation in other web applications worked fine.

The hierarchy creation jobs were marked as succeeded, but no new hierarchies were created.

While analyzing the issue, Cedric noticed that the ScheduledWorkItems table contained work items for the CreateVariationHierarchiesJobDefinition timer job that belonged to a site collection which did not exist in that content database.

Further analysis revealed that the site collection had been deleted on the same day it was created. Before the deletion, the site collection administrator had started to create the variation hierarchy, but later decided to discontinue the site collection.

As the CreateVariationHierarchiesJobDefinition timer job runs only once a day by default, its scheduled work item was still in the ScheduledWorkItems table after the site collection was deleted.

On its next run, the timer job tried to create the variation hierarchy but failed to locate the site collection in the content database. This led to an unexpected System.ArgumentNullException when it tried to remove the problematic work item from the ScheduledWorkItems table. The exception causes the timer job to stop without processing the remaining scheduled work items for variation hierarchies in other site collections in the same content database.

As the problematic work item remained in the ScheduledWorkItems table, the next run of the CreateVariationHierarchiesJobDefinition timer job tried to process it again and ran into the same exception, which again caused the timer job to stop without processing any further work items.

Each time the “Create Hierarchies” button is clicked in any site collection which resides in the same content database, a new work item is added to the ScheduledWorkItems table. But these work items are never processed, because the timer job already fails on the older problematic work item which references the deleted site collection.

For those of you who are interested in reproducing this problem, you can use the following steps:

  1. Create a web application.
  2. Create two site collections within this web application (e.g. /sites/pub1 and /sites/pub2) using the Publishing Portal template.
  3. Set up variations on the two site collections (e.g. create an EN label as the source label and create the hierarchies).
  4. Manually run the timer job that spawns the hierarchies (“Variations Create Hierarchies Job Definition”).
  5. Create a new variation label in the Pub1 site collection.
  6. Click on “Create Hierarchies”. A message saying that the timer job has been created is shown. The work item is visible in the ScheduledWorkItems table in the content DB of the web application.
    DO NOT run this timer job (by default it would run between midnight and 3am).
  7. Delete the Pub1 site collection.
  8. Create a new variation label in the Pub2 site collection.
  9. Click on “Create Hierarchies”. The message says the job was successfully created. The work item is present in the ScheduledWorkItems table.
  10. Force the execution of the “Variations Create Hierarchies Job Definition” timer job (run now).

Expected result:

The hierarchies should be created for Pub2 site collection.

Actual result:

The “Variations Create Hierarchies Job Definition” is visible as succeeded in job history (it took 0 seconds) but no hierarchy was created for the new label.

On the site, we could see that the hierarchies were not created.

How can we detect this problem?

If you suspect that you have run into this issue, you should enable verbose logging for the “SharePoint Foundation” – “General” category. Afterwards you will find the following entry in the ULS log whenever such a problematic work item is being processed:

OWSTIMER.EXE (0x1EF4) 0x1C8C SharePoint Foundation General 8nc8 Verbose TimerJob WorkItem Processing exception: System.ArgumentNullException: Value cannot be null. Parameter name: workItemId
    at Microsoft.SharePoint.SPWorkItemCollection.DeleteWorkItem(Guid workItemId)
    at Microsoft.Office.Server.Utilities.TimerJobUtility.<>c__DisplayClass1.<ProcessWorkItem>b__0()
    at Microsoft.Office.Server.Utilities.MonitoredScopeWrapper.RunWithMonitoredScope(Action code)
    at Microsoft.Office.Server.Utilities.TimerJobUtility.ProcessWorkItem(SPWorkItemCollection workItems, SPWorkItem wi, WorkItemTimerJobState timerJobState, ProcessWorkItemWithState processor)
    at Microsoft.SharePoint.Publishing.Internal.VariationsSpawnJobDefinitionBase.ProcessWorkItem(SPContentDatabase contentDatabase, SPWorkItemCollection workItems, SPWorkItem workItem, SPJobState jobState)
    at Microsoft.SharePoint.Administration.SPWorkItemJobDefinition.ProcessWorkItems(SPContentDatabase contentDatabase, SPWorkItemCollection workItems, SPJobState jobState)
    at Microsoft.SharePoint.Administration.SPWorkItemJobDefinition.HandleOneContentDatabase(SPContentDatabase db, SPJobState jobState)

This error message is a bit misleading. It complains about a null value, although workItemId is a GUID. In fact, the call fails because the ParentID field in the “ScheduledWorkItems” table references the GUID of the site which was deleted.

The DeleteWorkItem method (of the SPWorkItemCollection object) needs an existing site in order to remove the work item from the table (see the community content in this link).

Solution/Workaround

As you can see in the call stack visible in the ULS logs, the public API to delete a work item has a limitation that prevents us from removing the problematic work item: it first needs to retrieve the SPSite object the work item belongs to. As this site collection no longer exists, it is not possible to retrieve the SPSite object, which means that we cannot remove the work item using this API. On the other hand, we have to remove the work item in order to get hierarchy creation working again for the site collections in this content database.

To solve the issue, the internal Delete method of the affected work item itself has to be called.

As this is not a public API, Reflection is required to get access to this method, as outlined in the following code sample:

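// Requires: using System; using System.Reflection; using Microsoft.SharePoint;
protected void DeleteWorkItem(SPWorkItem workItem)
{
    try
    {
        // SPWorkItem.Delete is not public, so it is looked up via reflection.
        MethodInfo info = typeof(SPWorkItem).GetMethod("Delete",
                BindingFlags.NonPublic | BindingFlags.Instance);
        if (info != null)
        {
            info.Invoke(workItem, null);
        }
    }
    catch (Exception) { } // errors are swallowed here; consider logging them instead
}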

To pass the SPWorkItem object as a parameter to this function, standard SharePoint Object Model calls can be used to list the work items and identify the one we want to delete.

To identify the hierarchy creation work items, we can, for example, filter by their well-known GUID.
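The reflection call used in DeleteWorkItem can be tried outside of a SharePoint farm. The following is only a minimal sketch of that pattern: StandInWorkItem and ReflectionDemo are hypothetical names invented for this illustration and are not SharePoint types. The stand-in merely has a non-public parameterless instance method called Delete, which is what the sample above assumes about SPWorkItem.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for SPWorkItem; NOT a SharePoint type.
public class StandInWorkItem
{
    public bool Deleted { get; private set; }

    // Non-public instance method, comparable to the internal Delete method.
    private void Delete()
    {
        Deleted = true;
    }
}

public static class ReflectionDemo
{
    // Looks up a non-public, parameterless instance method named "Delete"
    // on the target and invokes it. Returns false if no such method exists.
    public static bool InvokePrivateDelete(object target)
    {
        MethodInfo info = target.GetType().GetMethod(
            "Delete",
            BindingFlags.NonPublic | BindingFlags.Instance,
            null,             // default binder
            Type.EmptyTypes,  // no parameters
            null);            // no parameter modifiers
        if (info == null)
        {
            return false;
        }
        info.Invoke(target, null);
        return true;
    }

    public static void Main()
    {
        var item = new StandInWorkItem();
        bool found = InvokePrivateDelete(item);
        Console.WriteLine("found={0}, deleted={1}", found, item.Deleted);
    }
}
```

Unlike the sample above, this sketch reports whether the method was found, which makes it easier to notice if the internal method is renamed or removed in a later build.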

Note:

This issue is not very likely to occur. It is quite rare for someone to request the creation of a hierarchy and then, on the same day, delete the site collection hosting it.

If you run into such an issue, we recommend opening a support case with Microsoft to get it fixed.

1 Comment


  1. A short comment: a user sent me information on how to do the same using direct SQL commands.

    Never ever perform direct modifications to the SQL database!

    That is unsupported and you would have to delete the database and start from scratch to get back to a supported state!
