Managing crawl deletion policies for SharePoint Server 2010

Hello! Hal Zucati here, a writer with Enterprise Search User Assistance. I want to share some information about how to manage crawl deletion policies in SharePoint Server 2010.

SharePoint Server 2010 uses four policies to guard against accidental deletion of content when the crawl component (crawler) encounters intermittent errors during a crawl. These policies are controlled by properties on the Search service application, and they determine how many times and for how long an item is retried in consecutive crawls before the crawler acts on the error.

The four policies and the default values of their properties are described below.


Delete policy for access denied or file not found

When the crawler encounters an access denied or a file not found error, the item is deleted from the search index only if the error was encountered in more than ErrorDeleteCountAllowed consecutive crawls AND the duration since the first error is greater than ErrorDeleteIntervalAllowed hours. Unless both conditions are met, the item is retried.

The default value for ErrorDeleteCountAllowed is 30 and ErrorDeleteIntervalAllowed is 720 hours (30 days).
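
The two-condition rule can be modeled in a few lines of Python. This is only an illustrative sketch of the logic described above, not SharePoint code: the function name and its parameters are hypothetical, and the default arguments simply mirror the property defaults.

```python
def should_delete(consecutive_error_crawls, hours_since_first_error,
                  count_allowed=30, interval_allowed_hours=720):
    """Return True if the item would be deleted, False if it would be retried.

    Both thresholds must be exceeded (strictly greater than) for deletion.
    """
    return (consecutive_error_crawls > count_allowed
            and hours_since_first_error > interval_allowed_hours)


# 31 failed crawls, but only 10 days (240 hours) since the first error: retried.
print(should_delete(31, 240))   # False
# 31 failed crawls spread over 31 days (744 hours): deleted.
print(should_delete(31, 744))   # True
```

Because the conditions are joined with AND, frequent crawls alone cannot delete an item quickly; the time threshold still has to pass.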


Delete policy for all other errors

When the crawler encounters any error other than access denied or file not found, the item is deleted from the search index only if the error was encountered in more than ErrorCountAllowed consecutive crawls AND the duration since the first error is greater than ErrorIntervalAllowed hours. Unless both conditions are met, the item is retried.

The default value for ErrorCountAllowed is 100 and ErrorIntervalAllowed is 1440 hours (60 days).
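
To get a feel for what these defaults mean in practice, the sketch below estimates which consecutive failing crawl would be the first to delete an item. It is a rough Python model, not SharePoint code: the function name is hypothetical, and it assumes that crawls are evenly spaced and that every one of them hits the error.

```python
def first_deleting_crawl(hours_between_crawls, count_allowed=100,
                         interval_allowed_hours=1440):
    """Return the number of the first consecutive failing crawl that deletes the item.

    Crawl 1 is the crawl that first hit the error, so crawl n happens
    (n - 1) * hours_between_crawls hours after the first error.
    """
    if hours_between_crawls <= 0:
        raise ValueError("hours_between_crawls must be positive")
    n = count_allowed + 1  # the count must be strictly greater than the limit
    while (n - 1) * hours_between_crawls <= interval_allowed_hours:
        n += 1  # keep retrying until the time limit is also exceeded
    return n


print(first_deleting_crawl(24))  # daily crawls: 101
print(first_deleting_crawl(1))   # hourly crawls: 1442
```

Under these assumptions, daily crawls are limited by the count (the 101st failing crawl, 100 days after the first error), while hourly crawls are limited by the interval (just over 60 days).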


Re-crawl policy for SharePoint content

This policy applies only to SharePoint content. If the crawler encounters errors when fetching changes from the SharePoint content database for RecrawlErrorCount consecutive crawls AND the duration since first error is RecrawlIntervalCount hours, the crawler will force a re-crawl on that content database.

The default value for RecrawlErrorCount is 10 and RecrawlIntervalCount is 360 hours (15 days).
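
The bookkeeping behind this policy can be pictured as a small tracker per content database. The Python sketch below is a hypothetical model, not SharePoint code. It rests on two assumptions that the description above does not spell out: a successful crawl resets the run of consecutive errors, and both thresholds are treated as "at least".

```python
class RecrawlTracker:
    """Tracks consecutive change-fetch errors for one content database."""

    def __init__(self, recrawl_error_count=10, recrawl_interval_hours=360):
        self.recrawl_error_count = recrawl_error_count
        self.recrawl_interval_hours = recrawl_interval_hours
        self.consecutive_errors = 0
        self.first_error_hour = None

    def record_crawl(self, hour, had_error):
        """Record one crawl at time `hour`; return True if a re-crawl is forced."""
        if not had_error:
            # Assumption: a successful crawl ends the run of consecutive errors.
            self.consecutive_errors = 0
            self.first_error_hour = None
            return False
        if self.first_error_hour is None:
            self.first_error_hour = hour
        self.consecutive_errors += 1
        # Assumption: both thresholds are treated as "at least".
        return (self.consecutive_errors >= self.recrawl_error_count
                and hour - self.first_error_hour >= self.recrawl_interval_hours)


tracker = RecrawlTracker()
# Twenty failing crawls, one every 24 hours, starting at hour 0.
results = [tracker.record_crawl(hour, had_error=True) for hour in range(0, 480, 24)]
print(results.index(True) + 1)  # 16
```

In this model, daily crawls reach the error count at the 10th crawl, but the re-crawl is not forced until the 16th, when 360 hours have passed since the first error.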


Delete unvisited policy

During a full crawl of a content source, the crawler runs a delete unvisited phase in which it deletes items that are in the crawl history but were not found in the current full crawl. This policy is exposed through the DeleteUnvisitedMethod property, which determines which items are deleted during this phase. There are three possible values:

  • When DeleteUnvisitedMethod is 0, all unvisited items are deleted.

  • When DeleteUnvisitedMethod is 1 (default), unvisited items that have the same host as the start address specified in the content source are retained and unvisited items that were discovered by following links to other hosts are deleted.

  • When DeleteUnvisitedMethod is 2, none of the unvisited items are deleted.
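
The three settings listed above can be compared side by side with a short Python model. This is a sketch under stated assumptions, not SharePoint code: the function name and the URLs are hypothetical, and "same host" is modeled as a plain host name comparison.

```python
from urllib.parse import urlsplit


def items_to_delete(unvisited_urls, start_address, delete_unvisited_method=1):
    """Return the unvisited URLs that the delete unvisited phase would remove."""
    if delete_unvisited_method == 0:
        return list(unvisited_urls)          # delete every unvisited item
    if delete_unvisited_method == 2:
        return []                            # keep every unvisited item
    if delete_unvisited_method == 1:
        start_host = urlsplit(start_address).hostname
        # Keep items on the start address host; delete items from other hosts.
        return [url for url in unvisited_urls
                if urlsplit(url).hostname != start_host]
    raise ValueError("DeleteUnvisitedMethod must be 0, 1, or 2")


unvisited = ["http://intranet/docs/a.docx", "http://partner.example.com/b.html"]
print(items_to_delete(unvisited, "http://intranet"))
# ['http://partner.example.com/b.html']
```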


How to view or change these values

Each of these property values can ONLY be changed on the Search service application using Windows PowerShell for SharePoint Server 2010.

To view or change these properties, first retrieve the Search service application object:

  1. Confirm that you have the appropriate rights on the computer where this procedure is performed.
  2. From the Windows Start menu, navigate to All Programs.
  3. Navigate to Microsoft SharePoint 2010 Products, and then click SharePoint 2010 Management Shell.
  4. Use the Get-SPEnterpriseSearchServiceApplication cmdlet to retrieve the Search service application object, as follows:

     $SearchApplication = Get-SPEnterpriseSearchServiceApplication

To view the current value of a property, use the following command:

$SearchApplication.GetProperty("PropertyName")

To change the value of a property, use the following command:

$SearchApplication.SetProperty("PropertyName", NewValue)

For more information, see Get-SPEnterpriseSearchServiceApplication (https://technet.microsoft.com/en-us/library/ff608050.aspx).

Thanks for reading. If you have feedback, leave a comment.