SharePoint 2013 Crawl Tuning Part 1: Baseline


I wrote a basic overview of some of the counters needed to monitor and tune content feeding here. But I thought it would be helpful to walk through a real-world tuning example. This will show how the counters are actually used in practice and the performance improvements that can be gained. Tuning the content feed really is not complicated once you understand the methodology. Even though the components and architecture have changed from FAST ESP, to FAST Search for SharePoint 2010, to SharePoint 2013, the process remains unchanged: first isolate the bottleneck component, then scale it up or out, and re-assess. Rinse and repeat until you reach the desired performance.

The environment I will use for testing is a single virtual machine running all of the farm components. It hosts SQL Server, all of the SharePoint components, and a custom content enrichment web service. The content source I am crawling is a SQL database connected through BCS. The SQL table has around 150,000 rows, and a full crawl takes about 80 minutes. The VM has the following specs:

RAM: 16 GB

CPU: 5 virtual CPUs

Disk: Single VHD hosted on a local disk

The VM runs on a server that hosts other VMs and services, and the disk is shared. So, like most real-world environments, the hardware performance depends on how the resources are being shared. This is the reality for many environments, and we still need to tune them.

My tuning requirement is simple: make it go faster. This is often as detailed a tuning requirement as you will get.

The first step is to capture a performance baseline so we can compare the effects of any changes that we make. We first need to define the counters to collect. Then we will run a full crawl and capture those counters during the crawl. If it is not possible to run a full crawl, an incremental crawl that picks up enough content will work fine as well. In that case you will need to track progress by looking at the throughput rate rather than the crawl time.
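The attached counters.txt comes from the earlier post, but if you need to rebuild or extend it, you can discover the search-related counter sets exposed on the server with Get-Counter -ListSet. This is just a sketch; the "*search*" wildcard and the "Search Flow Statistics" set are used for illustration, so substitute the sets that matter for your feeding pipeline:

# List the search-related performance counter sets available on the search server
Get-Counter -ListSet "*search*" -ComputerName ksesearch | Select-Object -ExpandProperty CounterSetName

# Append the counter paths for a set of interest to counters.txt so Get-Counter
# can consume the file later. "Search Flow Statistics" is shown as an example.
(Get-Counter -ListSet "Search Flow Statistics" -ComputerName ksesearch).Paths | Out-File counters.txt -Append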

For my baseline test, I started a full crawl and captured counters for 10 minutes. I have attached the counters.txt file to this blog post. For the initial baseline I used the counters and steps I outlined previously (SharePoint 2013 Monitor and Tune Content Feed). There are some good details on using PowerShell to pull the counter data on Mark Sullivan's blog. I ran the following command to capture the output as a .blg:

Get-Counter -Counter (Get-Content counters.txt) -MaxSamples 10 -SampleInterval 60 -ComputerName ksesearch | Export-Counter -Force -Path perfbaseline.blg
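If you want to inspect the capture without opening Performance Monitor, Import-Counter can read the .blg back into objects. Here is a minimal sketch that averages each counter over the captured samples; the property names used are the standard Get-Counter/Import-Counter output properties:

# Read the captured samples back from the binary log
$samples = Import-Counter -Path .\perfbaseline.blg

# Flatten to one row per counter reading, then average each counter path
$samples |
    Select-Object -ExpandProperty CounterSamples |
    Group-Object Path |
    ForEach-Object {
        [pscustomobject]@{
            Counter = $_.Name
            Average = ($_.Group | Measure-Object CookedValue -Average).Average
        }
    } |
    Sort-Object Counter |
    Format-Table -AutoSize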

Here you can see my transactions waiting backing up: more and more transactions are waiting as the crawl progresses. This indicates an issue downstream, showing that the crawler is pushing data faster than the downstream components can handle it.

The next step is to look at Search Flow Statistics\Input Queue Full Time to determine whether indexing or content processing is the bottleneck component. You only need to be concerned with the content processing instances of this counter.
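One way to pull just those instances is to enumerate the per-instance paths for the Search Flow Statistics set and filter on the instance name. The "*content*" filter below is an assumption about how the instances are named; dump the paths first and adjust the filter to match what you see on your farm:

# Enumerate the per-instance paths for the Search Flow Statistics counter set
$paths = (Get-Counter -ListSet "Search Flow Statistics" -ComputerName ksesearch).PathsWithInstances

# Keep only Input Queue Full Time for the content processing instances.
# The '*content*' instance filter is a guess; inspect $paths to confirm the naming.
$cpcPaths = $paths | Where-Object { $_ -like "*Input Queue Full Time*" -and $_ -like "*content*" }

# Sample those counters a few times while the crawl is running
Get-Counter -Counter $cpcPaths -MaxSamples 5 -SampleInterval 60 -ComputerName ksesearch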

Our run shows nothing over 100, which is well under the guideline of 1000. So our next step will be to add an additional content processing component. The entire crawl took 85 minutes at a feeding rate of 29 documents per second; we'll look to improve that with the tuning. We will discuss the details in part 2.
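Part 2 will cover the change itself, but for reference, adding a content processing component in SharePoint 2013 means cloning the active search topology, adding the component to the clone, and activating it. A rough sketch, run from the SharePoint 2013 Management Shell and assuming a second search server named "ksesearch2" (the server name is a placeholder for this sketch):

# Clone the active search topology so it can be modified
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active

# Make sure the search service instance is running on the target server.
# "ksesearch2" is an assumed server name for illustration only.
$instance = Get-SPEnterpriseSearchServiceInstance -Identity "ksesearch2"
Start-SPEnterpriseSearchServiceInstance -Identity $instance

# Add the extra content processing component and activate the new topology
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $instance
Set-SPEnterpriseSearchTopology -Identity $clone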

For reference, here is the raw performance capture:

perfbaseline.blg

