This post is about how to get an accurate item count of the external content crawled through the Cloud Hybrid Search Service Application.
I worked with one of my customers on a large Cloud Hybrid Search implementation with SharePoint 2013 and SharePoint Online, where the customer needed to crawl around 140 TB of content from on-premises farms, file shares, line-of-business (LOB) systems, and other sources. The customer wasn't confident that all items crawled by the Cloud Hybrid Search farm could be queried. When they issued a broad query such as isexternalcontent:1 or an asterisk query (*) and checked the result count at the bottom of the search results page, they saw only a fraction of the expected item count!
Keep in mind that the result count at the bottom of the search results page is an estimate, which is why it reads "About xxxx results". You will also notice that the number changes as you page through the results. This is by design: only the most relevant results are retrieved for the search results page (imagine having to retrieve millions of items at once just to count them!), and as a matter of fact most popular internet search engines do the same. The behavior comes from preset query timeouts defined by the product group, so it applies whether you submit your query through the UI, the REST API, or the Search Query Tool.
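The REST API returns the same kind of estimate. Below is a minimal sketch, assuming SharePoint Online's /_api/search/query endpoint and a JSON response requested with the header Accept: application/json;odata=nometadata. The tenant URL and the helper names build_search_url and read_counts are hypothetical, and authentication and the HTTP call itself are left out.

```python
from typing import Tuple
from urllib.parse import quote


def build_search_url(site_url: str, query: str, start_row: int = 0, row_limit: int = 50) -> str:
    """Build a SharePoint search REST URL (hypothetical helper)."""
    # querytext is passed as a quoted string literal; quote() percent-encodes
    # the single quotes and the colon so the URL stays valid.
    encoded = quote(f"'{query}'", safe="")
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext={encoded}&startrow={start_row}&rowlimit={row_limit}"
    )


def read_counts(response_json: dict) -> Tuple[int, int]:
    """Return (RowCount, TotalRows) from an odata=nometadata response (hypothetical helper)."""
    relevant = response_json["PrimaryQueryResult"]["RelevantResults"]
    # RowCount: rows actually returned for this page.
    # TotalRows: the estimated total, the same kind of number shown as "About xxxx results".
    return relevant["RowCount"], relevant["TotalRows"]
```

If you request successive pages by increasing startrow, TotalRows may drift just like the count in the UI, so it should not be treated as an exact item count either.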
The best way to get a reliable item count for your crawled external content is to run a Content Search (Compliance Search) in the Security and Compliance Center with the query isexternalcontent:1.
If items or documents from your crawled external content are missing from the search results page in SharePoint Online, see my other blog post, How to Troubleshoot Missing Search Item Results Crawled by the Cloud Hybrid Search Service Application.
Happy SharePointing 🙂
Sammy Kailini | Premier Field Engineer | Microsoft