Scalability for SharePoint CAML Query

Hello,

today it's time to focus a bit into a topic scalability. To make your content available it's nice to use content query webparts or custom webparts which create a CAML-Query to display specific content like:

- display content from many Lists of a specific type over the SiteCollection

This can be done very easy with a CAML-Query objects like SPQuery or SPSiteDataQuery.  It's a very fast way to get in contact with all requested items and it's better then iterations. But what happens if your CAML-Query runs over larger content/Scopes? 

SPQuery - Scope - represent a Query in a ListView -> soft limit for Performance= 2000 Items per list view, so you should be aware of performance issues in case your query collects more then 2000 items as result.

SPSiteDataQuery - Scope - perform across multiple lists in mutiple web sites on the same site collection -> soft limit for Performance=2000 Items per list view as result & think about the content source on which in query will be executed! (because the Scope is a Site Collection!)

Why it's important to check the content source size of the SPSiteDataQuery?

A site collection is bind to one contentdb. MS recommendation is to have not more then 100 GB in one contentdb because of backup/restore time. Normally a developer of custom code is not aware of performance in case the content is very small and a CAML query is the best way to collect data. Day per Day and many user can create much more Data (e.g. Subsites, Lists, ListItems) and it comes the day that you will have a large content size inside your site collection.

At the end of the Scenario / Information design it looks like this:

1 site collection = in one Contentdb
   |
  200 subsites (recommendation not more then 2000 Subsites per SPWeb)
              |
           10 List on each subsite
                   |
               1 of these list exist with a our suffice templateID (e.g. 115 - XML Form library) 

You should see that in the beginning your query (SPSiteDataQuery) run in a empty site collection much fast then over a full filled site collection.

Back to this information design:
What's do the SPSiteDataQuery in case you want to search over the whole site collection inside all Lists with template ID 115?

The CAML query will be transformed to a SQL query by SharePoint-API. Using SQL profiler Trace you are able to see the query. The created sql query looks like this structure:

exec sp_executesql N'
       Declare @temporaryTable(... some coumns, ... ,... )
              
       -- insert into temporary table (this is done for each of the 200 Form sites that are children of the site collection root site)
       insert into @temporaryTable Select ... ,... , ...

     -- Select statement to retrieve form column values that were specified in CAML query
      Select UserData.[column], ... ,.... from ... inner join .... where .... order by ...

    -- Select output of tempTable
    Select [column1],[...],[columnsX] from @temporaryTable
'

The SQL query runs over each subweb and on each List inside your site collection. In case you have a site collection with a size of 1, 5, 10 or 50 GB then the execution time of the SQL query takes seconds until minutes.

With each second the user need to wait on the content, you will lose user experience.

To avoid such long running query which decrease the usability, try to split such a long query into smaller parts or let a custom timer job run the query.

How the timer job can  help here?

  1. create a timer job and a hidden list
  2. let the timer job execute such a big query every 10 minutes or what time you want
  3. timer job: store the result of the big query into the hidden list
  4. to render the content for a user, you should be able to use now the query result from that hidden list. This increase the usability because the hidden list has already the content you need.

I hope this help you for further performance issue in long running queries.

regards

Patrick