Scalability for SharePoint CAML Query


Hello,


today it’s time to focus a bit into a topic scalability. To make your content available it’s nice to use content query webparts or custom webparts which create a CAML-Query to display specific content like:



– display content from many Lists of a specific type over the SiteCollection


This can be done very easy with a CAML-Query objects like SPQuery or SPSiteDataQuery.  It’s a very fast way to get in contact with all requested items and it’s better then iterations. But what happens if your CAML-Query runs over larger content/Scopes? 


SPQuery – Scope – represent a Query in a ListView -> soft limit for Performance= 2000 Items per list view, so you should be aware of performance issues in case your query collects more then 2000 items as result.


SPSiteDataQuery – Scope – perform across multiple lists in mutiple web sites on the same site collection -> soft limit for Performance=2000 Items per list view as result & think about the content source on which in query will be executed! (because the Scope is a Site Collection!)


Why it’s important to check the content source size of the SPSiteDataQuery?


A site collection is bind to one contentdb. MS recommendation is to have not more then 100 GB in one contentdb because of backup/restore time. Normally a developer of custom code is not aware of performance in case the content is very small and a CAML query is the best way to collect data. Day per Day and many user can create much more Data (e.g. Subsites, Lists, ListItems) and it comes the day that you will have a large content size inside your site collection.


At the end of the Scenario / Information design it looks like this:


1 site collection = in one Contentdb
   |
  200 subsites (recommendation not more then 2000 Subsites per SPWeb)
              |
           10 List on each subsite
                   |
               1 of these list exist with a our suffice templateID (e.g. 115 – XML Form library) 


You should see that in the beginning your query (SPSiteDataQuery) run in a empty site collection much fast then over a full filled site collection.


Back to this information design:
What’s do the SPSiteDataQuery in case you want to search over the whole site collection inside all Lists with template ID 115?


The CAML query will be transformed to a SQL query by SharePoint-API. Using SQL profiler Trace you are able to see the query. The created sql query looks like this structure:



exec sp_executesql N’
       Declare @temporaryTable(… some coumns, … ,… )
              
       — insert into temporary table (this is done for each of the 200 Form sites that are children of the site collection root site)
       insert into @temporaryTable Select … ,… , …

     — Select statement to retrieve form column values that were specified in CAML query
      Select UserData.[column], … ,…. from … inner join …. where …. order by …


    — Select output of tempTable
    Select [column1],[…],[columnsX] from @temporaryTable


The SQL query runs over each subweb and on each List inside your site collection. In case you have a site collection with a size of 1, 5, 10 or 50 GB then the execution time of the SQL query takes seconds until minutes.


With each second the user need to wait on the content, you will lose user experience.


To avoid such long running query which decrease the usability, try to split such a long query into smaller parts or let a custom timer job run the query.


How the timer job can  help here?



  1. create a timer job and a hidden list

  2. let the timer job execute such a big query every 10 minutes or what time you want

  3. timer job: store the result of the big query into the hidden list

  4. to render the content for a user, you should be able to use now the query result from that hidden list. This increase the usability because the hidden list has already the content you need.

I hope this help you for further performance issue in long running queries.


regards


Patrick

Comments (3)

  1. Anonymous says:

    Hi bitFlinger,
    i have no good answer for you. the best idea is to render data in smaller parts like: e.g. start a SPSiteDataQuery per SubSite and at the end you have to Combine the results. or an even better idea do not use the spsitedataquery in a realtime page rendering. it’s even better to choose a search driven architecture (use the SharePoint Search index to find you results)

    regards

    Patrick

  2. Jeff25 says:

    What a cool trick! Thanks! -jeff

  3. bitFlinger says:

    I got a SPSiteDataQuery question maybe you know the answer to….

    You have a site (SP2007) with approx 1000 second-level sub-sites. You would like all users to have read access to all of the lists in the sub-sites (inherited). In addition, if the user is a member of the sub-site, they also get some extra contribute rights (sub-siteXXX membership group).

    Now, you want to do SiteCollection list queries and limit the returned data based on where the currentUser is a member of the sub-site. Out of the box, SharePoint does a good job of limiting results based on whether the user has read access to the list. But, how can I limit it to “sub-site group members” only when all users belong to an inherited group which gives everyone read access to all sub-site lists?

    bitFlinger
    bitFlinger@hotmail.com

Skip to main content