Reconciling SharePoint 2013 Content Database Size to Total Size of Site Collections

Have you ever looked at the total physical size of content database (DB) files in SQL Server and wondered why it doesn't reconcile with the total size of site collections contained in the database?  What you'll probably observe is the total size of all site collections in a content DB will be considerably less that the database size property you see in SQL Server Management Studio.  This blog post with explore some reasons why.

As an example, I took a quick look in my own SharePoint 2013 lab environment at a web application which had 1 content DB. Following is what I observed for SQL vs. SharePoint size proportion...

Web application:  https://intranet.contoso.com
Content database:  WSS_content_intranet

Per SharePoint
• Site Collection Count : 19
• Sum of Size in MB     : 110.757159233093

Per SQL Server Management Studio
Database: WSS_Content_Intranet
Properties:
  • Database Size:  206.06 MB
  • Database Space Available:  1.09 MB

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\WSS_Content_Intranet.mdf
154 MB

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\WSS_Content_Intranet_log.ldf
52.MB

Total size of all site collections was about ½ of the database size property in SQL Server Management Studio.  What might explain this?

The SQL database size is the physical size of the database files (mdf and ldf). It is queried by the sp_helpdb stored procedure. This size includes all SQL data related to the database: transactions logs, indexes, SQL database schema, permissions and tables metadata. Depending on SQL administration it may include pre-allocated space for future growth - around 10-25%.

For SQL Server, actual content size can fluctuate up and down over time. When has backup happened? That will commit all transactions to the DB and the size will become smaller.  The mdf file size can be smaller than the DB size in SQL Management studio, that is because Management studio shows the current database size including uncommitted transactions in the transaction log(ldf).

Why is SharePoint’s report of current site usage so much less that what shows in SQL content database?  There is no actual direct correlation between the database size and content size. One reason is SharePoint will display the size of the content stored within the Content Database (caveat: also read below referenced Support article "Total Size from Storage Metrics shows full size file version in SharePoint 2013"), but it won't include the size of the whitespace, that is the disk space currently allocated to the database files to permit more content to be uploaded.  As users add content to their SharePoint sites, the content database will naturally grow.  To have a content database size that is larger than the content is usually a good thing because it gives SharePoint room to grow without the performance hit of having to auto grow the DB at every write/save.

If the database size relation to the content size gets unreasonably big then a shrink might be motivated, but do proper research on the effects first.  Shrinking database files should only be done in very specific circumstances (NEVER automatically or as part of a maintenance plan) because they cause horrible fragmentation of your databases. A good candidate is the situation where you just moved a large site collection to another content database and you leave your original database with 80% of unused space for example. Database files grow, that's what they do.

If a user uploads ten 1 MB documents to a site, you’d see the site content in SharePoint’s report increase by about 10 MB. At the same time, the SQL database size either may not change in case space has already been allocated for this data; or the change will be greater as SQL needs to keep auxiliary structures such as transactions logs, indexes, document properties in order to store these documents. So, there will be storage overhead added by SharePoint and possibly by SQL auxiliary data.

The site collection size SharePoint reports does not include SharePoint permissions structures, audit and events data.  It is comprised of the total size of the following data:

  • documents
  • doc versions
  • list items
  • account personalization info
  • webparts
  • document metadata
  • Recycle bin (1st stage end-user recycle bin)

How is the SharePoint’s storage usage calculation for a site collection done?  For inquiring minds who want to know… SharePoint scans AllDocs, AllFileFragments, AllWebs, AllLists, AllDocVersions, AllUserData, AllWebParts, Personalization, ContentTypes, and RecycleBin for the total size. Please note, the DocStreams table is not considered in this algorithm. There’s a gap between the AllDocs and DocStreams for the same file. The actual file in binary form is stored in DocStreams. AllDocs table stores the meta info. Thus, the DocStreams table will be huge if there’re lots of files stored in the site. That being said, now the question is why DocStreams is not considered in the calculation algorithm.  The reason is a) this is an auxiliary table that only contains binary information; b) the information in this table does not actually reflect the file size, but size of the binary length. Therefore, it is not considered.  As a result, the content size stored in the database for a site collection takes more space than reflected in SharePoint’s storage usage calculation.

You might wonder since the calculated site collection size and actual content size are not aligned, what’s the point? This number will be the same once the site collection is backed up to disk (using stsadm -o backup or backup-spsite). If you’re interested, the size reported in the SharePoint UI is from SPSite.UsageInfo.Storage (https://msdn.microsoft.com/EN-US/library/microsoft.sharepoint.spsite.usageinfo.storage) and its internal name is SiteDiskUsed. You can manually force a re-calculation using SPSite.RecalculateStorageUsed (https://msdn.microsoft.com/en-us/library/microsoft.sharepoint.spsite.recalculatestorageused).

What features in SharePoint 2013 can influence the size of content databases?  The following SharePoint Server 2013 features can significantly affect the size of content databases:

  • Recycle bins:  Until a document is fully deleted from both the first stage and second stage recycle bin, it occupies space in a content database. Calculate how many documents are deleted each month to determine the effect of recycle bins on the size of content databases. For more information, see “Configure Recycle Bin settings in SharePoint Server 2013.”
  • Auditing:  Audit data can quickly compound and use large amounts of space in a content database, especially if view auditing is turned on. Rather than letting audit data grow without constraint, we recommend that you enable auditing only on the events that are important to meet regulatory needs or internal controls. If you enable auditing within a site collection, use the following guidelines to estimate the space that you must reserve for auditing data:
    • Estimate the number of new auditing entries for a site, and multiply this number by 2 KB (entries generally are limited to 4 KB, with an average size of about 1 KB).
    • Based on the space that you want to allocate, determine the number of days of audit logs you want to keep.
    • TIP!  Remember to also configure automatic audit log trimming for your site collection.  Site Settings > Site Collection Administration > Site collection audit settings > Audit Log Trimming
  • Gradual site delete:  When you delete a site collection, it’s not immediately completely deleted from the source content database just the flag for deletion has been set to 1 in the database for that site collection. You will find a timer job in central administration, scope to the web app, which is called “Gradual site delete.”  This job needs to run to delete the whole site collection from the database. The timer job is scheduled by default to run daily, but you can also force it to run manually. When that job runs the source database should get smaller for the size of the site collection removed.

ADDITIONAL REFERENCES

Total Size from Storage Metrics shows full size file version in SharePoint 2013
https://support.microsoft.com/en-us/kb/3038333

Software boundaries and limits for SharePoint 2013
https://technet.microsoft.com/en-us/library/cc262787.aspx

Storage and SQL Server capacity planning and configuration (SharePoint Server 2013)
https://technet.microsoft.com/en-us/library/cc298801.aspx

Configure Recycle Bin settings in SharePoint Server 2013
https://technet.microsoft.com/en-us/library/cc263125.aspx

Timer job reference (SharePoint 2013)
https://technet.microsoft.com/en-us/library/cc678870.aspx

Best practices for SQL Server in a SharePoint Server farm
https://technet.microsoft.com/en-us/library/hh292622.aspx