Single Instance Storage in Exchange 2007


Single instance storage is a feature of Exchange that we haven’t talked about much since the 1990s. We made some changes to how single-instancing works in Exchange 2007, so I wanted to give everyone an overview of what’s changed and why. But first, some background:

What is single instance storage?

The idea behind single instance storage is that if a message is addressed to multiple recipients, and these recipients are located on the same database, the message is stored just once. This functionality has existed in Exchange since version 4.0.
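To make the idea concrete, here is a minimal Python sketch. It is an illustration only, not how the Exchange store is actually implemented: the dictionary layout, the `store_message` and `new_database` helpers, and the mailbox and database names are all hypothetical. It shows the body being stored once per database, with each recipient mailbox holding only a reference to it.

```python
from collections import defaultdict


def new_database():
    # Hypothetical layout: one shared message store plus per-mailbox reference lists.
    return {"messages": {}, "mailboxes": defaultdict(list)}


def store_message(databases, message_id, body, recipients, mailbox_to_db):
    """Deliver one message, keeping a single copy per database that hosts a recipient."""
    for mailbox in recipients:
        db = databases[mailbox_to_db[mailbox]]
        # Store the body only the first time this database sees the message.
        db["messages"].setdefault(message_id, body)
        # Each mailbox records a reference to the shared copy, not its own copy.
        db["mailboxes"][mailbox].append(message_id)


databases = {"DB1": new_database(), "DB2": new_database()}
mailbox_to_db = {"alice": "DB1", "bob": "DB1", "carol": "DB2"}

store_message(
    databases,
    "msg-1",
    "Quarterly report attached.",
    ["alice", "bob", "carol"],
    mailbox_to_db,
)

# Count stored copies per database.
copies = {name: len(db["messages"]) for name, db in databases.items()}
print(copies)
```

In this toy setup, alice and bob share one stored copy because they are on the same database, while carol's database has to keep its own copy.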

Evolution of single instance storage

Over the years, the importance of single instance storage (SIS) in Exchange environments has gradually declined. Ten years ago, Exchange supported about 1000 users per server, with all users on a single database. Today, a typical mailbox server might have 5000 users spread over 50 databases. With one-tenth as many users per database (100 users vs. 1000 users), the potential space savings from single-instancing are reduced by 10x. The move to multiple databases per server, along with the reduction in space savings due to items being deleted over time, means that the space savings from SIS are quite small for most customers. Because of this, we’ve long recommended that customers ignore SIS when planning their storage requirements for Exchange.
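The users-per-database arithmetic can be checked with a short Python sketch. It rests on simplifying assumptions that are not part of the numbers above: users are spread evenly across databases, the message is addressed to every user on the server, and nobody has deleted their copy yet. The function name is hypothetical.

```python
def recipients_per_stored_copy(users_per_server, databases_per_server):
    """Best case for SIS: one stored copy serves every user on the same database."""
    # Assumes users are spread evenly across the server's databases.
    return users_per_server / databases_per_server


then = recipients_per_stored_copy(1000, 1)   # ten years ago: one database per server
now = recipients_per_stored_copy(5000, 50)   # today: 50 databases per server

print(f"then: {then:.0f} recipients share each stored copy")
print(f"now:  {now:.0f} recipients share each stored copy")
print(f"best-case sharing is lower by a factor of {then / now:.0f}")
```

Even in this best case, each stored copy is shared by one-tenth as many mailboxes as before, and deletions over time only shrink that further.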

SIS has also diminished in importance because of the way storage hardware has evolved. Over the past 10 years, the capacity of disk drives has risen sharply, but IO performance has remained flat, leaving most Exchange customers constrained by disk IO rather than disk space. In 1996, a typical disk was 10 GB in size and delivered about 100 IOPS, or about 10 IOPS/GB. Today, a typical disk is 500 GB and delivers about 100 IOPS, or about 0.2 IOPS/GB. The IOPS per GB has dropped 50-fold. Single instancing is fundamentally about saving disk space at the expense of increased IOs. So, while trading IOs to save space was a good strategy 10 years ago, today a focus on IO reduction makes more sense.
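The IOPS-per-GB comparison works out as follows. This is a small Python sketch that uses only the approximate disk figures quoted above; the function and variable names are hypothetical.

```python
def iops_per_gb(iops, capacity_gb):
    """IO operations per second available for each GB of capacity."""
    return iops / capacity_gb


disk_1996 = iops_per_gb(iops=100, capacity_gb=10)
disk_today = iops_per_gb(iops=100, capacity_gb=500)
drop = disk_1996 / disk_today

print(f"1996:  {disk_1996:.1f} IOPS/GB")
print(f"today: {disk_today:.1f} IOPS/GB")
print(f"drop:  {drop:.0f}-fold")
```

Capacity grew 50 times while IOPS stayed flat, so every stored GB now has one-fiftieth of the IO budget it used to have.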

Single instance storage in Exchange 2007

Exchange 2007 included a large amount of work to reduce IO (please refer to http://msexchangeteam.com/archive/2006/09/08/428860.aspx for details). Some of these changes affected the way that Exchange handles single-instancing of messages. In Exchange 2007, attachments are single instanced, but message bodies are not.

This behavior does not apply to the move mailbox operation, so when you transition to Exchange 2007 from Exchange 2000 or Exchange 2003, single instance storage is maintained for both message bodies and attachments, as long as:

  • The mailboxes being moved belong to the same source database and the same destination database
  • You are using a "transition" approach rather than a "migration" approach for your upgrade

For an explanation of the differences between "transition" and "migration," see http://technet.microsoft.com/en-us/library/bb124008.aspx

Looking forward

Given current trends, we expect the value of single instance storage to continue to decline over time. It’s too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today’s deployment or migration plans.

Nick Rosenfeld



Comments (21)
  1. Mike Crowley says:

    Thanks for this post!  I’ve always tried to make this argument, and it’s great to finally have something definitive to reference!

  2. Joshua Konkle says:

    Hi Nick

    Nice article, you got me thinking as you took me back in time to 1995 with your Exchange 4.0 comment.  I never understood why anyone really cared about Single Instance Storage.  In my experience, from the early days of 4, 5, 5.5sp3 *yay*, 2000 and 2003 SIS was always about message delivery.  Most of the systems we would architect had multiple servers in each site, err group.  This meant that the real benefit was when a message to 1000 people in a remote site, err group, was delivered it didn’t cross the wire 1000 times – just once.  SIS was a keen value for Bridgehead (BH) communication, not storage optimization.

    Only with the advent of Exchange 2007 64-bit and ultra-large cache and more users and larger databases does this become helpful.  SIS is dependent on a lot of users, not just a couple thousand.  The more users and data, increase the probability that the same items will exist. *yay* for IOPs management, thanks for making this point.

    You mention the introduction of detaching attachments.  Administrators can start to see some storage improvements, but is it only with the pool of users on that server, storage group or store?

    You didn’t mention that in your post – can you clarify where SIS happens?

  3. bday says:

    Joshua, I think you may be confusing Bifurcation and Single Instance Storage. On E2K7 bifurcation happens on the hub transport server closest to the point of delivery using the least cost routing delivery path.

  4. Joshua Konkle says:

    @bday

    bifurcation – that’s interesting.  That’s a topic many people avoid.  Thanks for bringing that up.

    Look, I’m not going to get into contextual issues with bifurcation without a white board and dry-erase marker :-)

    Bifurcation is splitting, fundamentally.  The term is used loosely through multiple versions of Exchange.  What they call bifurcation in the context of the transport/categorization engine in the latest release, was traditionally called single-instance message delivery in older versions of Exchange.

    I leave this reference for you (had to go searching since you brought up bifurcation).  We’ll have to meet, bifurcation has a myriad of points in Exchange, dating back to 5.0/5.5 IMC/IMS – TechEd perhaps?

    XADM: Single Instance Storage Ratio is Low

    http://support.microsoft.com/kb/198673

    Thanks, I’ll patiently await the response from Nick on the precise point of detachment of attachment for SIS/storage benefits :-)

    Joshua Konkle

  5. Nick Rosenfeld says:

    Hey Joshua,

    Single Instancing happens at the store level itself.  This is for message bodies (Exchange 2003 and previous) and for attachments (currently all versions of Exchange).  The distinction comes into play because the data is stored in different tables within the Exchange database.  Each database has one Message table, which holds the bulk of the properties on a message including the body, and one Attachments table that stores all of the attachment related props.  Since the way SIS works has to do with how these tables are structured inside the Exchange database, single instancing can only work with users that are homed on the same store.  

    Hope this helps,

    Nick

  6. Mike Baker says:

    re "looking forward": thank you, thank you, thank you – you’d be surprised how many people got in a serious tiz about losing SIS on interorg migrations and restructuring.

  7. Joshua Konkle says:

    @Nick

    Thanks for your time and clarity.

    Interesting that whole message SIS is still a delivery enhancement.  As a point of reference, I worked for KVS as a Technical Evangelist, prior to that I worked with NT/NTDS and Exchange DS/IS from 1995-? in architect and product management roles, enough posturing for now.  With that in mind, your points about attachment separation leave me in a lurch.

    The lurch I’m left in revolves around the future of attachment separation and APIs.  Will there be an API to access those attachments?  For example, using the API I would like to move the attachment to SharePoint.  In the objects place I would like to leave a SharePoint or /other/ address.  This address would be called by the Exchange MAPI provider in Outlook when the attachment is requested by a user/application.

    One could retain the message objects, but offload the storage expensive attachments to SharePoint or a base file system, where they likely belong.  There would be some minor changes to the categorizer/transport sinks to accommodate sending attachments out of an organization.  However, in my example above, attachment offloading to SharePoint or /OTHER/ would occur over time.  Therefore, the most recent email wouldn’t be subjected to the process-intensive recall of offline attachments during outbound email delivery, etc.

    I’m not looking for "use email archiving for mailbox management (with stubs)" as an answer. because. its. not.  (there is a better option for STUBS, especially if the Outlook/Exchange MAPI transport/storage provider had a plug-in option, i.e. indexing)

    So, what are the prospects for an API to start working with the separated attachments?  On a side note, is there PerfMon object/counter for Message/Attachment SIS?

    Joshua

  8. Adam says:

    I’m having major difficulty understanding the Disk Space vs. Disk IO logic…

    If there is a single copy of an attachment in a store (SIS) and ten users all open the same attachment wouldn’t Exchange just grab the attachment one time (1 instance of disk IO) and once it is in memory hand it out to the 10 users?

    SO, Going away from SIS so each user has an unassociated attachment or message in their mailbox same action happens;

    10 users open a message with an attachment without SIS wouldn’t Exchange need to grab the attachment 10 times from ten different places on disk and create 10 times the disk IO?

  9. Nick Rosenfeld says:

    Joshua –

    Attachment related properties have always been stored in the attachments table.  There is no added separation that was added in recent versions.  I can’t really comment on what the future database changes or APIs may bring.  As far as perfmon counters go, I am pretty sure that the single instance storage ratio is the only SIS related counter.

    Adam –

    When data is read from the database it goes into the JET cache.  If we need to re-read that data we’ll check the cache first in order to avoid going to disk if the data is cached.  With Exchange 2003 and below we were limited to having a JET cache of around 900 MB.  Because of this we were limited by how much data could be cached and how long it would remain there, meaning that even with SIS there is no guarantee that we only need to read the information from disk once when several users within a short time of each other attempt to open that attachment.  Even with the fact that some of the changes that were made in Exchange 2007 caused message bodies to no longer be single instanced, Exchange 2007 does provide a very significant reduction in I/Os.  This is due to the advantages we get from 64-bit as well as the other changes described in Chris Mitchell’s previous post: http://msexchangeteam.com/archive/2006/09/08/428860.aspx

    -Nick

  10. Lee says:

    I’m curious what you’re seeing that makes you believe SIS is going to be less of an issue in the future?

  11. Gary says:

    In the post above to Adam, Nick you mention that the improvements come from 64bit.  I’m still not sure why the move away from SIS in the message body and for future. It’s implied that it’s done for performance, but as you mention above the performance primarily comes from 64bit and other areas.

    I also don’t understand the direction from MS to create so many Storage Groups and keep a single database within them.  That TOTALLY removes SIS and increases the number of tracking log files and disk space on Exchange servers.  And in my experience, reduces performance.  Multiple databases I can see (in large organizations), though I tend to use them to keep big mailboxes (exceptions) from the normal users.

    Can you elaborate more on why the move away from SIS?

    Thanks!!

  12. Mike Crowley says:

    Nick, it says right in the article why SIS is being deemphasized.  It’s an overhead on performance, and the storage benefit isn’t worth the trade anymore.

  13. Mike Crowley says:

    Sorry Nick – I meant to address that last post to Gary

  14. Joshua Konkle says:

    @nick

    So for clarity sake, store.exe is /only/ single-instancing the attachments across like messages, not across the store in general?

    I believe we are winnowing this down to your original point, which was SIS for MSExchangeIS (store.exe) is the *same*, just limited to attachments.  If so then the original SIS PerfMon counter is showing administrators /attachment SIS across like messages/ in MSExch2007.

    Thanks

    JK

  15. smartshine says:

    How can I replay Exchange 2007 logs?

  16. Nick Rosenfeld says:

    Gary –

    64-bit is just one of the ways that we were able to achieve such a significant reduction in IOs.  There were several other changes that contributed to this.  As far as your Storage Group question, check out Chris Mitchell’s post that I link to above as he explains the reasoning behind the increase in the number of SGs.

    -Nick

  17. Dan Mathers says:

    I think this is ridiculous.  Why would an improvement in 64 bit speed warrant a move away from Single Instance Storage?!? Making multiple copies of everything is better?  Have you seen my storage subsystem and tape backup bills lately? How about that continuous replication traffic.  For smaller shops that only have a single mailbox store, or very little division (there are a lot of us out there!) there’s a real benefit to SIS.  Especially my company which CC’s everyone on the planet with messages.  I mean, why not have us all using 1001 PST files again and use my trusty file system open file agent to backup files?  It’s one more reason not to bother with consolidating mail in Exchange. At least demonstrate how NOT using SIS is supposed to be an advantage.  This article comes off as "Since we can do things faster now, we’re not bothering with keeping this efficiency we had before because we can lumber around it".  Just like giving end users a huge hard drive with tons of free space is supposed to motivate them to organize and remove old data they no longer need.

  18. Peter Szabolcs says:

    I remember hearing that single-instancing doesn’t work for messages coming from the public Internet, only in the case of messages originating from a local Outlook/Exchange user. Is it true?

  19. Andrew says:

    I agree with Dan. We have plenty of hard drive space on our server.

    But our tape space and backup window are at a premium. At the moment we are consolidating our storage groups to increase the SIS ratio and reduce our backup times.

    I think they need some System Administrators on the Exchange Team, not just programmers.

  20. Brad says:

    In complete agreement with Dan and Andrew, just because we have available space is no reason to forego efficiency.  If you ever have to recover a database from backup or dirty shutdown you’ll quickly appreciate the optimizations.

    This is no different than the woefully inefficient code that’s often written today because of the spare CPU and memory we have.  Try running bloatware apps written in VB vs. those by Sysinternals written in assembly and you’ll see what I mean.

  21. Tim says:

    Does ESM (Exchange 2003) mailbox size consider SIS? To elaborate, If I have two users who get the same e-mail with a 10MB attachment, will their mailbox reflect a 10MB size increase despite the fact that they only contain pointers?

    I want to calculate a worst case scenario to size a new server and the actual database size does not reflect SIS.
