Storage from the other end of the telescope

If you are involved with managing your company’s storage infrastructure, you might be tired of hearing about how your company can use IaaS to improve software development. It might sound promising, but it won’t help you, as a storage person, solve your worst storage problems, such as backup and data growth.

It’s probably not clear how enterprise cloud storage, like Windows Azure Storage, with its longer-than-local latencies and less-than-local bandwidth can be used to manage storage. After all, storage management typically involves transferring a lot of data in as short a period of time as possible.  It’s clear that if enterprise cloud storage is going to help solve your data center storage problems, a number of things in the equation need to change. But what would those things be?

For starters, there has to be a way to lighten the workload of daily data protection so you are uploading less data. Another necessity is to make cloud storage available to systems and applications in a way that aligns better with its performance characteristics. This means finding ways to integrate enterprise cloud storage as something other than a long-distance storage container on the other side of a “cloud chasm,” which is how cloud gateway products treat it. Two ideas for reducing the volume of daily data uploads are to work only with changed data (also called deltas) and to use data reduction technologies like deduplication and compression. Limiting uploads to deltas can work with backup, but is problematic on the restore side if you have to download hundreds or even thousands of virtual backup tapes to achieve a full restore. Restores are always much more difficult than backups because of the many-to-one relationship of the media involved: many tapes are read, and far more data is processed than necessary, to create a single restored image. Data reduction can certainly help, but these techniques are only effective up to the point where the time needed to upload the reduced data exceeds the backup window. So lightening the workload can generate incremental benefits, but it does not solve the problem by itself.
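The limit on data reduction can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, the decimal unit conversion, and the idea of a single sustained uplink rate are assumptions made for the example, not figures from any product.

```python
def upload_hours(changed_gb: float, reduction_ratio: float, uplink_mbps: float) -> float:
    """Hours needed to upload one day's changed data after data reduction.

    changed_gb      -- size of the daily delta before reduction (assumed input)
    reduction_ratio -- combined dedupe/compression ratio, e.g. 5.0 for 5:1
    uplink_mbps     -- sustained upload bandwidth in megabits per second
    """
    reduced_gb = changed_gb / reduction_ratio
    megabits = reduced_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / uplink_mbps
    return seconds / 3600


def fits_backup_window(changed_gb: float, reduction_ratio: float,
                       uplink_mbps: float, window_hours: float) -> bool:
    """True if the reduced delta can be uploaded within the backup window."""
    return upload_hours(changed_gb, reduction_ratio, uplink_mbps) <= window_hours
```

With a hypothetical 5:1 reduction ratio, a 100 Mbps uplink, and an 8-hour window, a 500 GB daily delta fits comfortably, while a 2,000 GB delta does not. The reduction ratio only moves the point at which the window is exceeded; it does not remove it.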

Sometimes it helps to look at things from the other end of the telescope. So instead of thinking about longer latencies, think about how SSDs are being used in the hybrid storage model (not to be confused with hybrid cloud) where the most active data is stored on SSDs and the rest of the data is stored on rotating disks. Now add enterprise cloud storage to the mix and consider using it for the opposite end of the activity spectrum – storing dormant, unstructured data. Most companies have a large amount of this stuff, filling up their storage arrays, getting backed up unnecessarily and lengthening recovery times during restores. What would happen if this dormant data were no longer on-premises and didn’t need to be backed up any longer? Offloading dormant data to enterprise cloud storage lightens the backup load and helps you deal with data growth. It’s not enough by itself, but it’s a big step in the right direction.  
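To see how much difference this could make, here is a rough sketch that splits a file inventory into active and dormant data by time since last access and compares the resulting backup loads. The `FileInfo` record, the 180-day threshold, and the function names are all hypothetical choices for illustration; a real system would decide what counts as dormant in its own way.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_gb: float
    days_since_access: int


def split_by_activity(files, dormant_after_days=180):
    """Partition files into (active, dormant) using an assumed age threshold."""
    active = [f for f in files if f.days_since_access < dormant_after_days]
    dormant = [f for f in files if f.days_since_access >= dormant_after_days]
    return active, dormant


def total_gb(files):
    """Total size of a group of files, i.e. the load it adds to a full backup."""
    return sum(f.size_gb for f in files)
```

If the dormant group is moved to cloud storage, only the total for the active group still has to be backed up and restored on-premises.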

Another assumption that needs to be challenged is that backup is the only technology that can protect data from a disaster. It’s the best choice we’ve had, but that doesn’t mean something new couldn’t be better. For instance, an alternative to backup is snapshot technology, which is widely used to periodically capture deltas and is much faster and easier to use for restoring data. The fatal shortcoming of snapshots has always been that they reside on the array alongside live data, so if the array fails or is destroyed the snapshots are lost too. For that reason, on-premises snapshots are inadequate for disaster protection.

But what if on-premises storage could take daily snapshots and upload them to enterprise cloud storage, and what if those cloud snapshots could be mounted the same as on-array snapshots for restoring data? This certainly satisfies the off-site requirements for disaster recovery protection and is a scenario where uploading deltas every day can be very successful. All that’s needed is a way to know which files would need to be downloaded for a full restore.
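One way to picture that last requirement is a small catalog that records, for each daily snapshot, which items changed and where each changed copy was stored in the cloud. The sketch below is a simplified assumption made for illustration, not a description of how any particular product tracks its data; the item identifiers and object names are invented, and deletions are not handled.

```python
def objects_for_full_restore(snapshots, restore_index=None):
    """Return the cloud objects needed to rebuild the data at a restore point.

    snapshots     -- list ordered oldest to newest; each entry is a dict that
                     maps an item id to the cloud object holding the version
                     of that item captured by that snapshot (its delta)
    restore_index -- position of the snapshot to restore; defaults to newest
    """
    if restore_index is None:
        restore_index = len(snapshots) - 1

    needed = {}
    # Walk from the restore point back in time and keep only the first
    # (that is, the most recent) version seen for each item.
    for delta in reversed(snapshots[: restore_index + 1]):
        for item_id, object_name in delta.items():
            needed.setdefault(item_id, object_name)
    return needed


# Hypothetical three-day history: day 1 is a full copy, later days are deltas.
history = [
    {"a": "day1/a", "b": "day1/b", "c": "day1/c"},
    {"b": "day2/b"},
    {"a": "day3/a", "d": "day3/d"},
]
latest = objects_for_full_restore(history)
```

Each item is downloaded once, in its most recent version as of the restore point, instead of replaying every delta in full the way a chain of backup tapes would require.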

This is what Hybrid Cloud Storage from Microsoft is all about. It pairs the Cloud-integrated Storage technology that was acquired with StorSimple with Windows Azure Storage. It puts enterprise cloud storage technology in your data center, where it filters dormant data and uploads it to the cloud, and creates daily snapshots that are also uploaded to the cloud. That’s a whole different approach to managing backup and data growth. The cloud is not a disk drive “over there” somewhere; it is right next to you, helping to solve your most vexing storage problems.

You might be thinking “how do I locate data after it has been uploaded to the cloud and how do I mount and restore it?”  The answer is metadata, a topic that will be discussed in my next blog post.