Hybrid cloud storage with an object storage back end

I attended the Object Storage Summit last week in San Francisco. It was an excellent event with lots of good discussion about object storage: what it is, how people are using it, how it is being sold, and whether it is a feature or a market. In short, all the necessary navel-gazing that goes on when a technology is trying to break through to a mass market. One of the things that became clear to me was the importance of articulating how object storage is being used and why it is effective. Hence, this post.

Hybrid cloud storage uses object storage in the cloud. Data on-premises is uploaded to cloud object storage, where it stays until it is accessed again. Many people have first-hand experience using cloud object storage through file sharing apps and sites like Dropbox, SkyDrive and YouSendIt, but cloud object storage is also used very effectively with enterprise-level hybrid cloud storage (HCS) like the Microsoft HCS solution. In this case, a StorSimple CiS system, which is iSCSI-based, integrates with Windows Azure Storage, which is object-based.

Obviously, connecting a block device to an object store requires a data translation process that turns block data into objects. This happens during data deduplication, when incoming data exits the input queue in a CiS system. The deduped block objects in a CiS system are called fingerprints and have object properties such as being content-addressable and immutable. From that point on, block data is managed as objects, whether it is on-premises or in the cloud. In other words, the Microsoft HCS solution is a hybrid object store for block data. There are a lot of benefits to working this way, including:

  • Automated off-site data protection and access. Fingerprints are protected and tracked off-site automatically. Fingerprints support backup, DR, archiving and capacity expansion.
  • Automatic replication. This is a service provided by Windows Azure Storage to protect fingerprints against cloud disasters.
  • Seamless scaling. Fingerprints are migrated to Windows Azure Storage, one at a time, using the same name as on-premises. Nothing gets lost in translation.
  • Metadata-based storage management.  Multiple storage functions are linked to single, replicated fingerprints. Backup, DR, archiving and capacity scaling all use the same fingerprint objects. Metadata is changed instead of copying data again and again for different purposes. For instance, migrating data to the cloud is usually a metadata change that doesn't actually move data.
  • Data portability. Metadata plus fingerprints form a portable, deduped block data system that can be rehydrated anywhere there is a CiS system. DR is location-independent.
  • Deterministic, thin recoveries. DR is application-driven and only downloads the data that applications need, as they open files. There is no waste of resources in the network or at the recovery site dealing with the large amounts of data that aren't needed to recover.
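
To make the fingerprint idea a little more concrete, here is a minimal Python sketch of how a content-addressable, deduped block store could behave. It is an illustration under my own assumptions, not the actual CiS implementation: the HybridStore class, the recover helper, the 4 KiB block size, the use of SHA-256 for naming, and the plain dictionaries standing in for on-premises capacity and Windows Azure Storage are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the post does not specify one


def fingerprint(block: bytes) -> str:
    """Name a block by its content (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(block).hexdigest()


class HybridStore:
    """Hypothetical stand-in for a hybrid object store for block data."""

    def __init__(self):
        self.local = {}    # fingerprint name -> bytes held on-premises
        self.cloud = {}    # fingerprint name -> bytes held in cloud object storage
        self.volumes = {}  # volume name -> ordered fingerprint names (the metadata)

    def write(self, volume: str, data: bytes) -> None:
        """Split data into blocks and store each unique block exactly once."""
        names = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            name = fingerprint(block)
            if name not in self.local and name not in self.cloud:
                self.local[name] = block  # new content; duplicates are skipped
            names.append(name)
        self.volumes[volume] = names

    def protect(self) -> None:
        """Copy local fingerprints to the cloud under the same names."""
        for name, block in self.local.items():
            self.cloud.setdefault(name, block)

    def tier_out(self, name: str) -> None:
        """Free local capacity; for a protected fingerprint no data moves."""
        if name in self.cloud:
            self.local.pop(name, None)

    def read(self, volume: str) -> bytes:
        """Rebuild a volume, downloading only the fingerprints it needs."""
        blocks = []
        for name in self.volumes[volume]:
            if name not in self.local:
                self.local[name] = self.cloud[name]  # on-demand download
            blocks.append(self.local[name])
        return b"".join(blocks)


def recover(metadata: dict, cloud: dict) -> HybridStore:
    """Start a recovery site from metadata alone; data arrives when read."""
    site = HybridStore()
    site.volumes = {vol: list(names) for vol, names in metadata.items()}
    site.cloud = cloud
    return site


# Usage: two volumes that share content, protected and then recovered elsewhere.
primary = HybridStore()
primary.write("vol1", b"A" * 8192 + b"B" * 4096)
primary.write("vol2", b"B" * 4096)
primary.protect()
dr_site = recover(primary.volumes, primary.cloud)
vol2_data = dr_site.read("vol2")  # pulls down only the fingerprint vol2 references
```

The point of the sketch is that backup, tiering and DR all operate on the same named objects. Once a fingerprint has been protected, moving it "to the cloud" is just a matter of dropping the local copy, and a recovery site only needs the metadata to begin serving reads.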

When you look at solving the most vexing problems in your storage environment, especially data growth, backup data protection and the inability to test DR plans, the benefits of a hybrid object storage architecture are indeed powerful.