The rise of cloud-integrated storage and EMC's ViPR

David Isenberg wrote his famous and controversial paper, The Rise of the Stupid Network, in 1997. It's a short and historically interesting read. If you have never read it, go read it now. It will take you less than 10 minutes. If you want the CliffsNotes version, the gist of his paper is copied below:

JUST DELIVER THE BITS, STUPID

A new network "philosophy and architecture," is replacing the vision of an Intelligent Network. The vision is one in which the public communications network would be engineered for "always-on" use, not intermittence and scarcity. It would be engineered for intelligence at the end-user's device, not in the network. And the network would be engineered simply to "Deliver the Bits, Stupid," not for fancy network routing or "smart" number translation.

Fundamentally, it would be a Stupid Network.

I've thought about corollaries in storage for many years. Networks and storage are very different. Storage is tightly coupled with data management in a way that networks never will be. Data management takes intelligence to make sure everything gets put in its optimal place, where it can be accessed again in compliance with corporate governance, legal requirements and workers' expectations. Networks don't really have these sorts of long-term consequences, so apples-to-apples comparisons aren't very useful.

But that doesn't mean there wouldn't be ways to eliminate unnecessary aspects of storage and lower costs enormously. As soon as data protection and management could be done without specialized storage equipment, that equipment would be eliminated. Cloud storage changes things radically for the storage industry, especially inventions like StorSimple's cloud-integrated storage (CiS) and a solution like Microsoft's hybrid cloud storage. But StorSimple was a startup and Microsoft isn't a storage company, so it wasn't going to become obvious that sweeping changes were afoot until a major storage vendor came along to make it happen.

That's where EMC's ViPR software comes in. EMC refers to it as software-defined storage, which was predictable, but necessary for them. FWIW, Greg Schulz does a great job going through what was announced on his StorageIO blog.

One of the things ViPR does, as Greg's blog describes, is provide an out-of-band virtualization layer. That opens the door to using less-expensive, stupid storage and protecting the data on it with some other global, intelligent system. This sort of design has never been very successful, and it will be interesting to see if EMC can make it work this time.

The aspects of ViPR that are most interesting are its cloud elements - those that are expected initially and those that have been strongly hinted at, including:

  • It runs as a VSA (virtual storage appliance), which means it is a storage controller that runs as a virtual machine, including as a virtual machine in the cloud.
  • It will include access to object storage as a back end, which is how "real" storage works in the cloud, unlike AWS' EBS.
  • It can use cloud APIs, which is obviously a cloud thing.

If EMC wants their technology to run on the cloud, and it's clear they do, they need all three of these things. For instance, consider remote replication to the cloud - how would the data replicated to the cloud be stored in the cloud? To a piece of hardware? No. Using storage network/device commands? No. To what target? The back end of a hypothetical EMC VSA in the cloud uses object storage services and cloud APIs. There is no other practical way to do it. They could have a VSA that uses iSCSI to a facility like EBS, but that would be like putting the contents of a container ship on rowboats. So, a VSA that accesses object storage services using cloud APIs is the only realistic way. It is a clear signal that ViPR will be their version of CiS. They probably won't call it that, but that's beside the point.
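
To make the idea of a storage controller with an object-storage back end a little more concrete, here is a minimal Python sketch. It is not ViPR's design or any vendor's API; the class names, the 4 KB block size, the key layout and the in-memory stand-in for an object service are all assumptions made for illustration.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4096  # bytes per logical block (an assumed value)


class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud object service: put/get by key."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


class ObjectBackedVolume:
    """Presents block-style reads and writes, stores each block as an object."""

    def __init__(self, name: str, store: InMemoryObjectStore) -> None:
        self.name = name
        self.store = store

    def _key(self, lba: int) -> str:
        # One object per logical block address; the naming scheme is made up.
        return f"{self.name}/block-{lba:012d}"

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.store.put(self._key(lba), data)

    def read_block(self, lba: int) -> bytes:
        data = self.store.get(self._key(lba))
        # Blocks that were never written read back as zeros.
        return data if data is not None else bytes(BLOCK_SIZE)
```

The point of the sketch is only that the controller speaks put/get to a service rather than device commands to hardware. A real system would batch many blocks per object rather than pay for one request per block.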

The important question is what happens to data protection after ViPR is made fully cloud-capable. Once you start using cloud services for data protection, there are a few things that immediately become obvious:

  • You don't need separate data protection equipment any more because you are using a cloud service
  • You can actually use incremental-forever data protection schemes
  • You want to use primary dedupe and compression to reduce the amount of cloud traffic required
  • You maintain a hybrid cloud metadata system that identifies all data whether it's on premises or in the cloud
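
As a rough illustration of how incremental-forever protection, dedupe, compression and a metadata catalog fit together, here is a small Python sketch. It is a toy under stated assumptions, not how Microsoft's or EMC's products work: the HybridCatalog class, the fixed 4-byte chunk size and the in-memory dictionary standing in for a cloud service are all hypothetical, and the catalog tracks only cloud-resident chunks rather than on-premises locations.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the demo readable


class HybridCatalog:
    """Toy metadata catalog plus a dictionary standing in for cloud storage."""

    def __init__(self) -> None:
        self.cloud: Dict[str, bytes] = {}           # fingerprint -> compressed chunk
        self.snapshots: Dict[str, List[str]] = {}   # snapshot name -> ordered fingerprints

    def snapshot(self, name: str, data: bytes) -> int:
        """Record a snapshot and return how many new chunks were uploaded."""
        recipe: List[str] = []
        uploaded = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.cloud:
                # Only chunks the cloud has never seen are compressed and sent.
                self.cloud[fingerprint] = zlib.compress(chunk)
                uploaded += 1
            recipe.append(fingerprint)
        self.snapshots[name] = recipe
        return uploaded

    def restore(self, name: str) -> bytes:
        """Rebuild a snapshot from its recipe of chunk fingerprints."""
        return b"".join(
            zlib.decompress(self.cloud[fp]) for fp in self.snapshots[name]
        )
```

In this sketch, a first snapshot of b"AAAABBBBCCCC" uploads three chunks, and a later snapshot of b"AAAABBBBDDDD" uploads only one, yet either snapshot can be restored in full. That is the incremental-forever idea in miniature: every snapshot is complete in the metadata, while only changed data crosses the wire.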

Those are all things that hybrid cloud storage from Microsoft does today, by the way, but that's beside the point too. What's interesting is what will happen to EMC's sizeable data protection business - how will it be converted to cloud solutions, and what value can they add that enhances cloud storage services? The technologies they have available for hybrid cloud data protection are already mostly in place, and there will undoubtedly be a transformation for Data Domain products in the years to come, but these are the sorts of things they need to figure out over time.

It's going to be a slow transition for the storage industry, but EMC has done what it usually does - it made the first bold move and is laying the groundwork for what's to come.  It will be interesting to watch how the rest of the storage industry responds.