OK, so I don’t as a rule use my blog to comment on specific posts from others, but a recent blog post from Hu Yoshida on storage virtualization (What is Required to Network Storage to Storage) both intrigued and puzzled me.
First off, I completely agree with his premise and conclusion. What puzzles me is this: given that conclusion, why is HDS building TagmaStore?
Virtualization, in my opinion (and I think Hu’s, from what I read) needs to enable heterogeneity, scale, utilization, simplicity and dynamic operations – and do so in a way that provides transparency to the application and information systems functions. Basic volume managers can enable specific applications but, as Hu notes, they tend to actually increase complexity and decrease the ability to optimize a storage network across hundreds of servers and applications.
What is so confounding is that, given this premise, the only logical virtualization choice would be to use a product like EMC’s Invista. OK, maybe I jumped too fast – let me explain.
THE key tenet of integrating virtualization is “operational transparency,” meaning that the virtualization function runs within existing environments without changing the transactional interaction between applications and storage. Let’s take a simple example with VMware. The key to the base architecture is the ability to run an x86 application just EXACTLY as if it were running directly on the hardware.
For network storage, the only hypervisor-like architecture that can provide this functionality is an “out of band” (aka “split path”) architecture. The principle here is that all of the benefit of network storage virtualization can be obtained using dynamic data routing only (not storage) with out-of-band management. No data is stored, so there is no volatile “state” in the system. The application interacts with the storage in exactly the same way, and when a “complete” response comes back from the storage, the application gets the exact same level of protection it would get without virtualization.
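To make the split-path idea concrete, here is a minimal sketch in Python. All of the class and method names are hypothetical, invented for illustration: the virtualization layer holds only a routing table mapping virtual extents to physical ones, and every read or write is passed straight through to the backing array, so the “complete” acknowledgment the application sees is the array’s own.

```python
# Hypothetical sketch of split-path (out-of-band) virtualization.
# The virtualizer holds routing state only -- it never stores user data.

class BackingArray:
    """Stand-in for a physical array that actually stores the data."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data
        return "complete"          # the acknowledgment comes from real storage

    def read(self, lba):
        return self.blocks[lba]


class SplitPathVirtualizer:
    """Routes I/O to backing arrays; holds a mapping table, no data."""
    def __init__(self):
        self.routing = {}          # (virtual vol, lba) -> (array, physical lba)

    def map_extent(self, vol, lba, array, phys_lba):
        self.routing[(vol, lba)] = (array, phys_lba)

    def write(self, vol, lba, data):
        array, phys = self.routing[(vol, lba)]
        return array.write(phys, data)   # pass-through: no copy kept here

    def read(self, vol, lba):
        array, phys = self.routing[(vol, lba)]
        return array.read(phys)


# Usage: the application addresses one virtual volume; the data and the
# completion status both live on, and come from, the backing array.
a1 = BackingArray("array-1")
v = SplitPathVirtualizer()
v.map_extent("vol0", 0, a1, 100)
status = v.write("vol0", 0, b"payload")
print(status, v.read("vol0", 0))
```

The point of the sketch is what the virtualizer does *not* have: there is no data buffer anywhere in `SplitPathVirtualizer`, only the routing table.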
This principle breaks down with all so-called “in band/path” storage virtualization systems. By essentially putting another array in front of other storage arrays, the virtualization element must, to get any decent performance, terminate the I/O - and store the data!
Wait a moment – we have another name for things that store stuff…
The problems created here by this simple difference add up quickly.
First, you now have 2 systems instead of one “responsible” for storing data. If you lose either system, you risk losing the data. I don’t care how good the reliability is; with 2 “state” systems in the chain, the risk roughly doubles from whatever it was.
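The arithmetic behind that claim is easy to check. Assuming (for illustration only; these failure rates are made up) that each system fails independently with probability p over some period, a chain that needs both to survive fails with probability 1 - (1 - p)², which for small p is just shy of 2p:

```python
# Illustrative arithmetic for chained "state" systems (assumed failure rates).

def chain_failure_prob(p_single, n_systems):
    """Probability that at least one of n independent systems fails."""
    return 1 - (1 - p_single) ** n_systems

p = 0.001                        # assume each system fails 0.1% of the time
one = chain_failure_prob(p, 1)   # ~0.001
two = chain_failure_prob(p, 2)   # ~0.001999
print(two / one)                 # ≈ 1.999: the risk essentially doubles
```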
Second, since all in-band systems are essentially arrays, they will never be able to scale beyond a certain point. With split-path systems, the routing tables are loaded into any number of switches and, given the low latency, low overhead, and scale-out architecture, you can scale to far larger systems without ever needing to purchase additional memory or buy big new systems.
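A toy sketch of that scale-out property (again with invented names, and a deliberately naive hash placement scheme): the routing table is sharded across the fabric switches, so each switch holds only its slice, and capacity grows by adding switches rather than by growing one controller.

```python
# Hypothetical sketch: routing tables sharded across fabric switches.
# Capacity scales by adding switches; there is no central data store.

class Switch:
    def __init__(self, name):
        self.name = name
        self.routes = {}           # (vol, lba) -> physical location

fabric = [Switch(f"sw{i}") for i in range(4)]

def owner(vol):
    """Pick the switch responsible for a volume (simple hash placement)."""
    return fabric[hash(vol) % len(fabric)]

def install_route(vol, lba, location):
    owner(vol).routes[(vol, lba)] = location

def lookup(vol, lba):
    return owner(vol).routes[(vol, lba)]

install_route("vol7", 0, ("array-2", 512))
print(lookup("vol7", 0))   # resolved by one switch's table slice
```

Real fabrics would use stable placement rather than a bare hash, but the shape of the argument is the same: the state being distributed is routing metadata, not data.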
The one perceived advantage, maybe, of in-band systems is that they are simple and they “hide” the other array interfaces. To that I would say: if you don’t like your current array interface, there are much easier ways to solve that than putting in another layer of complexity. To me, TagmaStore is not much different from the volume managers; it effectively hides the arrays behind it rather than actually virtualizing network storage in a way that provides transparency and the real values we want from virtualization.
Seems to me that in-band virtualization systems are just these same arrays (volume managers) in disguise. How does the saying go? “You can dress up an array with a network-storage-virtualization moniker, but it is still an array.”