In the first two posts, I laid out the economic foundation: how the underlying technologies have evolved and how the fundamental cost elements of data protection have changed. The most disruptive impact of moving to new replication-focused, hyper-scale storage, however, will come from applying additional new technologies that were not feasible within tightly-coupled RAID models. This is where the dramatic cost advantages will be achieved. These new savings will be derived from three main elements:

  1. Scale Out Efficiencies
  2. Software Economics
  3. Business Agility

The efficiencies of hyper-scale systems will be borne out in two ways. First, the overall space efficiency (the amount of reserve capacity needed) can be significantly improved. With many individual RAID systems, it is not unusual to have 30-40% of capacity sitting unused, because each system must hold its own reserve and because individual systems often become performance-limited while still having significant capacity available. Properly designed hyper-scale systems offer far more flexibility across the storage pool and significantly reduce these inefficiencies.
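
As a rough illustration of the arithmetic, here is a short Python sketch comparing the capacity stranded across many siloed arrays with the reserve needed in a single pooled system. The array count, sizes, and the pooled-reserve figure are illustrative assumptions, not measurements from any particular product; only the 30-40% per-silo headroom comes from the discussion above.

```python
# Back-of-the-envelope comparison: reserve held on every siloed RAID
# array versus one shared reserve in a pooled hyper-scale system.
# All numbers below are illustrative assumptions.

ARRAYS = 20                 # hypothetical number of independent RAID systems
ARRAY_CAPACITY_TB = 100     # hypothetical usable capacity per array
PER_ARRAY_RESERVE = 0.35    # ~30-40% headroom kept free on each silo
POOLED_RESERVE = 0.15       # assumed shared headroom in a pooled design

total_tb = ARRAYS * ARRAY_CAPACITY_TB
siloed_unused = total_tb * PER_ARRAY_RESERVE
pooled_unused = total_tb * POOLED_RESERVE

print(f"Total capacity:         {total_tb:,.0f} TB")
print(f"Unused, siloed model:   {siloed_unused:,.0f} TB")
print(f"Unused, pooled model:   {pooled_unused:,.0f} TB")
print(f"Capacity reclaimed:     {siloed_unused - pooled_unused:,.0f} TB")
```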

The second key element of scale out efficiencies is lower administrative cost. Minimizing the number of "boxes," or systems, frees up system administrators and lets each of them manage far more data.

One of the least understood advantages of good hyper-scale systems is the fundamental change in how high availability can be achieved. Here is a simple example. Say you have 4 large arrays and want five 9's (99.999%) of availability. This implies that the software in each array must be "bulletproof": each array would need software reliability of better than 99.999%. That is extremely hard to achieve, consumes enormous engineering and test resources, and significantly reduces the ability to rapidly add new capabilities. Sequential failures (such as a hardware failure that then crashes the software) are the hardest to accommodate and eliminate.

On the other hand, if you build the same system with 40 small scale out nodes, each running independent (loosely-coupled) software, in a design that can tolerate up to 2 nodes being down at a time, then the per-node software reliability can be much lower (roughly 99.9%) while still achieving the five 9's availability goal. It may seem paradoxical that a collection of lower-availability elements can be combined into a high-availability solution, but that is essentially how web scale clouds are built. This model significantly reduces development costs and allows new features to be added rapidly.
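
To make the math behind the two designs concrete, here is a minimal Python sketch. It assumes node and array failures are statistically independent, which is an idealization; correlated failures in the real world pull these numbers down.

```python
# Availability of a system of N nodes that stays up as long as no more
# than `tolerated_failures` nodes are down at once (binomial model,
# independent failures assumed).
from math import comb

def system_availability(nodes: int, node_availability: float,
                        tolerated_failures: int) -> float:
    """Probability that at most `tolerated_failures` nodes are down."""
    p_down = 1.0 - node_availability
    return sum(
        comb(nodes, k) * (p_down ** k) * (node_availability ** (nodes - k))
        for k in range(tolerated_failures + 1)
    )

# Monolithic case: 4 arrays, data is unavailable if *any* array is down,
# so each array's software must itself be close to bulletproof.
print(system_availability(nodes=4, node_availability=0.99999,
                          tolerated_failures=0))
# ~0.99996 -- even 99.999% per array falls just short of five 9's overall

# Scale-out case: 40 nodes at only ~99.9% each, tolerating 2 failures.
print(system_availability(nodes=40, node_availability=0.999,
                          tolerated_failures=2))
# ~0.99999 -- five 9's built from much less reliable parts
```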

Third, with multiple arrays, constant effort is needed to manage growth. Data must continually be migrated to relieve space pressure, and provisioning a new volume can take days to weeks because system admins must free up performance and capacity without impacting existing production systems. Hyper-scale systems that offer QoS (quality of service) controls can not only be provisioned instantly, they are robust enough to allow self-service provisioning by users. The best-known example of this capability is Amazon S3, and it addresses what is widely considered the most significant pain point of enterprise storage today.
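
As a concrete point of contrast, provisioning object storage on Amazon S3 is a single self-service API call rather than a ticket to a storage team. The sketch below uses the standard boto3 client; the bucket name and region are placeholder values, and the example only illustrates the self-service model rather than any specific vendor's QoS implementation.

```python
# Minimal sketch of self-service provisioning against Amazon S3 with boto3.
# Bucket name and region are placeholders; credentials are assumed to be
# configured in the environment.
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# One API call allocates a new, immediately usable storage container;
# capacity and performance management stay on the provider's side.
s3.create_bucket(
    Bucket="example-self-service-bucket",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
```

Compare that single call with the days-to-weeks turnaround described above for carving a new volume out of traditional arrays.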

While the "RAID is dead" statement may seem dramatic, this technology inflection point is much like the "Tape is dead" disruption that was enabled by disk-based data deduplication 10 years ago. Enterprises need only look at the technologies used by Google, Amazon and virtually every major technology startup to see the writing on the wall.

 
