Determining the exact cost of storing information is incredibly complicated. Simply calculating the cost of a disk drive or a storage array is meaningless. There are a ton of variables including how many total copies of the data exist, performance requirements, administration and governance costs, etc.
Given that, I am not going to try to create a single calculation, as everyone would, justifiably, poke holes in it. Instead, I will make a few points about relative costs and their potential impact on total cost, leaving the complicated math (and assumptions) to you. Obviously, key cost elements have changed in the last 30 years. The good news is that emerging technologies have created more economical alternatives.
First, there are many storage capabilities that can be offered regardless of the storage environment. Some examples:
- In-line compression/deduplication
- Thin provisioning
- Space-efficient snapshots/clones
These features offer value but can ordinarily be leveraged within any system type, so it makes sense to exclude them from this discussion. In a RAID system, storage space is saved by using algorithms (such as parity) to reduce the overall storage capacity required while maintaining data resiliency. The negative impacts of this approach are:
- RAID requires a tightly-coupled system with specialized hardened software
- For high availability, RAID requires sophisticated failover capabilities across controllers and memory
- RAID increases computational and local I/O requirements for processing and can have significant rebuild times
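To make the processing and rebuild cost concrete, here is a minimal sketch of single-parity protection using XOR. It assumes a RAID 5 style stripe of three data blocks plus one parity block; the block contents and the `xor_blocks` helper are illustrative, not taken from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks in one stripe (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# Parity must be computed on every write to the stripe.
parity = xor_blocks(data)

# Simulate losing the second block, then rebuild it.
# Note that the rebuild has to read every surviving block plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

# Raw capacity stored per unit of user data: 4 blocks for 3 blocks of data.
overhead = (len(data) + 1) / len(data)
```

The capacity overhead is low, but each write involves a parity calculation and each rebuild touches every surviving block in the stripe, which is where the extra computation, local I/O, and rebuild time come from.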
While simple two- or three-way replication seemed like a crazy waste of money 20 years ago, replication across independent nodes now makes total economic sense:
- For replication, while disk costs might double (assuming 1.5X capacity for RAID and 3X for replication), there is no need for the individual nodes/arrays to have redundant controllers or shared memory. There is also no RAID algorithm to calculate, which reduces processing and IOPS requirements
- No proprietary or customized hardware is needed
- Read operations only have to occur on one node and, in the event of a failure, are simply moved to another node - data never needs to be reconstructed from parity
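The disk arithmetic behind the first point can be sketched as follows. The 1.5X and 3X factors are the ones assumed above; the usable capacity, the price per terabyte, and the function names are hypothetical placeholders to be replaced with your own figures.

```python
RAID_OVERHEAD = 1.5         # raw capacity per unit of usable capacity (assumed above)
REPLICATION_OVERHEAD = 3.0  # three-way replication (assumed above)

def raw_capacity_tb(usable_tb, overhead_factor):
    """Raw disk capacity needed to present a given usable capacity."""
    return usable_tb * overhead_factor

def extra_disk_cost(usable_tb, price_per_tb):
    """Additional disk spend for replication compared with RAID."""
    raid_raw = raw_capacity_tb(usable_tb, RAID_OVERHEAD)
    repl_raw = raw_capacity_tb(usable_tb, REPLICATION_OVERHEAD)
    return (repl_raw - raid_raw) * price_per_tb

usable_tb = 100      # hypothetical usable capacity
price_per_tb = 20.0  # hypothetical placeholder price, not a market figure

raid_raw = raw_capacity_tb(usable_tb, RAID_OVERHEAD)
repl_raw = raw_capacity_tb(usable_tb, REPLICATION_OVERHEAD)
disk_cost_ratio = repl_raw / raid_raw  # 2.0, i.e. disk costs double

# Replication comes out ahead on total cost once the savings on controllers,
# shared memory, and specialized software exceed this amount.
break_even_savings = extra_disk_cost(usable_tb, price_per_tb)
```

The ratio depends only on the two overhead factors, so the doubling holds at any capacity or disk price. Whether replication wins overall depends on the non-disk savings, which this sketch deliberately leaves as an input.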
30 years ago, RAID was a clear winner for storage cost, but the RELATIVE price declines of various technology elements over the last 20-30 years have been dramatically different. The cost of storage (disk drives) has declined faster than Moore's law, while the cost of x86 computing has declined almost as rapidly. On the other hand, the relative costs of customized hardware and specialized software have not experienced this rate of decline. Today the total cost of replication-based systems is likely on par with or lower than that of most RAID systems.
At this point, the cost dynamics are interesting but hardly compelling enough to make a statement like "RAID is dead." But the story doesn't end here. The advent and use of this new model for storage enables a whole new set of technology advances that are truly disruptive.
In Part 3, I will go through the new technologies that will be used in conjunction with this scale-out replication model and why this combination will totally disrupt traditional RAID approaches.