So, I blog a good deal on information technology and technology management but I have yet to give my basic impression of what will happen with the future of core storage technologies themselves.
First, I know in advance that this post is going to create a firestorm of opinions and I know that there will be disagreement. I think it is important to note that EMC, as a customer-centric company, will work now and always to offer the technologies and solutions demanded by the marketplace. Don’t look for us to stop building products just because I have a prediction. The actions we take are based upon demand, not prognostication.
Before I can discuss the storage needs, however, I first need to give you my take on the evolving requirements around data and information. Data is the “customer” of storage, so to understand where storage is going, you start with data.
I believe that the “data” world will continue to divide itself into 2 distinct types – often previously called structured and unstructured data. But it is not quite that simple anymore because organizations must move more and more to add some “structure” into their unstructured data to make it usable. So, in effect, all data and information is going to become more “structured.” These terms are no longer good monikers to define the data types.
Rather, I believe the bifurcation of data will be more and more based on the need for what I call “single transaction latency.” OLTP systems have this requirement today, and transactional performance remains a paramount attribute for the associated storage systems. Single transaction latency is critical because most OLTP systems operate off of a single relational database (for consistency). In this case, total bandwidth and IO capacity are typically secondary elements to latency. You can view these systems like superhighways with a single toll booth: the performance of the toll booth (IO latency) drives system performance.
In contrast, the bulk of the remaining information (estimated at 70%+ today and growing to 95% by 2010) will fall into the “other” category, which I will simply call “Web” data. As I noted, the defining difference in this data is that single transaction latency is not the most critical factor. Take, for example, a search on the Web. Any search you or I do may take half a second. Does it matter that much if it takes 0.45 sec or 0.55 sec? Not really. Since the searches from many people can be run in parallel, the need here is aggregate performance. On the superhighway, you can have slower toll booths, but they are not the bottleneck as long as you have enough of them.
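To put rough numbers on the toll booth picture, here is a minimal sketch. It assumes an idealized model in which each booth handles one request at a time and requests spread evenly across booths. The function names and the sample figures (other than the 0.45 and 0.55 second searches mentioned above) are made up for illustration, not measurements of any real system.

```python
import math


def throughput_per_second(booths: int, latency_ms: int) -> float:
    """Requests completed per second when each booth serves one request at a time."""
    if booths < 1 or latency_ms <= 0:
        raise ValueError("need at least one booth and a positive latency")
    return booths * 1000 / latency_ms


def booths_needed(target_per_second: int, latency_ms: int) -> int:
    """Smallest number of parallel booths that reaches the target rate."""
    if target_per_second <= 0 or latency_ms <= 0:
        raise ValueError("target rate and latency must be positive")
    return math.ceil(target_per_second * latency_ms / 1000)


# "OLTP" case: one serialized booth, so latency alone sets the ceiling.
# Halving latency (assumed 10 ms -> 5 ms) is the only way to double the rate.
oltp_slow = throughput_per_second(1, 10)   # 100.0 transactions/sec
oltp_fast = throughput_per_second(1, 5)    # 200.0 transactions/sec

# "Web" case: a slower booth (0.55 s instead of 0.45 s per search) is
# offset simply by opening more booths for the same aggregate rate.
fast_booths = booths_needed(200, 450)      # 90 booths
slow_booths = booths_needed(200, 550)      # 110 booths

if __name__ == "__main__":
    print(f"OLTP, one booth: {oltp_slow:.0f} -> {oltp_fast:.0f} tx/sec")
    print(f"Web, 200 searches/sec: {fast_booths} vs {slow_booths} booths")
```

In this simplified model the single-booth system can only improve by cutting latency, while the parallel system absorbs a slower booth by adding roughly 20 more of them.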
The change that I believe we will see is that the “unstructured” data needs to become more “structured.” Clearly, using a classic relational database is not the answer. Isolating Web data within a database application would be far too constraining. The “structure” will come from tagging, indexing, metadata, and object structures with defined ontologies.
We recently acquired a company called XHive that builds some great technology to aid in our efforts. Essentially, XHive builds XML database technology. This provides a way to structure data in a more relational way while avoiding the constraints of using a proprietary database structure. Since the data and metadata are maintained in XML, there is no lock-in to any particular application either.
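As a rough illustration of what keeping data and metadata together in XML can look like, here is a small sketch. It uses only Python's standard library XML parser, not XHive's product or API, and the element names (`library`, `doc`, `meta`, `tag`, `body`) and sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: content and its descriptive metadata travel together.
SAMPLE = """
<library>
  <doc id="a1" type="report">
    <meta><tag>storage</tag><tag>oltp</tag></meta>
    <body>Quarterly latency review.</body>
  </doc>
  <doc id="b2" type="page">
    <meta><tag>web</tag></meta>
    <body>Search landing page.</body>
  </doc>
  <doc id="c3" type="report">
    <meta><tag>web</tag><tag>storage</tag></meta>
    <body>Aggregate throughput summary.</body>
  </doc>
</library>
"""


def ids_with_tag(root: ET.Element, tag: str) -> list:
    """Return the ids of documents whose metadata carries the given tag."""
    return [
        doc.get("id")
        for doc in root.iter("doc")
        if any(t.text == tag for t in doc.findall("./meta/tag"))
    ]


root = ET.fromstring(SAMPLE.strip())

# Query by an attribute, much as one would filter on a column.
reports = [d.get("id") for d in root.findall("./doc[@type='report']")]

# Query by tags stored in the metadata.
storage_docs = ids_with_tag(root, "storage")
web_docs = ids_with_tag(root, "web")

if __name__ == "__main__":
    print(reports, storage_docs, web_docs)
```

Because the structure lives in the documents themselves, any XML-aware tool can run the same kind of query, which is the sense in which there is no lock-in to a particular application.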
Within these data types, there are an infinite number of other performance, reliability, and information requirements that will continue to drive tiered storage and Information Lifecycle Management needs. So why did I zero in on only one attribute to define the data types? The reason is simple: for OLTP applications, the transaction latency need drives the optimized storage architecture. For “Web” data, architecture is driven by more aggregate system requirements.
While there is clearly an almost infinite number of data types and requirements, the first premise is that storage architectures will, for the foreseeable future, need to address these two fundamental needs for data. In the past they were characterized as “structured” and “unstructured” data; I now consider them more appropriately labeled “OLTP” and “Web” data.
In the next blogs, I will discuss the use of core storage technologies and the future state of information availability.