Next to “Cloud,” one of the hottest new techie areas is Big Data. Of course, for a company like EMC, a data storage company at its roots, the term Big Data is akin to telling me that I get a Big Rib-eye steak for dinner; yum.
The simple fact is that Big Data could transform business and society in very positive and meaningful ways, but the point of this post is just to provide some basic context as to why Big Data should be on virtually everyone’s short list for investigation.
I start with two simple axioms for Big Data:
More is better
Extracting value is the trick
While I think we can always get overloaded with information, I believe the opposite is true of data. More is always better. Before you think this is a naïve statement, consider two caveats. First, I assume we can separate valid data from invalid data. Second, I assume that there is a way to analyze the data. These two needs, eliminating invalid data and analyzing what remains, will become the key ingredients in delivering new value.
So, what can Big Data do for us? The potential is limitless. For example, it has the ultimate potential to help find cures for diseases and revolutionize personalized medicine. It has the potential to change how we deal with poverty, and related Big Things. Big Data can also change our day-to-day lives. Here is a simple example: imagine that when you set up an appointment for lunch 20 miles away, a program automatically calculates the drive time, looking at both the predicted weather and the traffic delays for the time of day and day of the week, and then automatically allocates the time block on your calendar.
Much of Big Data will be about combining and correlating data across multiple datasets. Where businesses today use BI software to track customer buying trends by a few facets (like store location, time of day, etc.), the increasing trend will be to look at more and more datasets as potential indicators and sales predictors.
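To make "combining and correlating" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two datasets (daily store sales and daily high temperature), their values, and the function names are invented. It joins the two datasets on the days they share and computes a Pearson correlation coefficient, which is only one of many ways to look for a relationship.

```python
from math import sqrt

# Hypothetical datasets keyed by day; all values are invented for illustration.
daily_sales = {"Mon": 120, "Tue": 135, "Wed": 160, "Thu": 150, "Fri": 90}
daily_high_temp_f = {"Mon": 70, "Tue": 74, "Wed": 82, "Thu": 79, "Sat": 65}


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate(dataset_a, dataset_b):
    """Join two datasets on their shared keys, then correlate the values."""
    shared = sorted(dataset_a.keys() & dataset_b.keys())
    return pearson([dataset_a[k] for k in shared],
                   [dataset_b[k] for k in shared])


r = correlate(daily_sales, daily_high_temp_f)
print(f"correlation between sales and temperature: {r:.2f}")
```

A value near 1 or -1 only suggests that a dataset may be worth a closer look as an indicator; it says nothing about cause, and with real data the hard part is usually the join, not the arithmetic.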
Another new trend in the "data" world is mining and using unstructured data as a data source. Examples of unstructured data are video, files, and even Facebook and Twitter feeds. The challenge here has been how to extract semantic data and then index it in a useful way. For example, one use has been to build "sentiment analysis" datasets around a company or product by counting the positive and negative words found adjacent to the particular term.
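The word-counting idea just described can be sketched in a few lines of Python. This is a deliberately naive illustration, not how any particular product does it: the word lists, the sample posts, the "Acme" brand name, and the `window` parameter (how many words on either side count as "adjacent") are all assumptions of mine.

```python
import re

# Hypothetical word lists; a real system would use a far larger lexicon.
POSITIVE = {"great", "love", "excellent", "good", "fast"}
NEGATIVE = {"bad", "hate", "terrible", "slow", "broken"}


def sentiment_counts(texts, term, window=3):
    """Count positive and negative words within `window` words of `term`.

    `term` is assumed to be a single alphabetic word, such as a brand name.
    Returns a (positive, negative) tuple.
    """
    term = term.lower()
    positive = negative = 0
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counted = set()  # positions already counted, to avoid double counting
        for i, word in enumerate(words):
            if word != term:
                continue
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j in counted:
                    continue
                if words[j] in POSITIVE:
                    positive += 1
                    counted.add(j)
                elif words[j] in NEGATIVE:
                    negative += 1
                    counted.add(j)
    return positive, negative


# Invented sample posts mentioning a made-up brand.
posts = [
    "I love my new Acme phone, the camera is excellent",
    "Acme support was terrible and slow",
    "Bad weather today, nothing to do with anything",
]

print(sentiment_counts(posts, "Acme"))
```

Simple counting like this misses negation ("not good"), sarcasm, and context, which is part of why extracting semantic data from unstructured sources has been such a challenge.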
This is just the beginning. I expect video recording in stores to go far beyond security, and businesses will start to build datasets for everything from how long we have to stand in line to how we walk through the store. I am not even going to engage in the "big brother" implications here. This is just the evolution of cultural norms as we now share our daily lives with hundreds of "Facebook friends" and carry a GPS transmitter in our pocket or purse. The clear hope is that this data will be used to improve our experience, efficiency, health and quality of life.
So, the base technology is here today. We have the capacity to store the data, the technology to extract semantic data, the database capability to structure the data, the indexing and MapReduce capability to analyze it, and the ability to create UIs to present the results. This opens the door to a whole new set of innovators and innovations that can leverage all of this technology and solve problems that we could not imagine solving a decade ago. I think of it as not just finding a needle in a haystack but finding matching needles across 10,000 stacks of needles.
Very exciting stuff.