February 21, 2014

Just how big is Big Data?

Is it bigger than Big Soup? Sounds implausible, I know.

In a recent blog post for Science Warehouse, I stated: “‘Big Data’ will soon hit the Peak of Inflated Expectations in the emerging technologies hype cycle and head towards the Trough of Disillusionment as people realise it’s actually quite hard to get meaningful, valuable data out of very large data sets.”

What is Big Data?

The term is bandied about with abandon these days, a sure sign of an overhyped concept that is probably misunderstood by those using it. Big Data is really a moving target: it describes data sets that are too large for a traditional approach to database management. You can think of it as a database (or a set of databases) too big or diverse for the server it sits on to function in a meaningful way. Big Data solutions, therefore, are technical solutions to the problem of analysing and processing data in these situations.

By definition, then, the management of Big Data is a complex issue. There is no simple solution or one-size-fits-all approach that means you’ve “done” Big Data. This is where, in the technology hype cycle, the inflated expectations hit the cold reality of actually doing something meaningful with the data you have.

Examples of Big Data

In the real world, what are the applications of Big Data? Complex data sets requiring large-scale processing include meteorological data, genomics and things like Internet search data. Successfully analysing Google’s data for trends in real time is beyond the capability of current relational database systems, and so new technologies to organise data are required. Vast amounts of data in logs and similar sources are being generated all the time, and the volume is growing exponentially, outstripping the capacity to actually deal with it.

Business Intelligence vs Big Data

Part of the problem with the term Big Data is a conflation with Business Intelligence. Those without a real understanding of what is meant by Big Data (or with something to sell) appear to use it interchangeably with Business Intelligence. However, Business Intelligence uses descriptive statistics (i.e. identifying the main features of the data being analysed) to measure things, detect trends and so on, whereas Big Data uses inductive statistical techniques to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets, in order to reveal relationships and dependencies and to predict outcomes and behaviours.
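To make the distinction between descriptive and inductive statistics concrete, here is a minimal sketch in Python. The order figures are invented purely for illustration, and the helper names (describe, fit_line, predict) are hypothetical. A straight-line least squares fit stands in for the far richer techniques a real Big Data system would use.

```python
from statistics import mean

# Hypothetical monthly order counts, invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
orders = [100, 110, 120, 130, 140, 150]


def describe(values):
    """Descriptive statistics: summarise what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Inductive step: infer a relationship, then extrapolate from it."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept


summary = describe(orders)              # the BI-style view of the past
forecast = predict(months, orders, 7)   # a prediction for month 7
```

The first function only reports on the data it is given. The second infers a rule from that data and uses it to say something about a month that has not happened yet, which is the step that gets hard when the data is large, messy and spread across many sources.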

The challenge for BI vendors, then, is to take their software beyond the analysis and absolute number crunching of traditional databases into trends and patterns seen across a number of disparate and high-volume sources. This challenge is not insurmountable, and meeting it will indeed provide rich and valuable information for business users. In the short term, though, it will lead to Big Data being caught in the hype cycle’s Trough of Disillusionment as the reality of meeting the challenge becomes apparent.



