Twenty years ago, few astronomers thought it was possible to detect planets outside our solar system, or were even sure they existed (mostly because, as a scientist, you cannot conclude something you cannot observe). But along came better, more powerful telescopes of a wide variety. Then smart scientists coupled novel analytical techniques with the growing amount of data those telescopes collected. Namely, they looked for the faint gravitational “wobble” an orbiting planet imparts to its star, visible in the light we observe from distant stars, or for the slight dimming as a planet passes in front of its sun. Brilliant.
To be sure, we are not observing the planets themselves, even though they were there all along. All we actually observe is the data exhaust of a planet-sun system, and from that exhaust we infer that the planet exists.
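To make the idea of inferring an unseen planet from its exhaust concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data is synthetic, the function names (`simulate_wobble`, `signal_power`, `best_period`) are hypothetical, and the period scan is a crude stand-in for the far more careful methods real planet hunters use. It generates a noisy stellar "wobble" caused by a planet we never observe, then recovers that planet's orbital period from the wobble alone.

```python
import math
import random


def simulate_wobble(period_days, amplitude, n_obs=200, span_days=400.0,
                    noise=0.5, seed=0):
    """Return (times, velocities) for a star tugged by one unseen planet.

    Purely synthetic: a sine wave plus Gaussian noise, sampled at random times.
    """
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, span_days) for _ in range(n_obs))
    velocities = [
        amplitude * math.sin(2 * math.pi * t / period_days) + rng.gauss(0.0, noise)
        for t in times
    ]
    return times, velocities


def signal_power(times, velocities, period_days):
    """Measure how strongly a sinusoid of the given period shows up in the data."""
    w = 2 * math.pi / period_days
    s = sum(v * math.sin(w * t) for t, v in zip(times, velocities))
    c = sum(v * math.cos(w * t) for t, v in zip(times, velocities))
    return (s * s + c * c) / len(times)


def best_period(times, velocities, candidates):
    """Pick the candidate period with the strongest signal."""
    return max(candidates, key=lambda p: signal_power(times, velocities, p))


# The "planet" has a 37-day orbit, but the analysis below only sees the wobble.
times, velocities = simulate_wobble(period_days=37.0, amplitude=3.0)
candidates = [p / 10 for p in range(100, 1000)]  # 10.0 to 99.9 days
estimate = best_period(times, velocities, candidates)
print(f"Inferred orbital period: {estimate:.1f} days")
```

The point is not the astronomy. The planet never appears in the data, yet the pattern it leaves behind is enough to say something specific about it.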
Big data is like that. We improved the data collectors in our more immediate world. These come in the form of log files, cell phone location data, click streams, and embedded machine sensors everywhere you look. We also made a good deal of that data interoperable through APIs (Application Programming Interfaces) so we could collect and meld ever-larger sets of data. To this we add ever more sophisticated machine learning techniques to tease more understanding out of the collected data. Just like the planet hunters.
At the risk of oversimplifying, let's look at the difference between the old way of modeling and predicting systems and the new big data way, which again is not unlike the planet hunters of 20 years ago. Predictive analytics of old relied on collecting as many of a system's observable inputs and outputs as possible in order to model it. The trouble was always the lack of data, either because no one bothered to collect it or because it was proprietary and so protected from prying eyes. It worked, but it wasn’t always pretty and it was limiting. Just like the planet hunters who lacked data or were frustrated by being earth-bound, their observations obscured by the atmosphere.
But along the way someone noticed the increasing amount of observable “exhaust” from systems and wondered whether it held enough signal to conclude something useful. Sure enough, it did. And a corner was turned.
Exhaust from a system generally can’t be hidden. It is free for all to see and to use to understand the inner workings of that system. And once a system is better understood, better policies, stronger competition, and faster innovation are the result. This is the benefit of Big Data analysis writ large. Likewise, someday soon, the planet hunters may detect one of those planets they found blinking a signal that hopefully says, “Welcome, let’s talk!”