At a Big Data conference last summer I sat puzzled when listening to a speech about collecting and processing expense spreadsheets. "How is this Big Data practice?", I wondered. "Looks and smells like more traditional Business Intelligence to me", I thought. There was no analysis, only reporting facts from a very small set of data relative to all the data floating around that large enterprise.
It's bothered me ever since. We throw around the words 'Big Data' and slap them onto old practices to make them seem new and exciting. This waters down Big Data and confuses those who are new to it.
But there is a proper way to understand how older Business Intelligence, statistical modeling, data warehousing efforts, etc. and true Big Data practices fit together.
They fit together because Business Intelligence (and its cousins) is deductive, or hypothesis based, while Big Data is inductive, or observational, and each informs the other. Here is how.
Deductive models work like the diagram below. Start with a hypothesis about how a system, market, consumer or patient acts. Then collect data to represent the stimulus. Traditionally, the amount of data collected was small, since it rarely already existed and had to be generated with surveys or perhaps imputed through analogies. Finally, statistical methods establish enough causality to arrive at enough truth to represent the system. So deductive models are forward running: they end up representing a system heretofore not observed.
Inductive models work the opposite way. They start by observing a system already in place and one that is putting out data as a by-product of its operation. Like exhaust from a car tailpipe, many modern systems are digitally based and thus put out “data exhaust”.
And thus Big Data is called Big since the collection of exhaust can be huge. Your cell phone alone broadcasts application usage, physical movement, URL consumption, talk time, geographic location, speed, distance, etc. It is the same with your computer, your car and soon your home.
With inductive models you arrive at some level of system understanding, but not truth, since the data you have collected is the outcome of the system, not the input. On the other hand, the deductive model is never the truth either, since it suffers from small and incomplete data.
Now put these two together so that each supports the other. Inductive models reveal possible dynamics in a system because of the size of the data and because that data is output based. These findings can be fed back to the deductive model so that it may be improved and thus get closer to truth. Likewise, the improved deductive model points the way to new types of data to look for and use in the inductive model. This is a virtuous cycle to be exploited.
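For readers who think in code, here is a toy sketch of one turn of that cycle. Everything in it is an assumption made up for illustration: the simulated "data exhaust", the hidden rule that evening phone sessions run longer, and the function names (`simulate_exhaust`, `inductive_scan`, `deductive_test`). It is not a real pipeline, only one way the two kinds of model might hand off to each other.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(0)  # fixed seed so the toy data is reproducible

def session_minutes(device, evening):
    """Hidden system behaviour (invented): evening phone sessions run longer."""
    base = 10.0 if device == "phone" else 12.0
    if device == "phone" and evening:
        base += 5.0
    return rng.gauss(base, 2.0)

def simulate_exhaust(n):
    """Large by-product log: (device, part_of_day, minutes) per session."""
    log = []
    for _ in range(n):
        device = rng.choice(["phone", "laptop"])
        evening = rng.randrange(24) >= 18
        log.append((device, "evening" if evening else "daytime",
                    session_minutes(device, evening)))
    return log

def inductive_scan(log):
    """Inductive step: summarise observed output by segment, no hypothesis needed."""
    buckets = defaultdict(list)
    for device, part, minutes in log:
        buckets[(device, part)].append(minutes)
    return {segment: mean(values) for segment, values in buckets.items()}

def deductive_test(sample_a, sample_b, trials=2000):
    """Deductive step: one-sided permutation test of 'mean(a) > mean(b)'."""
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    k = len(sample_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return hits / trials  # share of shuffles at least as extreme as observed

# 1. Observe the exhaust and let a pattern surface.
segments = inductive_scan(simulate_exhaust(20000))
standout = max(segments, key=segments.get)  # segment with the longest sessions

# 2. Turn the pattern into a hypothesis and check it on a small, purpose-built sample.
evening_phone = [session_minutes("phone", True) for _ in range(30)]
daytime_phone = [session_minutes("phone", False) for _ in range(30)]
p_value = deductive_test(evening_phone, daytime_phone)

print(standout, round(segments[standout], 1), p_value)
```

The scan over the large log only suggests where to look. The small, deliberately collected sample is what tests the resulting hypothesis, and a confirmed hypothesis would in turn suggest which new fields are worth capturing in the log.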
There is often disagreement in Big Data circles about one type of model over the other, or about how Big Data spoils the decades of work of statisticians. This need not be the case. Both models can and should live together. They are both better for it.