Yesterday we caught columnist David Brooks' piece on Big Data.  You can find it here.  He is right to point out that efforts by Big Data proponents to break the long-standing data science axiom that "correlation does not equal causality" are misguided.

The phrase "correlation does not equal causality" was never meant to downplay or whitewash any of the analytical techniques that can be misused to establish causality.  Rather, it was meant to make sure data scientists added the narrative to the statistics.

The narrative is the explanation of why the correlation appears.  Twenty years ago, diapers and beer being bought at the same time in the convenience store was more than a correlation.  The narrative that went with it - young fathers sent out into the dark, cold night to fetch diapers might as well pick up some new-family stress relief along the way - launched an entire new field of Information Technology called Business Intelligence.  Without the narrative it's just statistics, and it might well be a spurious correlation with little understanding in it.  The beer and diapers would have stayed in separate aisles.

Policy makers in government change course when they hear a story backed up by numbers.  Business leaders reallocate resources when they hear a story backed up by numbers.  If you are part of Big Data, do the numbers and build the machine learning algorithm, but tell the story too.  Otherwise your efforts will have far less impact than they should.
