An October 3rd blog post in Harvard Business Review - found here - offers sound advice on succeeding in Big Data by starting small.  Don't overshoot; try all the steps needed in a Big Data project at a small scale before making the full investment in going to Big Data scale.  This is all safe and correct, just as it would be for a traditional IT project.

But several ideas unique to Big Data are missing here.  One is the nature of the data itself.  The nature of Big Data is ... well, big.  Google Trends is a fascinating Big Data project.  It comes out of Google's never-ending collection of search query strings.  But had Google tried to start small on the presentation of its data, none of it would have made sense.  Google had to wait until a sufficient amount of data piled up.  Only then did it realize it had something interesting to compute and show to anyone.

For example, the chart included here shows the level of search interest in the iPhone and Android over the last 6 years.  iPhone is the blue line and Android the red.  The various peaks in the iPhone search index fall, as you might surmise, around the release of successive versions of the phone.


Analytically, the story of the level of interest and the competition between iPhone and Android doesn't become clear until many data points have been collected and enough time has passed to see the trend.

The literature on machine learning also suggests that more data is better.  In fact, in a study on the efficacy of various algorithms, this 2011 paper shows that an "ensemble" of 4 or more learning algorithms begins to approach a 90% human accuracy threshold.  This shows the power of not starting small: going big with the data, and going big with the analytical approaches that surround it, produces human-like comprehension.  And after all, what confidence would we have in Big Data unless it was telling us something we would conclude on our own if we had the time to absorb it all?
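
To make the "ensemble" idea a little more concrete, here is a minimal sketch in Python.  It assumes majority voting as the way of combining the algorithms, which is only one common approach and not necessarily the one used in the paper.  The labels and the four "models" are made-up toy data, chosen so that each model is wrong on different items.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several models' label lists into one list by majority vote.

    `predictions` is a list of equal-length label lists, one per model.
    On a tie, Counter.most_common returns the label seen first.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of positions where the prediction matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy data: 8 items with known labels, and 4 imagined models
# that each get a different 2 of the 8 items wrong.
truth = [1, 0, 1, 1, 0, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [0, 0, 1, 1, 0, 1, 0, 1]
model_d = [1, 0, 1, 0, 1, 1, 0, 0]

models = [model_a, model_b, model_c, model_d]
ensemble = majority_vote(models)

for name, preds in zip("abcd", models):
    print("model", name, "accuracy:", accuracy(preds, truth))
print("ensemble accuracy:", accuracy(ensemble, truth))
```

Because the toy models make their mistakes on different items, the vote outperforms every individual model.  Real algorithms tend to make correlated errors, so the gain in practice is usually smaller than in this contrived case.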