“The more data that you have, the better the model that you will be able to build. Sure, the algorithm is important, but whoever has the most data will win.”—Gil Elbaz, Factual

“With any Big Data project, if you don’t spend time thinking about analysis, you’re wasting your money. You must have a structured idea of what you want to get out of unstructured data”—Ron Gill, NetSuite

So let's see if I understand.  With Big Data, any lack of precision can be beaten by adding more data to the algo.  Pour more into the sausage grinder and you get better sausage.  Simple, right?

But wait.  Our second quote tells us to carefully plan and understand the nature of Big Data so you know what algo to apply to it.  Or all is lost.  

So here are two Big Data ideas that oppose each other.  In reality you need a bit of both.  Or, as the saying goes, each idea is necessary but not sufficient on its own.  But take them together and you do have the makings of good Big Data practice.

In any project, many iterations are needed to discover the point where adding more data yields diminishing returns in the quality of the algorithmic outcome.  You must seek it, no matter how large the input data turns out to be, since to do otherwise sacrifices accuracy.
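To make that search concrete, here is a minimal sketch of a learning curve, under stated assumptions. The data source (`make_data`, a noisy straight line), the list of training sizes, and the 1% `min_gain` cutoff are all hypothetical choices for illustration, not something taken from either quote. The idea is only to refit the same simple model on ever larger samples and watch where the held-out error stops improving by much.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def make_data(n, rng):
    """Hypothetical data source: y = 2 + 3x plus Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def learning_curve(sizes, repeats=30, seed=0):
    """Average held-out squared error for each training-set size."""
    rng = random.Random(seed)
    test_x, test_y = make_data(2000, rng)  # fixed held-out set
    curve = []
    for n in sizes:
        total = 0.0
        for _ in range(repeats):
            a, b = fit_line(*make_data(n, rng))
            sq_err = sum((a + b * x - y) ** 2 for x, y in zip(test_x, test_y))
            total += sq_err / len(test_x)
        curve.append(total / repeats)
    return curve

def diminishing_point(sizes, curve, min_gain=0.01):
    """Last size before the relative improvement drops below min_gain."""
    for i in range(1, len(sizes)):
        gain = (curve[i - 1] - curve[i]) / curve[i - 1]
        if gain < min_gain:
            return sizes[i - 1]
    return None  # still improving at the largest size tried

sizes = [4, 16, 64, 256, 1024, 4096]
curve = learning_curve(sizes)
knee = diminishing_point(sizes, curve)
for n, err in zip(sizes, curve):
    print(f"n={n:5d}  held-out MSE={err:.4f}")
print("diminishing returns after n =", knee)
```

On real data the curve is rarely this clean, so the cutoff is a judgment call. The error also never falls below the noise in the data itself, which is one reason more data alone cannot fix everything.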

But likewise, proceeding without a theoretical underpinning - that structured plan to handle the unstructured data - is a fool's errand.  Too many Big Data projects rely on simply adding more data to goose the algo's precision.  "Spurious" is the term typically used when you can observe high correlation values but cannot explain them logically.  Without this "story-telling" understanding of the algorithm you are on shaky ground.
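A small sketch shows how easily a spurious correlation turns up when you search enough columns. Everything here is an assumption made for illustration: the target and all 1000 "features" are pure random noise, so no real relationship exists, and the row count and seed are arbitrary. The code picks the feature that correlates best with the target on half the rows, then re-checks that same feature on the rows it has not seen.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
rows, n_features = 40, 1000

# Target and features are independent noise: any correlation is an accident.
target = [rng.gauss(0, 1) for _ in range(rows)]
features = [[rng.gauss(0, 1) for _ in range(rows)] for _ in range(n_features)]

half = rows // 2
# Score every feature on the first half of the rows only.
scores = [abs(pearson(f[:half], target[:half])) for f in features]
best = max(range(n_features), key=scores.__getitem__)

in_sample = scores[best]
held_out = abs(pearson(features[best][half:], target[half:]))

print(f"best of {n_features} noise features, in-sample |r| = {in_sample:.2f}")
print(f"same feature on held-out rows,      |r| = {held_out:.2f}")
```

The winning in-sample correlation looks impressive precisely because it was the best of a thousand tries. A held-out check helps catch this, but it is not a substitute for being able to explain why a relationship should exist.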

Use both these ideas as good practice for good Big Data - every day.
