Lots of studies look at the sources of Big Data.  The infographic from IBM shown here suggests that the majority of Big Data comes from 6 sources.  You might call these sources "the usual suspects".  But to our way of thinking, the real question is: what percentage of all the world's information do these 6 sources represent?  Not much is the answer. Think 80/20.  80% of the world's information is in written form (i.e. words).  None of the 6 sources listed here consists of words - well, social media does, but its posts are limited in length.


So where is the rest of the world's information, and does it constitute Big Data?  It is in long-form written news, blogs (like this one), journals, books and so on.  And yes, that too is Big Data.  Or it can be, once it is processed.

That is where Statistical Natural Language Processing (sNLP) comes in.  Transform the written word into machine-readable data and voilà, you have Big Data too. Now mix and mash it with the transactional data, log data and the other sources listed above, and you've got a rich stew from which you can model and predict.
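
To make "transforming the written word into machine-readable data" a little more concrete, here is a minimal sketch of one of the simplest statistical text representations: counting words per document (a "bag of words"). It is not a full sNLP pipeline, just an illustration under stated assumptions. The sample sentences and the function names (tokenize, term_counts, document_term_matrix) are made up for this example, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical sample documents, invented purely for illustration.
DOCUMENTS = [
    "Acme Corp announced a new pricing plan for its cloud service.",
    "Regulators proposed new rules for cloud data storage.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text):
    """Map each word in the text to the number of times it occurs."""
    return Counter(tokenize(text))


def document_term_matrix(documents):
    """Return (vocabulary, rows) with one row of word counts per document."""
    counts = [term_counts(doc) for doc in documents]
    # The vocabulary is every distinct word seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a document does not contain.
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


if __name__ == "__main__":
    vocabulary, rows = document_term_matrix(DOCUMENTS)
    print(vocabulary)
    for row in rows:
        print(row)
```

Each document becomes a row of numbers, which is the kind of table you can join with transactional or log data. Real sNLP systems go much further (entities, relationships, sentiment and so on), but the basic move is the same: text in, structured data out.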

Competitors' actions - they're in the text.  Scientific discoveries - they're in the text.  Changes in regulation - they're in the text, not in log files and transactional data.  Include the written word by using sNLP and you open up a whole new world of Big Data for your use.

So if the above chart looks pretty limiting in terms of what you are going to get out of Big Data, because what goes in is pretty limited, think again and start generating your own Big Data - and profit by it.