
BI or Big Data? - Start by thinking backwards

In business there seems to be a debate about where BI ends and Big Data begins. Here is an easy way to figure it out: start by thinking backwards. That means thinking first about the decision that needs to be made, the nature of it, the data that may exist to support it, where that data is, and how you might analyze it to arrive at an answer. Looking at these attributes will help you understand where BI ends and Big Data begins.


Is the decision to be made new? It is not always true, but generally BI is about updating old data, models and answers, or perhaps taking a small tangent on old data and models to seek new answers. Big Data is more about solving something that eluded previous analysis.

A lot of BI is inwardly focused - running the trucks on time and so forth. Big Data tends to look outward, at things like regulation, competition, and advances in science & technology. Which leads to the third question to ask - does data already exist that you can feed into your analysis? For BI the data is mostly lying around inside the company. For Big Data the data is the fire hose of social media, lots of news feeds and/or a collection of API feeds you have never looked at before.

Whether you have an intuitive understanding of what to do with the data once you have it is another hallmark that separates BI from Big Data. BI is about traditional, well-understood analysis. Big Data is about machine learning - discovering what the underlying patterns, connections and models are without deciding that ahead of time.

The biggest bugaboo is whether the data is structured or unstructured. As a general rule, the structured data of BI is numerical, whereas the unstructured stuff is messy text - written information that conveys essential ideas in a million different ways.

Answer the questions in the chart and you'll know how to proceed.  
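
For readers who like to see a checklist as code, here is a minimal Python sketch of the five questions raised above. The field names, the `suggest_approach` helper and the "three or more signals" cutoff are our own assumptions for illustration; the post itself does not prescribe a scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Yes/no answers to the five questions discussed in the post."""
    is_new_question: bool           # has this eluded previous analysis?
    outward_focused: bool           # regulation, competition, technology...
    data_already_in_house: bool     # is the data lying around the company?
    analysis_well_understood: bool  # do we already know what to do with it?
    data_mostly_unstructured: bool  # messy text rather than numbers?


def big_data_signals(d: Decision) -> int:
    """Count how many answers lean toward Big Data rather than BI."""
    return sum([
        d.is_new_question,
        d.outward_focused,
        not d.data_already_in_house,
        not d.analysis_well_understood,
        d.data_mostly_unstructured,
    ])


def suggest_approach(d: Decision) -> str:
    # Assumption: a simple majority of the five signals decides.
    return "Big Data" if big_data_signals(d) >= 3 else "BI"


# Hypothetical examples, loosely based on the cases mentioned in the text.
truck_schedule = Decision(False, False, True, True, False)
competitor_watch = Decision(True, True, False, False, True)

print(suggest_approach(truck_schedule))
print(suggest_approach(competitor_watch))
```

A simple majority is only one way to weigh the answers; in practice one question (say, unstructured data) may dominate the others.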


Starting Small with Big Data

An October 3rd blog post in Harvard Business Review - found here - provides sound advice on succeeding in Big Data by starting small. Don't overshoot: try all the steps needed in a Big Data project at a small scale before making the full investment in going to Big Data scale. All safe and correct, as in a traditional IT project.

But there are several missing ideas here that are unique to Big Data.  One is the nature of the data itself.  The nature of Big Data is .... well Big.   Google Trends is a fascinating Big Data project.  It comes out of Google's never ending collection of search query strings.  But had Google tried to start small on the presentation of their data none of it would have made sense.  They had to wait until a sufficient amount of data piled up.  Only then did Google realize they had something interesting to compute and show to anyone.  

For example the chart included here shows the interest level of the iPhone and Android over the last 6 years.  iPhone is the blue line and Android the red.  The various peaks in the iPhone search index are, as you might surmise, around the release of successive versions of the phone.  


Analytically the story of the level of interest and the competition between iPhone and Android doesn't become clear until many data points have been collected and enough time has passed to see the trend.  

The literature on machine learning also suggests that more data is better. In fact, in a study on the efficacy of various algorithms, this 2011 paper shows that an "ensemble" of 4 or more learning algorithms begins to approach a 90% human accuracy threshold. This shows the power of not starting small: going big with data, and going big with the analytical approaches that surround it, produces human-like comprehension. And after all, what confidence would we have in Big Data unless it was telling us something we would conclude on our own if we had the time to absorb it all?
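
To make the "ensemble" idea concrete, here is a small Python sketch of majority voting, one common way to combine several learners. It is a toy under stated assumptions, not the method or the data of the paper mentioned above: the three rule-based "classifiers" and the ten samples are made up so that each rule errs on a different sample.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that the most classifiers predicted."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, x):
    """Ask every classifier for a label and combine them by majority vote."""
    return majority_vote([clf(x) for clf in classifiers])


def accuracy(predict, samples):
    """Fraction of (input, label) pairs that `predict` gets right."""
    correct = sum(1 for x, label in samples if predict(x) == label)
    return correct / len(samples)


# Toy task (hypothetical): label the numbers 0..9 as "high" when x >= 5.
samples = [(x, "high" if x >= 5 else "low") for x in range(10)]


# Three imperfect rules, each wrong on exactly one (different) sample.
def clf_a(x):
    return "high" if x >= 4 else "low"             # wrong on 4


def clf_b(x):
    return "high" if x >= 6 else "low"             # wrong on 5


def clf_c(x):
    return "high" if x >= 5 and x != 9 else "low"  # wrong on 9


classifiers = [clf_a, clf_b, clf_c]

for clf in classifiers:
    print(clf.__name__, accuracy(clf, samples))
print("ensemble", accuracy(lambda x: ensemble_predict(classifiers, x), samples))
```

Because the three rules make their mistakes in different places, the vote outperforms each of them here. That only works when the learners' errors are not all the same, which is one reason ensembles tend to mix different algorithms.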


What Big Data Looks Like

Many so-called experts have trouble telling you what Big Data is. A standard definition has yet to emerge. Some describe it as three-dimensional, as in the 3Vs: a Volume of data so large it cannot be processed by traditional database or processing tools; a Velocity of data (what is added to or subtracted from a dataset) that traditional tools cannot keep up with; and a Variety of data that traditional systems have trouble handling. More recently analysts have added a fourth V for Virtual - to distinguish data that is online from data an organization has already captured or owns.

No matter what definition you subscribe to, we thought looking at Big Data in action (at least in a raw form) could bring these abstractions to life. Take a look at the short video below of the real-time tweets about iPhones on the left and Android on the right. Then ask yourself if this flow of data meets the 3-4 Vs described above. Sure looks like Big Data to us.

Then leave us a note on how you might guess we would capture and process this Big Data into something useful and impactful. In the end, after watching the data flow by your eyes for 38 seconds, you start to go cross-eyed. Big Data Lens is here to prevent that.


If Baboons can do it ....

Here is a fascinating study on how we acquire the first step in forming language. The researchers could get baboons to distinguish real words from nonsense strings of letters with 75% accuracy after roughly 10,000 attempts. This is pattern recognition, and it is fundamental to taking the next step, which is to associate words with meaning, followed finally by the ability to distinguish one meaning from another given a certain context.

And here is a video of the baboons correctly making those choices over at National Geographic.

It is this kind of clever pattern recognition that forms the basis of our statistical natural language processing and machine learning algorithms. This approach is at the heart of the Big Data Lens effort to make all of the web a usable, understandable database.
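
As a rough illustration of telling words from nonsense by pattern alone, here is a minimal Python sketch that learns which letter pairs appear in a handful of real words and then scores new strings by how familiar their letter pairs look. It is a toy under our own assumptions, not the procedure used in the baboon study and not the Big Data Lens algorithms; the training list, the `familiarity` score and the 0.6 threshold are all hypothetical.

```python
def bigrams(word):
    """Split a word into adjacent letter pairs, marking its start and end."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(words):
    """Remember every letter pair seen in the training words."""
    seen = set()
    for word in words:
        seen.update(bigrams(word))
    return seen


def familiarity(word, seen):
    """Fraction of the word's letter pairs that were seen in training."""
    pairs = bigrams(word)
    return sum(pair in seen for pair in pairs) / len(pairs)


def looks_like_word(word, seen, threshold=0.6):
    # Assumption: the threshold is arbitrary and would be tuned on real data.
    return familiarity(word, seen) >= threshold


# Tiny, made-up training set of four-letter English words.
training_words = ["done", "land", "them", "vast", "kite",
                  "wasp", "then", "dome", "hand", "vest"]
seen_pairs = train(training_words)

# "tone" was never shown, yet most of its letter pairs are familiar.
print(looks_like_word("tone", seen_pairs))
# "zxqv" shares no letter pairs with the training words.
print(looks_like_word("zxqv", seen_pairs))
```

Note that the sketch never learns what any word means. Like the first step described above, it only recognizes which letter combinations tend to show up in real words.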


Big Data - old becomes new again (with some twists)

There are two things we love about the Big Data effort. One is that what is old is useful again. Two is that there are some twists and progress along the way.

So what is old? Some of the techniques are old but they still rock. Regression is the most powerful analytical technique ever invented. It forms the basis for all other modeling techniques including machine learning. Why? Because of the mathematical foundation on which it is built. Namely - calculus and the great insight that levels or scalars or measurements don't matter. Only the way they change matters. It's the first derivative - remember that? It's what happens AT THE MARGIN that is really important.

You hear this everywhere - in Economics, Finance, Science, Engineering. What happens to the bridge's structural integrity if you add 1 more pound on top of it? What happens to the efficacy of a new drug if you add one more ml of an ingredient? And so on.
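
The "at the margin" point can be seen in a few lines of Python. The sketch below fits a straight line by ordinary least squares and reads the slope as the marginal effect of one more unit. The dose and response numbers are invented for illustration, and `fit_line` is our own helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: ml of an ingredient versus a measured response.
doses = [1, 2, 3, 4, 5]
responses = [12, 14, 16, 18, 20]

slope, intercept = fit_line(doses, responses)
print("marginal effect per extra ml:", slope)

# Shift every response up by 100: the level changes, the margin does not.
shifted = [r + 100 for r in responses]
shifted_slope, shifted_intercept = fit_line(doses, shifted)
print("marginal effect after shifting levels:", shifted_slope)
```

The shift moves the intercept but leaves the slope alone, which is the sense in which the levels don't matter and only the way they change does.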

And what is new? Taking the effective cool old stuff and applying it in ways and at a scale no one ever imagined. Huge quantities of input data are just a start. Then processing that data with the cool old techniques up on self-expanding cloud services to get a job done that less than a generation ago took most of NASA to accomplish. Now you do it from your desktop and a couple of interfaces. And the coolest of all? Letting the machine learn and show that it can do so by finding patterns and making fits of data that would have taken a modeler months of work, if they were found at all.

So welcome to Big Data. Maybe it's geeky but the outcomes are cool and getting better all the time.
