A lot of the conversations I'm having these days ask about two phrases, Big Data and Data Science: Have I done it? Can I lead a team doing it? To answer, I've had to put some stakes in the ground and define them from my point of view.
- Big Data: a state in which current systems and capacities are simply overwhelmed. One cannot use traditional thinking or tools because the data doesn't fit in memory on a single machine.
- Data Science: the process of interrogating data in hopes of improving the human condition.
Because we're looking at the world passing by as a torrential stream of bits, we need to have a goal, an objective or a problem to solve. One doesn't just jump in; there needs to be a plan and a lot of preparation (did I mention a LOT of preparation?) grounded in experience, math and statistics.
Big is in the eye of the beholder.
Having worked with US and Canadian clients, I've seen that the line in the sand where things start to seem big moves with the market. For example, a reasonably sized loyalty program for a national US retailer is considered big by Canadian standards, since it has more members than Canada's total population. Frame of reference matters.
Science is a pursuit, a line of reasoning not an algorithm.
Along the path we need to visualize, explain and communicate what we've learned to date. Sometimes it is enough to know that a tactical change improves conversion because of correlation; other times we need to explain why and address causality.
Big Data is not Data Science, and Data Science is not Big Data, although the two clearly overlap and the most frequently mentioned stories come out of that intersection.