A few notes that didn’t make it into my Fast Company post:
Technicians will be able to provide much more insight into the physical side of Big Data, from server configurations to cooling systems. As various databases are brought together inside an enterprise or across the Internet, Big Data has become the reigning moniker for data collections in the exabyte, zettabyte or yottabyte range. Here are a few thoughts on what organizations need to consider before jumping into a Big Data project.
Defining the Data
Any database architect will tell you that the most important element in any database-oriented transaction or process is understanding the data. Although algorithms may be used to extract data, infer relationships and discover correlations, the very nature of Big Data and its diversity of sources will result in complexity that may reach beyond the human ability to audit those relationships in a meaningful way. The underlying data and models may not get examined until something appears to fail or points to a positive insight. Many useful insights may be lost because of this, and many false positives chased. Organizations must make sure that they understand the quality, relationships, sources and completeness of their data before they enter into Big Data experiments. (How many times has the Amazon recommendation engine changed over the course of your relationship with Amazon? This is a key strategic feature for them, and they invest in it as an ongoing learning opportunity. If you aren’t willing to make that kind of commitment, you might want to think hard about how deep you go.)
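As a rough illustration of what "understanding the data" can look like before any modeling starts, here is a minimal Python sketch that reports how complete each field is and where the records came from. Everything in it is an assumption made for illustration: the `profile` and `source_counts` helpers, the field names and the sample records are hypothetical, and a real audit would cover far more than this.

```python
from collections import Counter


def profile(records, fields):
    """Report completeness and distinct-value counts per field.

    `records` is a list of dicts; a value counts as missing when the key is
    absent, None, or an empty string (an assumption for this sketch).
    """
    total = len(records)
    report = {}
    for field in fields:
        present = [r.get(field) for r in records if r.get(field) not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


def source_counts(records, source_field="source"):
    """Count records per originating system; unlabeled records are 'unknown'."""
    return Counter(r.get(source_field, "unknown") for r in records)


# Hypothetical sample records merged from two systems.
sample = [
    {"customer_id": "c1", "email": "a@example.com", "source": "crm"},
    {"customer_id": "c2", "email": "", "source": "web"},
    {"customer_id": "c2", "email": None, "source": "web"},
    {"customer_id": "c3", "email": "c@example.com"},
]

report = profile(sample, ["customer_id", "email"])
sources = source_counts(sample)
```

Even on four records, a profile like this surfaces a duplicated customer ID, a half-empty email field and a record with no known source, which are exactly the kinds of gaps worth knowing about before trusting any correlation built on top of them.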
Understanding the Problem
If the problems get too big, solution sets may only make sense to a select few. Weather is a Big Data problem. Outside of that community, we know if the sun is up or not, but how our local weather person "predicts" our drive-time weather is a complete mystery. Even if the black boxes aren’t really black boxes, they are to those who didn’t construct them. Those seeking solutions using Big Data need to understand the problems they are trying to solve, ensure that an adequate theory exists to model them (or, if they have only "a" theory, that the theory is well documented so everyone understands its assumptions and biases) and make sure that the recipients of the analysis understand what they are getting, what it means and how it might be wrong.
The Skills Gap
As with any fad, the Chicken Littles are raising their heads to shout about competitive incongruence. If you don’t do Big Data, Big Data will run over you. Companies are now out looking for Data Scientists or Quants to build teams to help them understand their data and offer sophisticated queries that provide insight, or models that anticipate future behavior, leading to a competitive advantage. There is currently a big gap between the need for data scientists and their availability. There is an even bigger gap between the curriculum and the risks suggested by my Fast Company article Why Big Data Won’t Make You Smart, Rich, Or Pretty. Even if the world were able to fill the perceived need for data scientists, would they be humble enough, and ethical enough, to recognize the potential failings of their chosen profession?
Keep in mind that if you tackle Big Data as a competitive reaction, you might not know enough to do it well, and therefore might create multiple risks: wasted investment in technology and analysis as well as the risk of making decisions based on information you don’t really understand.
(Note: as I post this, there is active chatter on Twitter, via my friend and colleague Merv Adrian under the hashtag #Gartnerchat, about how elitist the idea of a "Data Scientist" appears.)