Big data has taken the business world by storm. Data is growing faster than ever before: it’s estimated that by 2020, each person on Earth will create 1.7 megabytes of new information every second. The Google Trends chart for “big data” also shows that interest in the big data movement has skyrocketed over the past couple of years.
Not surprisingly, business leaders have been gung-ho to jump on the big data bandwagon. According to a 2014 Gartner survey, 73% of organizations had already invested in big data or planned to invest by 2016. Unfortunately, despite its enormous potential, big data is proving to be something of an Achilles’ heel for many organizations. In particular, organizations are crippled by three primary impediments:
- Poor decision making:
According to the 2015 Experian Data Quality benchmark report, U.S. companies believe that 32% of their data is inaccurate. What’s more, the problem shows no sign of improving: the 32% figure represented a 28% increase over the 2014 estimate. When the data underlying a decision is inaccurate or unreliable, the decision itself is likely to be flawed, and ill-informed decisions have great potential to wreak havoc on an organization’s bottom line.
A recent IBM study estimated that bad data costs the U.S. economy $3.1 trillion each year. Yet even accurate data can lead to ill-informed business decisions and outcomes. It’s tempting, for instance, to conclude that correlation implies causation. When we’re too reliant on big data, we risk using it “like a drunk uses a lamppost”: for support rather than illumination. Too often, our decisions reflect data and assumptions that have not been scrutinized and challenged. Don’t blindly trust the data; blind trust almost certainly results in misuse.
- Wasted time:
Although data scientist has been dubbed “the sexiest job of 2016,” the reality is not turning out so peachy. Data scientists spend an estimated 60% of their time cleaning and organizing data. This work is common enough to have its own name, “data munging,” which renowned data scientist Mike Driscoll describes as “the painful process of cleaning, parsing, and proofing one’s data.” If most of a data scientist’s time goes to cleaning data rather than analyzing and making sense of it, significant and costly time is wasted. Worse, “dirty data” often leaves decision makers second-guessing their choices and wasting even more time fretting about whether they made the “right” ones. Organizations should, to the extent possible, enact practices and policies that ensure data is normalized, bad records are removed, and incorrect records are corrected. That leaves the analytics and recommendations, the “sexy” activities, to the data scientists.
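To make those three hygiene steps (normalizing, removing bad records, correcting incorrect ones) a little more concrete, here is a minimal sketch in Python using only the standard library. The record layout (`name`, `email`, `country`), the email check, and the country-code mapping are all hypothetical, chosen purely for illustration; a real pipeline would take its rules from the organization’s own data standards.

```python
import re

# Hypothetical lookup used to correct known variant spellings of countries.
COUNTRY_FIXES = {"usa": "US", "united states": "US", "u.s.": "US", "uk": "GB"}

# Deliberately simple email pattern (an assumption); real validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(record):
    """Trim whitespace and put each field into a consistent case."""
    return {
        "name": record.get("name", "").strip().title(),
        "email": record.get("email", "").strip().lower(),
        "country": record.get("country", "").strip(),
    }


def fix_country(record):
    """Map known variants to a code; otherwise just upper-case the value."""
    raw = record["country"]
    return {**record, "country": COUNTRY_FIXES.get(raw.lower(), raw.upper())}


def is_valid(record):
    """Keep a record only if it has a name and a plausible email address."""
    return bool(record["name"]) and bool(EMAIL_RE.match(record["email"]))


def clean(records):
    """Normalize, correct, drop invalid records, and de-duplicate by email."""
    seen = set()
    result = []
    for raw in records:
        rec = fix_country(normalize(raw))
        if not is_valid(rec) or rec["email"] in seen:
            continue  # bad record or duplicate: skip it
        seen.add(rec["email"])
        result.append(rec)
    return result


if __name__ == "__main__":
    raw_records = [
        {"name": "  ada lovelace ", "email": "ADA@Example.com", "country": "usa"},
        {"name": "Ada Lovelace", "email": "ada@example.com ", "country": "US"},
        {"name": "", "email": "nobody@example.com", "country": "uk"},
        {"name": "Alan Turing", "email": "not-an-email", "country": "UK"},
        {"name": "grace hopper", "email": "grace@example.org", "country": "United States"},
    ]
    for rec in clean(raw_records):
        print(rec)
```

Running the script keeps two records (Ada Lovelace and Grace Hopper, both with country `US`) and drops the duplicate, the nameless entry, and the malformed email. The order of operations matters: because normalization runs before de-duplication, `ADA@Example.com` and `ada@example.com ` are recognized as the same address.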
- Dampened morale:
Not only can big data lead to ill-informed decisions and wasted time, it can also damage morale among data scientists. An estimated 57% of data scientists consider cleaning and organizing data the least enjoyable part of their work. Low morale is especially damaging among data scientists, who tend to be highly educated and skilled. There is a wide gap between what these individuals expect their jobs to entail and what those jobs actually involve. These top-of-the-class students didn’t sign up to be data janitors, and they have every right to feel disgruntled by the reality they’ve been presented with.
As data scientist Cathy O’Neil argues in her recent book, Weapons of Math Destruction, the results of big data can be toxic and far-reaching. Far too often, big data is an organization’s Achilles’ heel: a weakness, despite overall strength, that can lead to its downfall. Fortunately, the impediments described above can be largely avoided. It’s high time organizations adopted proper data hygiene practices, resisted the temptation to follow data blindly and to assume that correlation implies causation, and empowered data scientists to do the work they signed up for.