Bundling data collection and data organization will ruin everything you love

Everyone knows the importance of data-driven decision making in business, and it’s becoming more popular every year. However, what tends to get lost in that obsession is whether the underlying data is trustworthy.

It’s easy to say that inaccurate data leads to business problems. But what’s even worse is when a company runs on data that appears trustworthy, when in reality it’s flawed in ways that significantly impact critical decisions. I know the title is a tad dramatic, but people pour countless hours into trying to make their business successful, and nothing is more painful than realizing it was doomed from the start. That’s why we need to eradicate the dangerous tendency of data providers to bundle data collection and data organization.

Data collection and organization are often treated as two sides of the same coin, yet many companies don’t realize that blending them leads to incorrectly or ambiguously labeled metrics. Since the separation of these processes is not yet common practice, the unfortunate reality is that most businesses, even those that identify as data-driven, are putting themselves at an irreversible disadvantage without realizing it — and destroying what they’re working so hard to accomplish.

Separating data collection and organization ensures that your company can obtain a clear, trustworthy picture of its data. It’s the best way to mitigate risk and optimize decision making.

The music industry mirror

There’s an analogy I like to make that compares data analysis to the music industry. In the 1960s, musicians would record all the instruments on a physical tape. To edit and create tracks they would then literally cut up the tape and paste it back together. If they messed up the cut-and-paste process, the tape would remain damaged forever — in other words, the recording (collection) of the sound was tied directly to the way it was organized.

Nowadays, nobody uses physical cutting and pasting to create complex, multi-layered musical tracks. Instead, they use digital recordings, which can be cut, recut, and rearranged without ever losing the integrity of the raw original sounds. When it comes to data analysis, however, the industry status quo is like that of ’60s-era music.

Lumping together data collection and organization means it’s virtually impossible to retain the integrity of the original data if something needs to be rewritten or changed. And even the process of doing that rewriting requires hours of painstaking coding and labor.

For instance, let’s say your business begins by selling a single product (a mattress, for example) and your analytics are programmed to track each purchase, labeled as “purchase.” As the company grows you add a new product, a pillow. Your analytics, which have previously only been tracking “purchase,” can’t distinguish between a mattress purchase and a pillow purchase.

Your company might then reconfigure its system to track two events, “mattress purchase” and “pillow purchase,” but all of the old data is still labeled simply as “purchase.” Now you have to go back and manually update all the data points tagged “purchase” to “mattress purchase,” or just live with an inconsistency between historical and new data.

This is life when data collection and organization are lumped together — every new product addition requires the time-consuming, error-prone work of manually re-schematizing historical data. But by keeping the collection process separate from organization, you can easily change labels without losing any of the original data.

Suddenly it’s possible to simply create the new “mattress purchase” label and then retroactively apply it to all the historical data, no matter how it was previously labeled. This allows businesses to easily keep track of all information over time as they grow from mattresses and pillows to sheets, bedframes, comforters, and more (it also applies to products that have nothing to do with the bedroom).
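To make the mattress example concrete, here is a minimal sketch of what keeping collection and organization separate could look like. It is an illustration, not a description of any particular product: the `RawEvent` type, the `checkout_completed` action, the `sku` property, and the helper names are all hypothetical. It also assumes that the raw events captured enough detail (here, a SKU) at collection time for new labels to be defined later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawEvent:
    """An event as it was captured. Collection writes it once; nothing edits it later."""
    timestamp: str
    action: str
    properties: dict


# Collection: a hypothetical raw log (all values invented for illustration).
RAW_EVENTS: List[RawEvent] = [
    RawEvent("2016-01-05", "checkout_completed", {"sku": "MATTRESS-01"}),
    RawEvent("2016-02-11", "checkout_completed", {"sku": "MATTRESS-01"}),
    RawEvent("2016-03-02", "checkout_completed", {"sku": "PILLOW-01"}),
    RawEvent("2016-03-03", "page_view", {"path": "/pillows"}),
]

# Organization: a label is just a rule evaluated against raw events at analysis time.
Label = Callable[[RawEvent], bool]


def is_purchase(event: RawEvent) -> bool:
    return event.action == "checkout_completed"


def is_mattress_purchase(event: RawEvent) -> bool:
    return is_purchase(event) and event.properties.get("sku", "").startswith("MATTRESS")


def is_pillow_purchase(event: RawEvent) -> bool:
    return is_purchase(event) and event.properties.get("sku", "").startswith("PILLOW")


# Adding or renaming a label means editing this mapping, not the stored events.
LABELS: Dict[str, Label] = {
    "purchase": is_purchase,
    "mattress purchase": is_mattress_purchase,
    "pillow purchase": is_pillow_purchase,
}


def count_by_label(events: List[RawEvent], labels: Dict[str, Label]) -> Dict[str, int]:
    """Apply every label to the full history, old and new events alike."""
    return {
        name: sum(1 for event in events if matches(event))
        for name, matches in labels.items()
    }


counts = count_by_label(RAW_EVENTS, LABELS)
print(counts)
```

Because the labels live apart from the stored events, introducing “pillow purchase” after the fact is a one-line addition to `LABELS`, and the January and February mattress sales are classified correctly without anyone touching the historical records.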

Non-destructive editing and the data analytics future

Essentially, by separating collection from organization, the company in the example above can treat its data the way modern recording artists treat digital music: it can recut and rename the data however it wants, without compromising the integrity of the original. The music industry calls this “non-destructive” editing. Data analytics doesn’t even have a name for it because it’s such an uncommon practice.

Yet non-destructive editing is critically important for any data-driven business, because ambiguously or incorrectly labeled metrics cause just as much trouble as wrong information. As companies grow and expand, their data becomes exponentially more complicated. By separating collection and organization, businesses gain the ability to easily and retroactively adapt their analysis. They no longer have to invest valuable resources in relabeling data by hand, risking the integrity of the data and the utility of the insights drawn from it.

Ultimately, the future of the analytics industry depends on the decoupling of data collection and organization. It frees up resources that would otherwise go to relabeling data, and it simplifies the path to customer insights. And in the long run, it will give businesses that dream of being truly data-driven the freedom to finally embrace their potential.

Story by Ravi Parikh

Ravi Parikh is the COO and co-founder of Heap. He and his co-founder Matin founded the company on the belief that for businesses to actually "be data-driven," the way we collect, organize, and access data needs to fundamentally change. He studied computer science at Stanford with a focus in AI. Prior to Heap, Ravi was a touring musician for a few years.