Data restrictions don’t have to spell disaster for companies

The Facebook-Cambridge Analytica data scandal was a disaster for everyone: for Facebook’s users, whose personal data was exposed and misused; for Facebook itself, which breached and then lost the trust of a billion-plus people; and for the thousands of third-party app and game developers, marketers, and researchers who relied on Facebook for access to its data and its users, and who have now seen that source run dry as Facebook pulls back its information.

When it comes to using data, Cambridge Analytica is not an anomaly: Facebook estimates that the majority of its users have had their data accessed by third-party apps, and any time a person downloads an app, their permission is requested for access to specific data.

What is an anomaly is the way Cambridge Analytica handled (or mishandled, as the case may be) the data it received, going one step further by using it to access Facebook users’ friends and selling their data to political consultants. As a result, Facebook became much stricter about the data it makes available.

But while tightening the reins may seem like a good thing, it leaves the companies that relied on Facebook for data in a bind: they need a new avenue to keep their revenue flowing. So where can companies turn when Facebook turns off the faucet? One solution is to go straight to the source, aggregating data directly from across the web.

Filling the hole left by Facebook

The fact that companies collect consumer data for marketing purposes is nothing new. Even before the internet, advertisers targeted consumers based on their communities and spending habits, and by purchasing mailing lists from other companies.

Think of the number of catalogs and coupons you used to receive in the mail without having subscribed to them.

The internet digitized that process through everything from online newsletters to shopping to subscription web streaming.

And Facebook, with its pages, groups, events, and likes by its billion users—who also willingly gave away information such as date of birth, alma mater, and relationship status—made that data collection even easier by putting it all in one place.

That ease of access is why returning to the roots of data collection may seem like a daunting task for companies that have come to rely on Facebook. But much of the information that exists on Facebook exists elsewhere; you just need to know how to look for it.

Before Facebook, companies were using data aggregators to collect information from across the web. They were listening in on message boards, which covered everything from finance to fertility, to hear brand opinions, lifestyle suggestions, and the always-popular humblebrags.

They were scanning the comments on blog posts, news feeds, review sites, and e-commerce sites, and they were converting that content into machine-readable information. It was anonymous content, but it was just as usable, and those message boards and blogs still exist; they didn’t disappear with the advent of Facebook.

If anything, as users become more wary of Facebook’s disregard for privacy, those avenues—which provide more anonymity—will thrive more than ever.

The benefits of roots-based data mining

Returning to the roots of data collection is a change companies are going to have to get used to, but in the end it may prove to be a stronger model than depending solely on Facebook.

For one thing, there is an inherent danger in constructing a business model so heavily reliant on the success and cooperation of another entity. It’s exactly what these companies did with Facebook, and it’s exactly what’s failing them now.

But this isn’t the only reason going to the source is better. The lack of anonymity and the tendency for people to congregate with like-minded people mean that the data being collected on Facebook wasn’t necessarily accurate. Instead, it presented a small sliver of the population’s opinions, ones they were willing to express out loud within their own echo chambers.

Message boards and e-commerce sites are not only automatically sorted by interest but also give people a safe space to express their opinions without the judgment of their friends, and they include people from a broad segment of the population. Similarly, data from e-commerce sites presents a more accurate picture of consumer habits than Facebook posts and likes.

More importantly for companies looking to avoid a PR nightmare, aggregating anonymous content from across wide swaths of the web presents a more ethical form of data collection than doing so from Facebook.

There’s no question that data collection and analysis are about to go through a significant upheaval, and companies will need time to adjust. But I predict that as companies begin to return to their data “roots” and to the people themselves, they’ll find that the depth of information they get is worth the revolution.

Story by Ran Geva

CEO, Webhose -- Ran is an expert in web data collection and structuring, as well as big data solutions. He's also a serial entrepreneur with over two decades of hands-on experience in software development and leadership positions. After previously co-founding one of the leading web monitoring and analysis companies in Israel, he went on to co-found Webhose.io -- a leading global provider of structured web data -- where he serves as CEO and lead technologist.