How Facebook plans to keep your data private while opening it up to researchers

Credit: Facebook

Ever since the 2016 US presidential elections, social network Facebook has been at the receiving end of criticism for its outsized influence on worldwide electoral processes, and threats to democracy from the spread of fake news.

Now, more than a year after Facebook announced a new initiative to aid independent research analyzing the platform’s influence on elections, the company has made good on its promise by opening up its data for the first time to more than 60 researchers from 30 academic institutions across 11 countries.

To that effect, it has outlined a differential privacy-based approach that it expects will protect users’ confidentiality while simultaneously allowing researchers to come up with meaningful analyses that can offer valuable insights into solving the problem.

“We hope this initiative will deepen public understanding of the role social media has on elections and democracy and help Facebook and other companies improve their products and practices,” said Facebook in a blog post announcing the move.

The selected researchers, who were chosen by the US Social Science Research Council, will have access to a database of URLs that Facebook users shared between January 1, 2017 and February 19, 2019.

Note that the time frame entirely skips the months leading up to the 2016 US election, during which Russian actors misused Facebook to target millions of voters with ad campaigns covering divisive topics such as LGBT rights, Black Lives Matter, immigration, and Islam. Nor does it cover the UK’s 2016 Brexit vote.

In addition to the aforementioned database, Facebook has also made available an Ad Library API that throws more light on political ads on the social network in the US, UK, Brazil, India, Ukraine, Israel and the EU.

Facebook’s social science initiative was initially launched last April, but the rollout hit a snag in the wake of the Cambridge Analytica data scandal, after it became known that a political consultancy firm working for the Trump campaign had improperly accessed the personal information of about 87 million users, harvested via a third-party research app designed by a University of Cambridge professor.

Subsequently, Facebook’s data sharing partnerships attracted scrutiny, leading the tech behemoth to clamp down on third-party app developers’ access to user information last week.

So how is it different this time around? Well, for a start, the company has developed a data sharing infrastructure conceived with the aim to “provide researchers access to Facebook data in a secure manner that protects people’s privacy.”

One such step involves removing personally identifiable information from the data set and only allowing researchers access to the data set through a secure portal that makes use of two-factor authentication and a private network.

Another approach is differential privacy. The technique makes it possible for tech companies to collect and share aggregate information about users, while safeguarding the privacy of individual users. Johns Hopkins University professor Matthew Green explains differential privacy as follows:

Imagine you have two otherwise identical databases, one with your information in it, and one without it. Differential Privacy ensures that the probability that a statistical query will produce a given result is (nearly) the same whether it’s conducted on the first or second database.
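
Green's description matches the standard formal definition of differential privacy, sketched below. The symbols are notation introduced here for illustration and do not come from Facebook's post: M is the randomized query mechanism, D and D' are the two databases differing only in one person's data, S is any set of possible results, and ε is the privacy parameter.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\quad \text{for all } S \text{ and all neighboring } D, D'
```

The smaller ε is, the closer the two probabilities must be, and the less any single result can reveal about whether your information was in the database.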

Facebook hopes to employ this technique to build the URL database, which includes links that have been shared on the social network by at least 100 unique users with their privacy settings set to “public.”

Let’s, for example, assume Bob has shared a URL that 1,499 other Facebook users have also shared, and the researchers viewing this data want to ascertain whether Bob did indeed share it. Let’s also assume, for simplicity’s sake, that they are tracking a total of 2,000 users for this purpose.

If the database reports an exact count of 1,500 shares, and the researchers have already obtained background information about the 1,999 others and determined that 1,499 of them have shared that particular URL and 500 haven’t, then they can deduce that Bob is the 1,500th person to share it.
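
The deduction in the Bob example is plain subtraction, which a few lines of Python make concrete. This is only an illustrative sketch: the function name `infer_target_shared` and the variable names are hypothetical, and the figures are the ones from the example, assuming the database publishes an exact share count of 1,500.

```python
# Figures from the Bob example; all names here are hypothetical.
reported_total_shares = 1500  # exact count published for the URL (assumed)
known_sharers = 1499          # background knowledge about the 1,999 others
known_non_sharers = 500


def infer_target_shared(reported_total: int, known_sharers: int) -> bool:
    """Return True if the one user with unknown behavior must have shared the URL."""
    remainder = reported_total - known_sharers
    if remainder not in (0, 1):
        # The background knowledge cannot be reconciled with the published count.
        raise ValueError("background knowledge is inconsistent with the count")
    return remainder == 1


# With an exact count, the researcher learns Bob's behavior with certainty.
bob_shared = infer_target_shared(reported_total_shares, known_sharers)
print("Bob shared the URL:", bob_shared)
```

The attack works only because the published count is exact; once the count is perturbed, the remainder no longer pins down what Bob did.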

But differential privacy protects users against such re-identification attacks by adding random noise to the aggregate data set, such that it will “prevent researchers from reidentifying individuals while simultaneously not obscuring research findings about societal patterns when researchers perform appropriate analyses.”

Thus, with differential privacy, the reported number of people who have shared a specific URL might be 1,502 or 1,490, rather than the exact figure of 1,500. This deliberate inaccuracy helps preserve Bob’s privacy. Apple already employs this method to gather anonymized usage statistics from its iPhone, iPad and Mac users.
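
To show how a noisy count might be produced, here is a minimal Python sketch that assumes the Laplace mechanism, a common textbook choice. The article does not say which noise distribution or privacy parameter Facebook uses, so the function `noisy_count` and the `epsilon` value of 0.5 are illustrative assumptions only.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Return the count plus Laplace noise (illustrative, not Facebook's method)."""
    sensitivity = 1.0              # one user changes a share count by at most 1
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    # The difference of two independent exponential draws with rate 1/scale
    # follows a zero-mean Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return round(true_count + noise)


rng = random.Random()  # unseeded, so each run reports different counts
samples = [noisy_count(1500, epsilon=0.5, rng=rng) for _ in range(5)]
print("Reported share counts:", samples)
```

Each individual report is off by a few shares, which hides whether Bob is included, while the average over many URLs or queries stays close to the truth, which is what lets researchers still study broad patterns.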

With elections in India already underway and the European Parliament elections set to happen in a month’s time, Facebook’s role will doubtless be examined more closely. The opening up of its data vault in a privacy-preserving manner is a much-needed attempt at course correction and could pave the way for a more accountable, trustworthy platform in the long run.
