Google open-sources cryptographic tool to keep data sets private

Poorly secured databases are a top privacy and security concern — and Google now wants to plug that leak.

The internet giant has said it’s open sourcing Private Join and Compute, a new secure multi-party computation (MPC) tool designed to help organizations work together with confidential data sets.

The tool is conceived with privacy in mind, and thus allows organizations to trade data sets and glean aggregate insights about other parties’ confidential data without actually disclosing anything about individuals represented in the data set. The data stays encrypted, and only the results of calculations based on the data will be revealed.

It works by using a cryptographic protocol called Private Set-Intersection (PSI). The company already employs this approach in its Password Checkup Chrome extension, which lets users match their login credentials against an encrypted data set of 4 billion compromised credentials without revealing the details to anyone, including Google.

Most data sets today have fields like email addresses and phone numbers that can be used to uniquely identify each record. In PSI, each party encrypts these identifiers and the associated data with its own private key, so the data cannot be deciphered by the other party or by anyone else.

The organizations then exchange this encrypted data, and each encrypts the other party’s identifiers a second time with its own private key. The double-encrypted data is exchanged again and joined with the other party’s double-encrypted data set to find the records the two data sets have in common. The match works because the encryption is commutative: an identifier encrypted with both keys yields the same value no matter which key was applied first.
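The double-encryption exchange described above can be sketched in a few lines of Python. This is a toy illustration, not Google's implementation: it assumes a commutative cipher built from modular exponentiation over a small prime, and the names `hash_to_group`, `keygen`, `encrypt` and `intersection_size` are hypothetical. Both parties run inside one function here only to keep the example short.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): 2**127 - 1 is a Mersenne prime, far too small for real use.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier such as an email address to a nonzero value modulo P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") % P
    return value if value != 0 else 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1 so that encryption is one-to-one."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def encrypt(value: int, key: int) -> int:
    """Commutative step: (v**a)**b == (v**b)**a modulo P."""
    return pow(value, key, P)


def intersection_size(ids_a, ids_b) -> int:
    key_a, key_b = keygen(), keygen()
    # Round 1: each party encrypts its own identifiers and sends them over.
    once_a = [encrypt(hash_to_group(i), key_a) for i in ids_a]
    once_b = [encrypt(hash_to_group(i), key_b) for i in ids_b]
    # Round 2: each party encrypts the other party's values with its own key.
    twice_a = {encrypt(c, key_b) for c in once_a}
    twice_b = {encrypt(c, key_a) for c in once_b}
    # Identifiers present in both sets now have identical double-encrypted values.
    return len(twice_a & twice_b)
```

Calling `intersection_size` on two lists of email addresses returns how many addresses appear in both. Neither side ever sees the other's identifiers in the clear, only values encrypted under a key it does not hold.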

As an example, Google describes a scenario in which a city wants to know whether the cost of operating weekend train service results in increased revenues at local businesses.

By processing the city’s rider data set and the merchants’ point-of-sale data set with Private Join and Compute, the city can determine the total number of train riders who made a purchase at a local store without revealing any identifying information about the riders or their purchases.

“Using this cryptographic protocol, two parties can encrypt their identifiers and associated data, and then join them,” said Google in an announcement on Wednesday. “They can then do certain types of calculations on the overlapping set of data to draw useful information from both datasets in aggregate. All inputs (identifiers and their associated data) remain fully encrypted and unreadable throughout the process.”

Once the intersecting data set has been identified, calculations — like count, summation, or average — can be performed on it to reveal aggregate statistics.

The underlying data remains concealed throughout thanks to homomorphic encryption, which allows certain types of computation to be performed directly on encrypted data without decrypting it first.
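To make the idea of computing on encrypted data concrete, here is a minimal sketch of an additively homomorphic scheme in the style of Paillier. The article does not say which scheme the tool uses, so the choice of Paillier is an assumption, as are the toy primes and the function names `paillier_encrypt`, `add_encrypted` and `paillier_decrypt`. Multiplying two ciphertexts yields an encryption of the sum of their plaintexts.

```python
import math
import secrets

# Toy key material (assumption): two well-known Mersenne primes, insecure by design.
P_PRIME = 2**31 - 1
Q_PRIME = 2**61 - 1
N = P_PRIME * Q_PRIME            # public modulus
N_SQ = N * N
# Private key: lcm(p - 1, q - 1) and its inverse modulo N (needs Python 3.8+).
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAM, -1, N)


def paillier_encrypt(message: int) -> int:
    """Encrypt 0 <= message < N using fresh randomness for every ciphertext."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, message, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQ


def paillier_decrypt(ciphertext: int) -> int:
    """Recover the plaintext with the private values LAM and MU."""
    return ((pow(ciphertext, LAM, N_SQ) - 1) // N * MU) % N


# Usage: sum three encrypted purchase amounts without decrypting any single one.
purchases = [12, 30, 8]
encrypted_total = paillier_encrypt(0)
for amount in purchases:
    encrypted_total = add_encrypted(encrypted_total, paillier_encrypt(amount))
total = paillier_decrypt(encrypted_total)
```

In this sketch only the holder of the private key can read `total`, and whoever performs the additions learns nothing about the individual amounts.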

The technology is promising, given the number of data breaches and other security incidents involving third-party handling of data. PSI can therefore be a privacy-preserving option for various kinds of data analytics, including tracking ad campaign effectiveness, which is crucial to Google.

Private Join and Compute’s official announcement comes close on the heels of TensorFlow Privacy, a library for Google’s TensorFlow machine learning framework that leverages Differential Privacy to make it easier to train AI models with strong privacy guarantees.

You can read more about the research and methodology behind Private Join and Compute here and here.
