Harrison Weber is TNW's Features Editor in NYC. Part writer, part designer. Stay in touch: Twitter @harrisonweber, Google+ and Email.
Software engineer Eric Sun has revealed the evolution of Facebook’s Entity Graph, a complicated data set which maps the social network’s 100+ billion connections between people, places and interests.
Writing on Facebook’s blog, Sun detailed how the Entity Graph became the backbone for the entire Social Graph, and the technology behind Graph Search. There’s a lot to cover, so let’s dive in:
Facebook’s Entities team was founded with the task of transforming the plain-text descriptions in users’ profiles into structured data. Put simply, the team worked to add meaning to this text, the importance of which cannot be overstated: without being able to process its users’ entries, Facebook would have no way to do anything with your data. That means no ad targeting, no business model and no helping you stalk your old friends from high school.
To take advantage of all of those juicy details in your profile, Sun said his team had to find a data set to represent a seemingly unlimited number of interests. Their solution was to tap into Wikipedia, which powered Facebook’s creation of “millions of ‘fallback’ pages.” Facebook relies heavily on Wikipedia to this day.
These fallback pages were matched to interests that couldn’t be connected to pre-existing pages. Afterwards, they were manually vetted for duplicates; ones which didn’t receive any connections were deleted. According to Sun, many other quality control challenges came into play, including the handling of film remakes, “shortened forms of book names,” as well as “ASCII art and obscenities.”
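To make that matching step concrete, here is a minimal sketch of how it might work. It assumes simple case and whitespace normalization is enough to pair a typed interest with a page. The function names, the sample titles and the data layout are all hypothetical, not Facebook’s actual code.

```python
def normalize(text):
    """Lowercase and collapse whitespace so near-identical entries compare equal."""
    return " ".join(text.lower().split())


def connect_interests(existing_pages, fallback_titles, interests):
    """Connect typed interests to pages, preferring pre-existing pages.

    Returns (connections per page title, fallback pages worth keeping,
    interests that matched nothing). All structures are illustrative.
    """
    existing = {normalize(title): title for title in existing_pages}

    # Duplicate fallback titles collapse onto one key; titles that
    # shadow a pre-existing page are skipped entirely.
    fallback = {}
    for title in fallback_titles:
        key = normalize(title)
        if key not in existing and key not in fallback:
            fallback[key] = title

    connections = {}
    unmatched = []
    for interest in interests:
        key = normalize(interest)
        if key in existing:
            title = existing[key]
        elif key in fallback:
            title = fallback[key]
        else:
            unmatched.append(interest)
            continue
        connections[title] = connections.get(title, 0) + 1

    # Fallback pages that never received a connection are dropped.
    kept_fallbacks = [t for t in fallback.values() if t in connections]
    return connections, kept_fallbacks, unmatched
```

Calling `connect_interests(["The Beatles"], ["Skiing", "skiing ", "Curling"], ["the beatles", "SKIING", "asdf"])` would connect two interests, keep only the Skiing fallback page and leave "asdf" unmatched. The harder cases Sun mentions, such as film remakes or shortened book names, would need far more than string normalization.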
With machine-readable data in place, the team went through a number of steps to match similar concepts through natural language processing via the WordNet database. For example, Sun says, “when a user types in ‘my friends who are skiers,’ we are then able to match this to everyone who likes the Skiing page.”
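Sun’s skiers example can be sketched in a few lines. The tiny hand-written lookup table below is only a stand-in for the kind of word relations WordNet provides. The table, the page data and the helper names are assumptions made for illustration, and the plural stripping is deliberately crude.

```python
# Hypothetical stand-in for WordNet-style relations: each word maps to a concept.
CONCEPT_OF = {
    "skier": "skiing",
    "ski": "skiing",
    "skiing": "skiing",
    "surfer": "surfing",
    "surfing": "surfing",
}

# Hypothetical data: which users like the page for each concept.
PAGE_LIKES = {
    "skiing": {"alice", "bob"},
    "surfing": {"carol"},
}


def singularize(word):
    """Very rough plural stripping; a real system would use a proper lemmatizer."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def friends_matching(term, friends):
    """Return the friends who like the page that the query term maps to."""
    concept = CONCEPT_OF.get(singularize(term.lower()))
    if concept is None:
        return set()
    return PAGE_LIKES.get(concept, set()) & set(friends)
```

Here `friends_matching("skiers", ["alice", "carol"])` reduces "skiers" to "skier", maps it to the skiing concept and returns only the friends who like that page.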
Facebook’s Entity Graph Today
According to Sun, the Entity Graph is now growing faster than the Facebook team can inspect it. As a result, Sun says, the team is focusing its time on “building multi-pronged systems that are scalable and that will improve the graph over time.”
As you may already know, Facebook is now able to programmatically dive quite deeply into the data you share. For example, if you happen to like the song “Help,” Facebook knows the song is by The Beatles and it also knows which of your friends like the Beatles. Or, if you visit a restaurant, Facebook knows where it is, who else goes there, where its visitors live, and which other restaurants its visitors like.
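The Beatles example amounts to a short walk across connected entities: from a song to its artist, then to the friends who like that artist. Here is a rough sketch of that traversal using plain dictionaries. The sample users, the data layout and the function name are made up for illustration and say nothing about how Facebook actually stores its graph.

```python
# Hypothetical toy graph: song -> artist edges, friendships, and page likes.
SONG_ARTIST = {"Help": "The Beatles"}
FRIENDS = {"you": ["dana", "eli", "fay"]}
LIKES = {
    "dana": {"The Beatles"},
    "eli": {"Help"},
    "fay": {"Skiing"},
}


def friends_who_like_artist_of(user, song):
    """Follow song -> artist, then return the user's friends who like that artist."""
    artist = SONG_ARTIST.get(song)
    if artist is None:
        return []
    return sorted(
        friend
        for friend in FRIENDS.get(user, [])
        if artist in LIKES.get(friend, set())
    )
```

In this toy data, `friends_who_like_artist_of("you", "Help")` hops from the song to The Beatles and returns only the friend who likes the band itself, not the friend who merely likes the song.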
The evolution described above is what made this “mapping” process possible. Of course, that grossly oversimplifies the technical challenges involved, but it still reveals how Facebook is able to understand you and your place in the world. Unsurprisingly, it does so to an astounding, and somewhat disturbing, degree.