This article was published on January 24, 2013

Behind the scenes, Facebook fights spam with a new in-house programming language


Facebook on Thursday detailed a programming language designed by the company’s engineers to fight spam. Feature eXtraction Language (FXL) is a domain-specific language “forged in the fires of spam fighting at Facebook” to quash abuse before it spreads too quickly and affects more users.

Facebook says FXL is the company’s answer to the need for a fast, flexible, safe way to write rules for identifying spam. This isn’t the same as fighting junk mail: the company notes that spam threats on the social network can change on a daily, or even hourly, basis.

This is where FXL comes in. The programming language lets Facebook spam fighters keep up with constantly evolving threats because it “is simple and easy to write” (translation: it’s designed specifically for spam fighting) and “extremely efficient for Facebook-sized workloads” (it’s designed specifically for the social network).

So is FXL really a completely new programming language? Yes and no. It’s not exactly built from the ground up, but it is very heavily customized: the company describes it as “a narrowly-optimized implementation of a well-chosen subset of Standard ML (with some customized syntax).”

Here are some example spam-fighting rules for catching dangerous URLs:

If (Reputation(SharedUrl) < 0) Then [LogRequest] Else []
If (Reputation(SharedUrl) == MALWARE) Then [BlockAction, LogRequest] Else []
If (Average(Map(Reputation, PreviousSharedUrls(User, 5))) < 0) Then [WarnUser, LogRequest] Else []

Facebook explains what these examples do:

These rules retrieve the user’s URL sharing history and fetch data from a URL reputation service. While they coherently express business logic for detecting spam, these rules are poor expressions of the optimal data fetching logic. A conventional implementation would evaluate this code top to bottom, left to right. We would fetch data sequentially, conducting an excessive number of network round trips between the machine executing FXL and the reputation service. This is a classic problem of large computer systems: naively mixing business logic with data fetching logic, resulting in pathologically bad performance. A more sophisticated approach would find a way to batch these data fetches in a single network round trip. FXL was designed to do precisely this and automate these data fetches.
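To make the round-trip problem concrete, here is a minimal Python sketch of the same three rules evaluated two ways: naively, with one network call per lookup, and batched, with a single call up front. This is not FXL and not Facebook's implementation. The ReputationService class, the MALWARE sentinel value, and the function names are all hypothetical stand-ins, and the user's sharing history is passed in directly rather than fetched. FXL is described as doing the batching automatically, while this sketch does it by hand.

```python
from statistics import mean

MALWARE = -100  # hypothetical sentinel score, assumed for illustration


class ReputationService:
    """Hypothetical stand-in for a remote URL reputation service.

    Each method call counts as one network round trip.
    """

    def __init__(self, scores):
        self._scores = scores
        self.round_trips = 0

    def fetch_one(self, url):
        self.round_trips += 1
        return self._scores.get(url, 0)

    def fetch_many(self, urls):
        self.round_trips += 1
        return {url: self._scores.get(url, 0) for url in urls}


def evaluate(shared_url, previous_urls, reputation):
    """Apply the three example rules using the given lookup function."""
    actions = []
    if reputation(shared_url) < 0:
        actions += ["LogRequest"]
    if reputation(shared_url) == MALWARE:
        actions += ["BlockAction", "LogRequest"]
    if previous_urls and mean(reputation(u) for u in previous_urls) < 0:
        actions += ["WarnUser", "LogRequest"]
    return actions


def evaluate_naive(shared_url, previous_urls, service):
    # Top to bottom, left to right: every lookup is its own round trip.
    return evaluate(shared_url, previous_urls, service.fetch_one)


def evaluate_batched(shared_url, previous_urls, service):
    # Collect every URL the rules will need, fetch them in one round trip,
    # then run the unchanged business logic against the local results.
    needed = {shared_url, *previous_urls}
    cache = service.fetch_many(needed)
    return evaluate(shared_url, previous_urls, cache.__getitem__)


if __name__ == "__main__":
    scores = {"a.example": -100, "b.example": 5, "c.example": -20}
    history = ["b.example", "c.example", "a.example"]

    naive = ReputationService(scores)
    print(evaluate_naive("a.example", history, naive), naive.round_trips)

    batched = ReputationService(scores)
    print(evaluate_batched("a.example", history, batched), batched.round_trips)
```

Both versions run the same rule logic and return the same actions. Only the number of calls to the service differs, which is the separation of business logic from data fetching that Facebook describes.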

Because a small fraction of spam still gets through, it’s easy to forget that Facebook spends massive resources fighting it. Yet the company runs the world’s most popular social network, with over 1 billion monthly active users, and it’s really a wonder that more spam doesn’t get through.

FXL is one of the reasons the company is able to keep spam levels under control. For more technical details, see Facebook’s own write-up on the language.

Image credit: Gabor Heja
