This article was published on November 6, 2013

Facebook open-sources Presto, a homegrown SQL query engine for mining its enormous data warehouses



Facebook is open-sourcing Presto, an SQL query engine that it developed in-house to help analysts, data scientists and engineers pick apart the information stored in its enormous data warehouses.

Development of Presto began in the fall of 2012, and the system was released to all Facebook employees last spring. It is now used by over 1,000 employees, who run more than 30,000 queries a day covering at least one petabyte of data. Facebook says it’s “ten times better” than alternatives such as Hive and MapReduce in terms of CPU efficiency and latency for the majority of queries submitted by its employees.

“It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest),” Martin Traverso, a software engineer at Facebook, said.
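For readers unfamiliar with the technique Traverso mentions, here is a rough idea of how a HyperLogLog approximate distinct count works. This is a minimal Python sketch for illustration only, not Presto's code: the class name, the default of 4,096 registers and the choice of SHA-1 as the hash are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical, not Presto's implementation)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha formula below assumes m >= 128.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Hash the item to 64 bits (SHA-1 is an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits pick a register; the remaining bits feed the rank.
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic-mean estimate across all registers.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # For small counts, fall back to linear counting on the empty registers.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates do not change the estimate
print(round(hll.count()))
```

The appeal for a warehouse-scale engine is that the counter uses a fixed amount of memory no matter how many rows it sees, at the cost of an estimate that is typically off by a percent or two with this many registers.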

You can check out the code and documentation for Presto using the links below:


Documentation | Code

Image Credit: Ed Jones/AFP/Getty Images
