Facebook is open-sourcing Presto, an SQL query engine that it developed in-house to help analysts, data scientists and engineers pick apart the information stored in its enormous data warehouses.
Development of Presto began in the fall of 2012, and the system was rolled out to all Facebook employees last spring. It is now used by more than 1,000 employees, who run more than 30,000 queries a day that together scan at least one petabyte of data. Facebook says Presto is “ten times better” than alternatives such as Hive and MapReduce in terms of CPU efficiency and latency for the majority of queries its employees submit.
“It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest),” Martin Traverso, a software engineer at Facebook, said.
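For readers unfamiliar with HyperLogLog, the idea behind approximate distinct counts can be shown in a few dozen lines. The sketch below is a generic, textbook-style HyperLogLog written in Python for illustration only; it is not Presto's implementation. The class name `HyperLogLog`, the precision parameter `p`, and the use of SHA-1 as the hash function are assumptions chosen for readability. Each value is hashed, the first `p` bits pick one of `2**p` registers, and each register remembers the longest run of leading zeros it has seen; the harmonic mean of those registers yields a cardinality estimate using a fixed amount of memory. The quantile digest used for approximate percentiles is not shown.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch, not Presto's code)."""

    def __init__(self, p: int = 12) -> None:
        # p index bits -> m = 2**p registers; typical relative error is about 1.04 / sqrt(m)
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m (valid for m >= 128)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash(self, value) -> int:
        # Assumption: 64-bit hash taken from SHA-1 (slow, but deterministic and well mixed)
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        x = self._hash(value)
        idx = x >> (64 - self.p)                 # first p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Raw estimate: alpha * m^2 / sum(2^-register)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting when many registers are empty
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i % 50_000}")  # 50,000 distinct values, each added twice
    # Prints an estimate of the distinct count; at p=12 the typical error is about 1.6%
    print(round(hll.count()))
```

The appeal for a query engine is that memory stays fixed at `2**p` small registers no matter how many rows are scanned, and two sketches can be merged by taking the element-wise maximum of their registers, which suits distributed aggregation.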
You can check out the code and documentation for Presto using the links below:
Image credit: Ed Jones/AFP/Getty Images