Google today announced a big update to BigQuery, its service for quickly analyzing large amounts of data, adding new features that let users handle large result sets, use window functions for advanced analytics, and cache large query results. Furthermore, the company has also slashed monthly data storage costs from $0.12/GB to $0.08/GB (a 33 percent reduction); the new pricing change will go into effect on July 1.
Google says customers with higher-volume usage will soon be able to opt in for tiered query pricing, providing “more economical and predictable cost interactive queries.” Unfortunately, it wouldn’t detail these: you’ll have to contact a sales representative for a quote.
As for the new features, here’s a quick rundown:
- Large results: Run queries that return arbitrarily large numbers of rows and save them as a new table for follow-up analysis.
- Window functions: Take advantage of built-in functions like Rank and Partition to create sophisticated statistical analyses with far simpler SQL than before.
- Query caching: Now recent queries that are re-run return a cached result when the underlying table is unchanged, providing more cost-effective analysis.
While BigQuery can process terabytes of information, previously it could only output up to 128MB of compressed data per query. Large results means that, as of today, the service can output results as large as the largest tables Google’s customers have ever had. To take advantage, just enable the new “–allow_large_results” flag or check for the setting under the new “Enable Options” menu when issuing a query job, and specify a destination table. Interactive query quotas for all users have also been doubled from 200GB plus one concurrent query to 400GB of concurrent queries plus two additional queries of unlimited size.
Next up, Google has added support for window functions (also known as “analytical functions”) for ranking, percentiles, and relative row navigation. In short, these new functions offer different ways to rank results, explore distributions and percentiles, as well as traverse results without the need for a self join. Check out the window functions documentation for descriptions and examples of each new addition.
Lastly, query caching means BigQuery now remembers values that you’ve previously computed, saving both the time and cost of recalculating a query, for 24 hours (on a best-effort basis). Queries are also cached on a per-user basis for the sake of privacy and cached results are only returned for tables that haven’t changed since the last query, or for queries that are not dependent on non-deterministic parameters. Reading cached results is free, but each query still counts against your queries-per-day quota.
Finally, Google has also tweaked the BigQuery UI with instant information while writing a query depending on the validity of the syntax. If it is valid, the UI will inform you how much the query would cost to run, and if it’s not, you’ll be told where the error is. Also, when running queries on the UI, previously you had to wait until its completion before starting another one, but now you have the option to abandon it and start working on the next iteration immediately.
This release is arguably bigger than the last BigQuery update released in March. The pricing changes alone are notable enough to put it over the top.
Top Image Credit: Pawel Kryj