The missing tier for query compilers


Why compile queries? Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately, these days a cheap NVMe SSD can supply data much faster than a query interpreter can process it.

Instead, some OLTP databases (e.g. SQL Server, SingleStore, Redshift, Oracle) compile queries to native code where those parameters can be hard-coded, allowing the values in each row to be kept in individual CPU registers throughout an entire pipeline.

Umbra's recent paper reports that the tradeoff curve is really sharp around their baseline compiler (DirectEmit) and that their LLVM backend is only the best choice for long-running queries.

This kind of comparison is always fraught because it's hard to know what to include without being familiar with a codebase, and we're comparing projects with different maturity and scope (in particular, DataFusion/Velox probably support a lot more SQL functions than Umbra, which is why I tried to separate out those counts).
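To make the idea of hard-coding parameters concrete, here is a toy sketch of the same filter evaluated two ways: by walking an expression tree for every row, and by generating a specialized function once with the column name and constant baked in. The tuple-based expression format and the names `interpret`, `to_source`, and `compile_filter` are made up for illustration and are not taken from any of the databases mentioned. Python only produces bytecode here, so this is an analogy for native compilation, not a demonstration of it.

```python
# Toy filter "price > 100", evaluated two ways.
# The expression format below is a hypothetical stand-in for a query plan.

def interpret(expr, row):
    """Interpreter style: re-dispatch on the tree shape for every row."""
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    if op == "gt":
        return interpret(expr[1], row) > interpret(expr[2], row)
    raise ValueError(f"unknown op: {op}")

def to_source(expr):
    """Turn the tree into Python source with all parameters hard-coded."""
    op = expr[0]
    if op == "col":
        return f"row[{expr[1]!r}]"
    if op == "const":
        return repr(expr[1])
    if op == "gt":
        return f"({to_source(expr[1])} > {to_source(expr[2])})"
    raise ValueError(f"unknown op: {op}")

def compile_filter(expr):
    """Compiler style: pay the tree walk once, then run straight-line code."""
    return eval(f"lambda row: {to_source(expr)}")

expr = ("gt", ("col", "price"), ("const", 100))
rows = [{"price": 50}, {"price": 150}, {"price": 100}]

interpreted = [r for r in rows if interpret(expr, r)]
compiled_filter = compile_filter(expr)
compiled = [r for r in rows if compiled_filter(r)]
```

Both paths select the same rows. The difference is where the dispatch cost is paid: per row in the interpreter, once per query in the generated function.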
