Simple, Efficient, and Robust Hash Tables for Join Processing
Hash tables are probably the most versatile data structures for data processing. For that reason, CedarDB relies on hash tables for some of the most crucial parts of its query execution engine. Most prominently, CedarDB implements relational joins as hash joins. This blog post assumes you know what a hash join is. If not, the Wikipedia article has a short introduction to the topic for you.
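As a quick refresher, here is a minimal textbook hash join in Python: build a hash table on one input, then probe it with every row of the other input. This is an illustrative sketch only, assuming rows are dicts and an equi-join on a single key column; `hash_join` and the sample tables are hypothetical and do not reflect CedarDB's engine.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Textbook in-memory equi-join (illustrative sketch, not CedarDB's code)."""
    # Build phase: group the build-side rows by their join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe-side row and emit one output row per match.
    # table.get() does not insert missing keys into the defaultdict.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            result.append({**match, **row})
    return result


# Usage with hypothetical sample data: customers is the (smaller) build side.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 3},
]
print(hash_join(customers, orders, "cust_id", "cust_id"))
```

Building on the smaller input keeps the hash table small; every probe then costs one hash lookup plus a scan over that key's matches.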
To share our latest design, TUM and CedarDB published a peer-reviewed scientific paper, which Altan will present at DaMoN'24 in Santiago de Chile next week. Since a database system executes many different user-specified queries, a hash join implementation shouldn't be optimized for one specific case; instead, it should improve the average runtime across all of our users' workloads. Our design combines two ingredients: the pointer array of a chaining hash table, which, together with Bloom filters, gives us collision resistance and a low false-positive rate, and the dense storage layout of a linear probing scheme.