Get the latest tech news
Sampling with SQL
s one of the most powerful tools you can wield to extract meaning from large datasets. It lets you reduce a massive pile of data into a small yet representative dataset that’s fast and easy to use.
And with the SQL logic we’ve just discussed, you can take fast, easy samples from virtually any dataset, no matter how large. Then we can augment our SQL sampling logic with a pushdown filter that eliminates population rows with arrival times greater than \(c \cdot t\) for some constant \(c\). This filtering happens before ORDER/LIMIT processing and can greatly speed queries by eliminating more than 99.99% of rows early on, before they are even fully read on systems that support “late materialization.”
Or read this on Hacker News