Show HN: Building a web search engine from scratch with 3B neural embeddings


End-to-end deep dive of the project, spanning a large GPU cluster, distributed RocksDB, and terabytes of sharded HNSW.

I started off by creating a minimal playground to test whether neural embeddings were superior for search: take a web page, chunk it up, and see whether I could answer complex, indirect natural language queries with precision.

But they often had issues at scale (delays and transient errors when joining, propagating, and discovering changes), as well as traffic limitations and overhead: at the time, they could not easily saturate 10 Gbps connections and consumed a lot of CPU.

The Internet is such a large search space that figuring out direction and filtering is basically a necessity to avoid picking up entire swathes of junk, spiralling into ever more deserted corners of the web, or going too deep in one area and leaving gaps in others.
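
The playground idea described above (chunk a page, embed the chunks, rank them against a query) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's code: `toy_embed` is a hypothetical stand-in that hashes words into a fixed-size vector instead of calling a neural model, and `DIM`, `chunk_text`, `search`, and the 12-word chunk size are names and values made up for the example.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical vector width for the toy embedding


def chunk_text(text, max_words=40):
    """Group whole sentences into chunks of roughly max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text):
    """Assumption: hashed bag-of-words, L2-normalised. NOT a neural embedding."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, chunks, top_k=1):
    """Rank chunks by cosine similarity to the query (vectors are unit length)."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    page = (
        "RocksDB is an embedded key-value store. "
        "It uses log-structured merge trees. "
        "HNSW is a graph index for approximate nearest neighbour search. "
        "It trades some recall for speed."
    )
    for score, chunk in search("graph index nearest neighbour", chunk_text(page, 12)):
        print(round(score, 3), chunk)
```

Because the stand-in only matches on shared words, it cannot handle the indirect queries the post is interested in; swapping `toy_embed` for a real embedding model is the step that would make the comparison meaningful.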
