Crawling a billion web pages in just over 24 hours, in 2025


tl;dr:
- 1.005 billion web pages
- 25.5 hours
- $462

For some reason, nobody's written about what it takes to crawl a big chunk of the web in a while: the last point of reference I saw was Michael Nielsen's post from 2012. Obviously, lots of things have changed since then. Mostly they're bigger, better, faster: CPUs have gotten a lot more cores, spinning disks have been replaced by NVMe solid-state drives with near-RAM I/O bandwidth, network pipe widths have exploded, EC2 has gone from a tasting menu of instance types to a whole rolodex's worth, yada yada.
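To put the tl;dr numbers in perspective, here is a back-of-envelope sketch that turns them into an average throughput and a unit cost. It assumes the crawl ran at a uniform rate for the full 25.5 hours, so the result is an average over the whole run, not a peak rate; the variable names are illustrative only.

```python
# Back-of-envelope rates derived from the tl;dr figures.
# Assumption: a uniform crawl rate over the entire wall-clock duration.
PAGES = 1.005e9     # pages crawled
HOURS = 25.5        # wall-clock duration in hours
COST_USD = 462.0    # total cost in US dollars

seconds = HOURS * 3600                              # 91,800 seconds
pages_per_second = PAGES / seconds                  # average fetch rate
usd_per_million_pages = COST_USD / (PAGES / 1e6)    # unit cost

print(f"{pages_per_second:,.0f} pages/s")
print(f"${usd_per_million_pages:.2f} per million pages")
```

Running it prints roughly 10,948 pages/s and $0.46 per million pages, which is the sustained rate the crawler has to hold on average for the whole day.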
