
Foundry (YC F24) is hiring an engineer to build an internet-scale web crawler


About Us

We’re building the first end-to-end testing platform for web agents, including a Browser Gym for RL-driven optimization. Our platform helps teams evaluate, benchmark, and improve web agents before they go live, ensuring they can handle real-world, dynamic environments. With synthetic user simulations, automated evaluations, and large-scale benchmarking, we’re setting a new standard for web agent testing. We’re a YC-backed team, and this is a founding engineering role: you’ll be one of the first hires defining how we crawl, structure, and analyze the open web at scale.

The Role

We need a Founding Web Scraping Engineer to build internet-scale web crawling infrastructure: not just scraping a single site, but handling millions of domains and evolving anti-bot defenses. You’ll be responsible for designing robust, distributed crawling systems that adapt dynamically to web changes, optimize for efficiency, and ensure reliable data extraction.

What You’ll Do

Build large-scale, distributed crawlers that intelligently prioritize, schedule, and optimize requests across millions of domains.
Develop adaptive web scraping systems that handle DOM changes, WebSockets, AJAX-heavy sites, and dynamically loaded content.
Optimize scraping performance and resilience, ensuring high-throughput data extraction with proxy/network optimizations and behavior-driven stealth tactics.
Solve captchas at scale, integrating third-party solvers, heuristic-based workarounds, and behavior-driven bypass techniques.
Manage proxy and identity rotation, implementing session-aware scraping, JA3/TLS fingerprint spoofing, and request signature control.
Structure and clean extracted data for downstream analytics, AI training, and benchmarking applications.

What We’re Looking For

Expert-level experience in large-scale web scraping and crawling (Selenium, Puppeteer, Playwright, Scrapy, undetected-chromedriver).
Deep knowledge of anti-bot detection strategies (TLS fingerprinting, JA3 signatures, request header anomalies, and bot behavior tracking).
Hands-on expertise with captcha-solving strategies, including leveraging APIs, OCR-based approaches, and behavior-driven evasion.
Proven experience building efficient proxy management systems, including rotating IPs across residential, datacenter, and mobile networks.
Proficiency in Python, Go, or JavaScript, with experience in high-performance, parallelized scraping frameworks.
Understanding of HTTP/2, HTTP/3, WebSockets, GraphQL, and browser-based fingerprinting.
Experience designing scalable, fault-tolerant scraping infrastructure that adapts to changes in real time.

Bonus Points

Experience with search engine-scale crawling.
Background in LLM-driven web extraction or RL-enhanced adaptive crawling.
Contributions to open-source scraping tools or web automation projects.

Why Join?

Founding role: you’ll define and own our web crawling infrastructure from day one.
Work at internet scale: building a system that dynamically adapts and scales across millions of domains.
YC-backed: we’re building something that doesn’t exist yet, and you’ll be part of the core team making it happen.

