Exploiting the most prominent AI agent benchmarks


Our agent hacked every major one. Here’s how — and what the field needs to fix.
