Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers


Evaluating agents as senior engineers on the work we actually give them

Read more on:

agents

senior engineers

SWE-Bench

Related news:

Show HN: PMB – local memory for coding agents that shows if it is used

Morgan Stanley cut its riskiest reconciliation job in half — by making its agents less autonomous

Anthropic launches Claude Sonnet 5 as a cheaper way to run agents