SWE Bench just got updated – new #1s


SWE-bench: Evaluate Language Models on Open Source Software Tasks

SWE-bench Verified is a human-annotator-filtered subset of SWE-bench that has been deemed to have a ceiling of 100% resolution rate [Post].

- ✅ Checked indicates that we, the SWE-bench team, received access to the system and were able to reproduce the patch generations.
- If you would like to submit your model to the leaderboard, please check the submission page.
