SWE-bench just got updated – new #1s
SWE-bench: Evaluate Language Models on Open Source Software Tasks
SWE-bench Verified is a subset of SWE-bench, filtered by human annotators, that has been deemed to have a ceiling of a 100% resolution rate [Post].
- ✅ Checked indicates that we, the SWE-bench team, received access to the system and were able to reproduce the patch generations.
- If you would like to submit your model to the leaderboard, please check the submission page.