Top model scores may be skewed by Git history leaks in SWE-bench

We've identified multiple loopholes in SWE-bench Verified in which agents may look at future repository state (by querying it directly or through a variety of other methods), and cases in which that future state includes either solutions or detailed approaches to solving the problems (in commit messages and more).

The mitigation will be to properly remove future repository state and any artifacts that contain information the agent could use (reflogs, branches, origins, tags, and more):

- Remove origins: branch names can reveal information about fixes.
- Remove all branches: git log --all can be used to query them, and branches that track a remote origin may still contain information about future commits even after a git reset --hard.
- Remove the reflog: git reflog can leak future commit messages that could detail approaches to solutions.
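The mitigation steps above can be scripted. What follows is a minimal Python sketch, assuming the harness has already checked out the task's base commit, that git is on PATH, and that the repository uses Git's default file-based ref storage. The names git and scrub_future_state are hypothetical helpers, not part of SWE-bench's tooling. Two steps go beyond the list above and are our own additions: deleting the ORIG_HEAD/FETCH_HEAD pseudorefs (git reset --hard leaves ORIG_HEAD pointing at the pre-reset commit) and running git gc --prune=now, because commits that are no longer referenced still sit in the object store until they are pruned.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run `git -C <repo> <args>` and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def scrub_future_state(repo: Path) -> None:
    """Hypothetical helper, not SWE-bench's own harness code.

    Assumes HEAD already points at the task's base commit and that the repo
    uses Git's default file-based ref storage.
    """
    # Remove every remote. This deletes its remote-tracking branches and the
    # URL an agent could fetch future commits from.
    for remote in git(repo, "remote").split():
        git(repo, "remote", "remove", remote)

    # Detach HEAD at the current (base) commit so that no branch must survive.
    git(repo, "checkout", "--quiet", "--detach")

    # Delete every remaining ref (branches, tags, stash, notes, ...), so that
    # `git log --all` can no longer reach commits made after the base commit.
    for ref in git(repo, "for-each-ref", "--format=%(refname)").split():
        git(repo, "update-ref", "-d", ref)

    # Our addition: `git reset --hard` leaves ORIG_HEAD pointing at the
    # pre-reset (future) commit, and FETCH_HEAD records fetched tips.
    git_dir = Path(git(repo, "rev-parse", "--absolute-git-dir").strip())
    for pseudoref in ("ORIG_HEAD", "FETCH_HEAD"):
        (git_dir / pseudoref).unlink(missing_ok=True)

    # Expire all reflog entries; their messages can describe future commits.
    git(repo, "reflog", "expire", "--expire=now", "--all")

    # Our addition: unreferenced commits stay in the object store until pruned
    # and could still be read with `git cat-file`, so prune them now.
    git(repo, "gc", "--quiet", "--prune=now")
```

The sketch would be called once per task, after the checkout has been reset to the base commit. It leaves HEAD detached rather than on a branch; if a task needs a named branch, one could be recreated afterwards with git switch -c <name>, which points only at the base commit.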
