Top model scores may be skewed by Git history leaks in SWE-bench
We've identified multiple loopholes in SWE-bench Verified that let agents look at future repository state, either by querying it directly or through a variety of indirect methods. In some cases that future state includes solutions or detailed approaches to solving the problem (commit messages and more). The mitigation is to properly remove future repository state and any artifacts containing information the agent could use (reflogs, branches, origins, tags, and more):
- Remove origins: branch names can reveal information about fixes.
- Remove all branches: git log --all can be used to query them, and branches tracking a remote origin might contain information about future commits even after a git reset --hard.
- Remove the reflog: git reflog can leak future commit messages that could detail approaches for solutions.
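The cleanup steps above can be sketched as a small Python helper that shells out to git. This is a minimal illustration, not the benchmark maintainers' actual fix: the function names `scrub_commands` and `scrub_repo` are hypothetical, the sketch assumes the task commit is already checked out, and the final `git gc --prune=now` step is an extra assumption not named in the text above. It is also not exhaustive; other artifacts could still leak information.

```python
import subprocess
from typing import List, Optional


def scrub_commands(remotes: List[str], branches: List[str], tags: List[str],
                   keep_branch: Optional[str] = None) -> List[List[str]]:
    """Build the git commands that drop remotes, branches, tags and reflogs.

    Hypothetical helper: it only builds argument lists and does not run them.
    """
    cmds: List[List[str]] = []
    # Removing a remote also removes its remote-tracking branches.
    for remote in remotes:
        cmds.append(["git", "remote", "remove", remote])
    # Delete every local branch except the one currently checked out.
    for branch in branches:
        if branch != keep_branch:
            cmds.append(["git", "branch", "-D", branch])
    # Tags can point at future commits too.
    for tag in tags:
        cmds.append(["git", "tag", "-d", tag])
    # Expire all reflog entries so `git reflog` shows no future commits.
    cmds.append(["git", "reflog", "expire", "--expire=now", "--all"])
    # Assumption beyond the text above: prune objects that are now unreachable.
    cmds.append(["git", "gc", "--prune=now"])
    return cmds


def _git_lines(repo: str, *args: str) -> List[str]:
    """Run a git command in `repo` and return its non-empty output lines."""
    out = subprocess.run(["git", *args], cwd=repo, check=True,
                         capture_output=True, text=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def scrub_repo(repo: str) -> None:
    """Apply the scrub to the repository at `repo` (task commit checked out)."""
    current = _git_lines(repo, "branch", "--show-current")  # empty if detached
    keep = current[0] if current else None
    remotes = _git_lines(repo, "remote")
    branches = _git_lines(repo, "for-each-ref", "--format=%(refname:short)",
                          "refs/heads")
    tags = _git_lines(repo, "tag", "-l")
    for cmd in scrub_commands(remotes, branches, tags, keep):
        subprocess.run(cmd, cwd=repo, check=True)
```

Calling `scrub_repo("/path/to/checkout")` (a placeholder path) would apply the steps in order. Keeping command construction separate from execution makes it easy to log or review exactly what will be removed before touching a repository.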