SOTA on swebench-verified: relearning the bitter lesson
Searching code is an important part of every developer's workflow. We're trying to make it better.
The only setup we did up front was to make sure the required Python packages were already installed in the Docker containers that swebench-verified uses, so the agent would not waste cycles fixing environment problems. Rather than focusing on building the best framework or on being clever, we invested heavily in our infrastructure (two beefy VMs and two MacBook Pro M2 Max machines), all churning out tokens inside the Docker containers to solve SWE-bench tasks.

The research paper "SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement"[1] shows how effective it can be to combine Monte Carlo Tree Search (MCTS) with a reward function and feedback mechanisms to navigate the space of candidate solutions.
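For readers unfamiliar with MCTS, here is a minimal sketch of its four standard phases (selection with the UCT rule, expansion, random rollout, backpropagation) on a toy problem. It is not SWE-Search's implementation. The binary action space, `TARGET`, `reward`, `Node`, `mcts` and the exploration constant `c` are all hypothetical. The toy `reward` only stands in for the kind of reward/feedback signal an agent would use to score a candidate solution.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy problem: choose DEPTH binary actions; the reward is the
# fraction of positions that match a hidden TARGET. It stands in for the kind
# of reward/feedback signal an agent would use to score a candidate solution.
TARGET = (1, 0, 1, 1)
ACTIONS = (0, 1)
DEPTH = len(TARGET)


def reward(seq: tuple) -> float:
    """Score a complete action sequence in [0.0, 1.0]."""
    return sum(a == b for a, b in zip(seq, TARGET)) / DEPTH


@dataclass
class Node:
    state: tuple                                  # actions taken so far
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)  # action -> child Node
    visits: int = 0
    total: float = 0.0                            # sum of backpropagated rewards

    def is_terminal(self) -> bool:
        return len(self.state) == DEPTH

    def is_fully_expanded(self) -> bool:
        return len(self.children) == len(ACTIONS)

    def uct_child(self, c: float) -> "Node":
        # UCT: mean reward plus an exploration bonus that shrinks with visits.
        return max(
            self.children.values(),
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def mcts(iterations: int, c: float = math.sqrt(2), seed: int = 0):
    """Run MCTS from an empty sequence; return (root, most-visited path)."""
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes using UCT.
        while not node.is_terminal() and node.is_fully_expanded():
            node = node.uct_child(c)
        # 2. Expansion: attach one untried child unless the node is terminal.
        if not node.is_terminal():
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(state=node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the sequence with random actions and score it.
        remaining = DEPTH - len(node.state)
        rollout = node.state + tuple(rng.choice(ACTIONS) for _ in range(remaining))
        value = reward(rollout)
        # 4. Backpropagation: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read out the recommendation by following the most-visited children.
    best, node = (), root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        best = node.state
    return root, best


if __name__ == "__main__":
    root, best = mcts(iterations=3000)
    print("best sequence:", best, "reward:", reward(best), "root visits:", root.visits)
```

Reading out the final path by visit count rather than by highest mean reward is a common, robust choice: visit counts are less sensitive to a few lucky rollouts. In an agent setting, the rollout and `reward` steps are where executing and evaluating a candidate patch would plug in.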