Measuring the impact of AI on experienced open-source developer productivity
We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.
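To make the headline number concrete, here is a minimal sketch of how such a slowdown could be estimated from a randomized trial: issues are randomly assigned to AI-allowed or AI-disallowed conditions, and the ratio of geometric mean completion times gives the multiplicative effect of AI on time. All data, variable names, and the simulated 19% effect below are synthetic and purely illustrative; this is not METR's actual analysis code or estimator.

```python
# Illustrative sketch only: estimating a multiplicative AI effect on
# completion time from a randomized trial. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_issues = 200

# Each issue is randomly assigned to "AI allowed" (True) or "AI disallowed" (False).
ai_allowed = rng.integers(0, 2, size=n_issues).astype(bool)

# Hypothetical baseline completion times (hours), plus a simulated 19% slowdown
# when AI is allowed and some per-issue noise.
baseline_hours = rng.lognormal(mean=0.5, sigma=0.8, size=n_issues)
hours = baseline_hours * np.where(ai_allowed, 1.19, 1.0) * rng.lognormal(0.0, 0.1, size=n_issues)

# Ratio of geometric means = estimated multiplicative effect of AI on time.
ratio = np.exp(np.log(hours[ai_allowed]).mean() - np.log(hours[~ai_allowed]).mean())
print(f"Estimated completion time with AI vs. without: x{ratio:.2f} ({ratio - 1:+.0%})")
```

Because completion times are heavy-tailed, working on the log scale (i.e. comparing geometric means) is a common choice; a regression of log time on the treatment indicator plus covariates would be a natural extension of this sketch.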
It is useful to compare our RCT against two other common sources of evidence about AI coding capabilities: benchmarks and anecdotal reports.

Our RCT
- Task type: PRs from large, high-quality open-source codebases
- Task success definition: human user is satisfied the code will pass review, including style, testing, and documentation requirements
- AI type: chat, Cursor agent mode, autocomplete
- Observation: models slow down humans on realistic coding tasks taking 20 minutes to 4 hours

Benchmarks like SWE-Bench Verified and RE-Bench
- Task type: SWE-Bench Verified uses open-source PRs with author-written tests; RE-Bench uses manually crafted AI research problems with algorithmic scoring metrics
- Task success definition: algorithmic scoring (e.g. automated test cases)
- AI type: typically fully autonomous agents, which may sample millions of tokens, use complicated agent scaffolds, etc.
- Observation: models often succeed at benchmark tasks that are very difficult for humans

Anecdotes and widespread AI adoption
- Task type: diverse
- Task success definition: human user finds the code useful (potentially as a throwaway prototype or roughly single-use research code)
- AI type: various models and tools
- Observation: many people (although certainly not all) report finding AI very helpful for substantial software tasks taking them more than an hour, across a wide range of applications

Our results also suggest that AI capabilities may be comparatively lower in settings with very high quality standards, or with many implicit requirements (e.g. relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn. There are meaningful tradeoffs between these methods, and it will continue to be important to develop and use diverse evaluation methodologies to form a more comprehensive picture of the current state of AI, and where we're heading.