Get the latest tech news
AI leaderboards are no longer useful. It's time to switch to Pareto curves
What spending $2,000 can tell us about evaluating AI agents
The accuracy of AlphaCode on coding tasks continues to improve even after making a million calls to the underlying model (the different curves represent varying parameter counts). We thank Rishi Bommasani, Rumman Chowdhury, Percy Liang, Shayne Longpre, Yifan Mai, Nitya Nadgir, Matt Salganik, Hailey Schoelkopf, Zachary Siegel, and Venia Veselovsky for discussions and inputs that informed our analysis. In particular, we are grateful to Zilong Wang (LDB), Andy Zhou (LATS), and Karthik Narasimhan (Reflexion), who gave us feedback in response to an earlier draft of this blog post.
Or read this on Hacker News