AI Agents That Matter
Rethinking AI agent benchmarking and evaluation
Some of the most exciting applications of large language models involve taking real-world action, such as booking flight tickets or finding and fixing software bugs. The North Star of this field is to build assistants like Siri or Alexa and get them to actually work: handle complex tasks, accurately interpret users’ requests, and perform reliably. Devin, an “AI software engineer”, was announced with great hype 4 months ago, but it has been panned in a video review and remains in waitlist-only mode.