Agents built from alloys


A simple, powerful innovation boosts performance in agentic AI systems.

In fact, we built a benchmark set of such tasks and packaged them in a CTF-like style so we could easily repeat, scale, and assess our “solver agent’s” performance on it. The original set has, sadly, mostly outlived its usefulness because our solver agent is now too good at it, but we harvested more challenging examples from the open source projects we ran on. Typically, and for the experiments in this post, we cap a solver agent at 80 iterations: while we still get solves after more iterations, it becomes more efficient to start a new solver agent unburdened by the misunderstandings and false assumptions accumulated over time.
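The restart strategy described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it reads the figure of 80 as a per-agent iteration cap (an assumption from context), and `run_with_restarts`, `make_toy_agent_factory`, and the idea of an agent as a callable `step()` are all hypothetical names invented for the example.

```python
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 80  # figure quoted in the text, read here as a per-agent cap

# Hypothetical interface: an agent is a step() callable that returns an
# answer string once it solves the task, and None otherwise.
Agent = Callable[[], Optional[str]]


def run_with_restarts(
    make_agent: Callable[[], Agent],
    max_agents: int,
    max_iterations: int = MAX_ITERATIONS,
) -> Optional[Tuple[int, int, str]]:
    """Return (agent_index, iteration, answer) for the first solve, else None."""
    for agent_index in range(max_agents):
        # A fresh agent starts with no accumulated context or assumptions.
        step = make_agent()
        for iteration in range(1, max_iterations + 1):
            answer = step()
            if answer is not None:
                return agent_index, iteration, answer
        # Cap reached without a solve: discard this agent and start over.
    return None


def make_toy_agent_factory(solve_steps: List[Optional[int]]) -> Callable[[], Agent]:
    """Toy stand-in: the k-th agent solves on step solve_steps[k] (None = never)."""
    pending = iter(solve_steps)

    def make_agent() -> Agent:
        solve_at = next(pending)
        count = 0

        def step() -> Optional[str]:
            nonlocal count
            count += 1
            if solve_at is not None and count == solve_at:
                return "FLAG"
            return None

        return step

    return make_agent


if __name__ == "__main__":
    # First toy agent never solves; the second solves on its 5th step.
    print(run_with_restarts(make_toy_agent_factory([None, 5]), max_agents=2))
```

The design choice this illustrates is that the budget is spent on several short, independent attempts rather than one long one, so a stuck agent's context is thrown away instead of carried forward.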
