Following the Text Gradient at Scale (2025)


RL Throws Away Almost Everything Evaluators Have to Say

When you get feedback on your work, it usually tells you what went wrong and how to fix it. But existing reinforcement learning (RL) algorithms throw most of that information away: they compress potentially rich feedback into a single number, a reward, then try to learn by correlating rewards with actions across hundreds or thousands of attempts.
