
Following the Text Gradient at Scale (2025)

RL Throws Away Almost Everything Evaluators Have to Say

When you get feedback on your work, it usually tells you what went wrong and how to fix it. But existing reinforcement learning (RL) algorithms throw most of that information away: they compress potentially rich feedback into a single number, a reward, then try to learn by correlating rewards with actions across hundreds or thousands of attempts.

