Q-learning is not yet scalable


Seohong Park, UC Berkeley, June 2025

Does RL scale? Over the past few years, we've seen that next-token prediction scales, denoising diffusion scales, contrastive learning scales, and so on, all the way to the point where we can train billion-parameter models with a scalable objective that can eat up as much data as we can throw at it. Then, what about reinforcement learning (RL)? Does RL also scale like all the other objectives? Apparently, it does.

If the answer is yes, this would have at least as much impact as the successes of AlphaGo and LLMs, enabling RL to solve far more diverse and complex real-world tasks efficiently, in robotics, computer-using agents, and so on. The problem is Q-learning's reliance on bootstrapped targets: as the task becomes more complex and the horizon grows longer, the biases in those targets accumulate more and more severely, to the point where we cannot easily mitigate them with more data and larger models. On-policy methods fare better here, because GAE and similar on-policy value estimation techniques can deal with longer horizons relatively easily (though at the expense of higher variance), since they do not rely on strict 1-step recursions.
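The horizon-dependence of this bias can be made concrete with a toy calculation (my illustration, not from the post): suppose each bootstrapped backup of the form `target = reward + gamma * Q(next)` injects a small systematic error `eps` (e.g., from the max over noisy Q-values). A 1-step recursion chains H such backups over a horizon of H steps, so the error at the start state compounds roughly as `eps * H`; an n-step target shortens the recursion depth to H / n and shrinks the accumulated bias accordingly. The numbers below (`gamma`, `eps`, `H`) are arbitrary choices for illustration.

```python
# Toy sketch of bias accumulation through bootstrapped targets.
# Each chained backup re-injects a systematic error eps; n-step
# targets cut the recursion depth from H to H // n.

gamma, eps, H = 0.999, 0.01, 1000

def accumulated_bias(n):
    """Bias at the start state after chained n-step backups, each adding eps."""
    depth = H // n  # number of chained bootstrap recursions to cover horizon H
    return eps * sum(gamma ** (n * j) for j in range(depth))

print(accumulated_bias(1))    # 1-step: on the order of eps * H
print(accumulated_bias(10))   # 10-step: roughly 10x smaller
print(accumulated_bias(100))  # 100-step: smaller still
```

This is the sense in which longer rollouts (as in GAE-style lambda-returns) trade bias for variance: they lean more on observed rewards and less on the learned value function, at the cost of noisier targets.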
