Q-learning is not yet scalable
Seohong Park, UC Berkeley, June 2025

Does RL scale? Over the past few years, we've seen that next-token prediction scales, denoising diffusion scales, contrastive learning scales, and so on, all the way to the point where we can train models with billions of parameters on a scalable objective that can eat up as much data as we can throw at it. Then, what about reinforcement learning (RL)? Does RL also scale like all the other objectives? I argue that it does not yet, at least for Q-learning.
If the answer were yes, it would have at least as much impact as the successes of AlphaGo and LLMs, enabling RL to solve far more diverse and complex real-world tasks very efficiently, in robotics, computer-using agents, and beyond. The problem lies in Q-learning's bootstrapped targets: as the task becomes more complex and the horizon gets longer, the biases in those targets accumulate more and more severely, to the point where we cannot easily mitigate them with more data and larger models. On-policy methods suffer far less from this, because GAE or similar on-policy value estimation techniques can handle longer horizons relatively easily (though at the expense of higher variance), without strict 1-step recursions.
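To make the contrast concrete, here is a minimal sketch (not from the post; the function names and the toy rollout are illustrative) of a 1-step bootstrapped TD target next to GAE's λ-weighted multi-step advantage estimate. The TD target leans entirely on the learned value of the next state, so any bias in V is recycled into the next target; GAE blends many real on-policy rewards into each estimate, trading bias for variance:

```python
import numpy as np

def td_target(rewards, values, gamma=0.99):
    """1-step bootstrapped targets: r_t + gamma * V(s_{t+1}).
    Any bias in V feeds straight back into the next target."""
    return rewards + gamma * values[1:]

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation: a lambda-weighted mix of
    multi-step on-policy returns, so each estimate draws on many
    observed rewards instead of a strict 1-step recursion through V."""
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

# Toy 5-step rollout: 5 rewards and value estimates for 6 states.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.3, 0.5, 0.2])
print(td_target(rewards, values))
print(gae_advantages(rewards, values))
```

With lam=0, GAE collapses back to the 1-step TD error, and with lam=1 it becomes the full discounted Monte Carlo return minus the baseline; the λ knob is exactly the bias-variance dial the text alludes to.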