QwQ-32B: Embracing the Power of Reinforcement Learning


Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalability of RL and its impact on enhancing the intelligence of large language models.

As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI). Additionally, we are actively exploring the integration of agents with RL to enable long-horizon reasoning, aiming to unlock greater intelligence with inference-time scaling.
