Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model


It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry community has limited experience in effectively scaling extremely large models, whether they are dense or Mixture-of-Experts (MoE) models. Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3.
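
To make the dense-versus-MoE distinction concrete, here is a minimal sketch of a sparse Mixture-of-Experts feed-forward layer with top-k routing. The expert count, top-k value, and dimensions below are illustrative placeholders, not Qwen2.5-Max's actual (undisclosed) configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse MoE feed-forward layer: each token is routed to top_k experts."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to individual tokens for routing
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                       # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(tokens[mask])
        return out.reshape_as(x)

# Example: only top_k of the 8 expert FFNs run per token, unlike a dense FFN.
layer = MoELayer(d_model=64, d_ff=256)
y = layer(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```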

Concurrently, we are developing Qwen2.5-Max, a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. We are dedicated to enhancing the thinking and reasoning capabilities of large language models through the innovative application of scaled reinforcement learning. This endeavor holds the promise of enabling our models to transcend human intelligence, unlocking the potential to explore uncharted territories of knowledge and understanding.
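
As a rough illustration of the SFT stage mentioned above, the sketch below fine-tunes a causal language model on one curated prompt/response pair, computing next-token cross-entropy only over the response. The small public checkpoint, toy data, and hyperparameters are placeholders for illustration; they are not Qwen2.5-Max or its actual training recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in checkpoint (assumption); Qwen2.5-Max weights are not public.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "User: What is a Mixture-of-Experts model?\nAssistant: "
response = "A sparse architecture that routes each token to a few expert networks."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids

# Mask out the prompt tokens so only the assistant response contributes to the loss.
labels = full_ids.clone()
labels[:, : prompt_ids.size(1)] = -100

outputs = model(input_ids=full_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
print(f"SFT loss on one example: {outputs.loss.item():.3f}")
```

RLHF then continues from such an SFT checkpoint, optimizing the model against a learned reward signal rather than fixed reference responses.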
