Get the latest tech news
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
The new model shows open-source closing in on closed-source models, suggesting reduced chances of one big AI player ruling the game.
Ultimately, DeepSeek, which started as an offshoot of Chinese quantitative hedge fund High-Flyer Capital Management, hopes these developments will pave the way for artificial general intelligence (AGI), where models will have the ability to understand or learn any intellectual task that a human being can. This approach ensures it maintains efficient training and inference — with specialized and shared “experts” (individual, smaller neural networks within the larger model) activating 37B parameters out of 671B for each token. Following this, we conducted post-training, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on the base model of DeepSeek-V3, to align it with human preferences and further unlock its potential.
Or read this on Venture Beat