Tencent Hunyuan-Large
By open-sourcing the Hunyuan-Large model and revealing related technical details, we hope to inspire more researchers with innovative ideas and collectively advance the progress and application of AI technology. For instance, the introduction of a new CLA structure significantly reduces GPU memory usage, achieving a 50% savings in the KV-Cache portion, which ensures efficient handling of long text scenarios.

| Model | LLama3.1-405B | LLama3.1-70B | Mixtral-8x22B | DeepSeek-V2 | Hunyuan-Large |
|---|---|---|---|---|---|
| MMLU | 85.2 | 79.3 | 77.8 | 78.5 | **88.4** |
| MMLU-Pro | **61.6** | 53.8 | 49.5 | - | 60.2 |
| BBH | 85.9 | 81.6 | 78.9 | 78.9 | **86.3** |
| HellaSwag | - | - | **88.7** | 87.8 | 86.8 |
| CommonsenseQA | 85.8 | 84.1 | 82.4 | - | **92.9** |
| WinoGrande | 86.7 | 85.3 | 85.0 | 84.9 | **88.7** |
| PIQA | - | - | 83.6 | 83.7 | **88.3** |
| NaturalQuestions | - | - | 39.6 | 38.7 | **52.8** |
| DROP | 84.8 | 79.6 | 80.4 | 80.1 | **88.9** |
| ARC-C | **96.1** | 92.9 | 91.2 | 92.4 | 95.0 |
| TriviaQA | - | - | 82.1 | 79.9 | **89.2** |
| CMMLU | - | - | 60.0 | 84.0 | **90.2** |
| C-Eval | - | - | 59.6 | 81.7 | **91.9** |
| C3 | - | - | 71.4 | 77.4 | **82.3** |
| GSM8K | 89.0 | 83.7 | 83.7 | 79.2 | **92.8** |
| MATH | 53.8 | 41.4 | 42.5 | 43.6 | **69.8** |
| CMATH | - | - | 72.3 | 78.7 | **91.3** |
| HumanEval | 61.0 | 58.5 | 53.1 | 48.8 | **71.4** |
| MBPP | **73.4** | 68.6 | 64.2 | 66.6 | 72.6 |

Hunyuan-Large-Instruct achieves consistent improvements on most types of tasks compared to LLMs having similar activated parameters, indicating the effectiveness of our post-training.
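To make the 50% KV-Cache saving concrete, here is a back-of-the-envelope sketch of how sharing one KV cache between adjacent layers (the idea behind Cross-Layer Attention) halves the cache footprint. The configuration numbers below are made up for illustration and are not the official Hunyuan-Large hyperparameters.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, cla_sharing=1):
    """Estimate KV-cache size in bytes.

    cla_sharing: how many consecutive layers share one KV cache
    (1 = standard attention; 2 = CLA-style sharing, halving the cache).
    """
    kv_layers = num_layers // cla_sharing  # layers that store their own K/V
    # Leading factor of 2 accounts for both the K and the V tensors.
    return (2 * kv_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Illustrative (hypothetical) model configuration at a 32K context:
baseline = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                          seq_len=32768, cla_sharing=1)
with_cla = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                          seq_len=32768, cla_sharing=2)

print(f"baseline: {baseline / 2**30:.1f} GiB")  # 8.0 GiB
print(f"with CLA: {with_cla / 2**30:.1f} GiB")  # 4.0 GiB, i.e. 50% saved
```

The saving scales linearly with the sharing factor, which is why long-context serving benefits most: KV-cache size grows with sequence length, while model weights stay fixed.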