
Tencent Hunyuan-Large


By open-sourcing the Hunyuan-Large model and revealing the related technical details, we hope to inspire more researchers with innovative ideas and collectively advance the progress and application of AI technology. For instance, the introduction of a new Cross-Layer Attention (CLA) structure significantly reduces GPU memory usage, achieving a 50% saving on the KV-cache portion, which ensures efficient handling of long-text scenarios.

| Model            | LLama3.1-405B | LLama3.1-70B | Mixtral-8x22B | DeepSeek-V2 | Hunyuan-Large |
|------------------|---------------|--------------|---------------|-------------|---------------|
| MMLU             | 85.2          | 79.3         | 77.8          | 78.5        | **88.4**      |
| MMLU-Pro         | **61.6**      | 53.8         | 49.5          | -           | 60.2          |
| BBH              | 85.9          | 81.6         | 78.9          | 78.9        | **86.3**      |
| HellaSwag        | -             | -            | **88.7**      | 87.8        | 86.8          |
| CommonsenseQA    | 85.8          | 84.1         | 82.4          | -           | **92.9**      |
| WinoGrande       | 86.7          | 85.3         | 85.0          | 84.9        | **88.7**      |
| PIQA             | -             | -            | 83.6          | 83.7        | **88.3**      |
| NaturalQuestions | -             | -            | 39.6          | 38.7        | **52.8**      |
| DROP             | 84.8          | 79.6         | 80.4          | 80.1        | **88.9**      |
| ARC-C            | **96.1**      | 92.9         | 91.2          | 92.4        | 95.0          |
| TriviaQA         | -             | -            | 82.1          | 79.9        | **89.2**      |
| CMMLU            | -             | -            | 60.0          | 84.0        | **90.2**      |
| C-Eval           | -             | -            | 59.6          | 81.7        | **91.9**      |
| C3               | -             | -            | 71.4          | 77.4        | **82.3**      |
| GSM8K            | 89.0          | 83.7         | 83.7          | 79.2        | **92.8**      |
| MATH             | 53.8          | 41.4         | 42.5          | 43.6        | **69.8**      |
| CMATH            | -             | -            | 72.3          | 78.7        | **91.3**      |
| HumanEval        | 61.0          | 58.5         | 53.1          | 48.8        | **71.4**      |
| MBPP             | **73.4**      | 68.6         | 64.2          | 66.6        | 72.6          |

Hunyuan-Large-Instruct achieves consistent improvements on most types of tasks compared to LLMs with a similar number of activated parameters, indicating the effectiveness of our post-training.
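The 50% KV-cache saving follows directly from how CLA works: when each group of adjacent layers shares a single key/value cache, only half the layers need to store one. A minimal back-of-the-envelope sketch (the layer count, head count, and dimensions below are hypothetical illustration values, not the actual Hunyuan-Large configuration):

```python
# Sketch of the CLA memory arithmetic: sharing one KV cache across each
# pair of adjacent layers halves the total KV-cache footprint.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2, share_factor=1):
    """Total KV-cache size in bytes.

    The factor of 2 accounts for storing both K and V; with CLA,
    only layers // share_factor distinct caches are kept.
    """
    stored_layers = layers // share_factor
    return 2 * stored_layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical model shape, fp16 cache, 32k context.
baseline = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128,
                          seq_len=32768, batch=1)
with_cla = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128,
                          seq_len=32768, batch=1, share_factor=2)

print(f"baseline : {baseline / 2**30:.2f} GiB")   # 8.00 GiB
print(f"with CLA : {with_cla / 2**30:.2f} GiB")   # 4.00 GiB
print(f"saving   : {1 - with_cla / baseline:.0%}")  # 50%
```

Because the cache grows linearly with sequence length, this fixed 50% reduction matters most in exactly the long-context scenarios the paragraph above describes.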


