Instella: New Open 3B Language Models
AMD is excited to announce Instella, a family of fully open, state-of-the-art 3-billion-parameter language models (LMs). In this blog we explain how the Instella models were trained and how to access them.
In the first pre-training stage, we trained the model from scratch on 4.065 trillion tokens sourced from OLMoE-mix-0924, a diverse mix of two high-quality datasets, DCLM-baseline and Dolma 1.7, covering domains such as coding, academics, mathematics, and general world knowledge drawn from web crawl. This multi-stage pre-training on a diverse, high-quality data mix significantly enhanced Instella-3B's capabilities, establishing it as a competitive, fully open alternative among language models of comparable size.

The models are released for research purposes only. They are not intended for use cases that demand high factual accuracy, for safety-critical situations, or for health or medical applications, nor for generating false information or facilitating toxic conversations.
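As a rough illustration of how a two-dataset blend like OLMoE-mix-0924 can be assembled, here is a minimal sketch using the Hugging Face datasets library. The Hub identifiers, mixing ratio, and column name are assumptions for illustration; the blog names the two components but not these details.

```python
from datasets import load_dataset, interleave_datasets

# Hypothetical Hub identifiers: the announcement names DCLM-baseline and
# Dolma 1.7 as the mix components but does not state exact repositories.
dclm = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
dolma = load_dataset("allenai/dolma", split="train", streaming=True)

# Weighted interleaving is one common way to realize a fixed-ratio data mix;
# the 70/30 ratio here is an assumption, not the published recipe.
mix = interleave_datasets([dclm, dolma], probabilities=[0.7, 0.3], seed=0)

# Peek at a few examples from the streamed mix (assumes a "text" column).
for example in mix.take(3):
    print(example["text"][:100])
```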
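To experiment with the released checkpoints, a standard Hugging Face transformers workflow should apply. The checkpoint name amd/Instella-3B below is an assumption, since this excerpt does not state where the weights are hosted; a minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint identifier, assumed for illustration.
model_id = "amd/Instella-3B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit a 3B model on one GPU
    trust_remote_code=True,      # in case the repo ships custom modeling code
)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "Explain in one sentence what a language model does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```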