MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. - facebookresearch/MobileLLM
In this work, we comprehensively consider multiple design factors to obtain high-quality LLMs with fewer than a billion parameters. We integrate (1) the SwiGLU activation function, (2) deep-and-thin architectures, (3) embedding sharing, and (4) grouped-query attention to build MobileLLM. The provided training script can be modified to adjust the --nnodes parameter and other settings to suit different multi-node configurations, such as those using Slurm or TorchX.
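To illustrate how these four design choices fit together, here is a minimal sketch of a small decoder-only model in PyTorch. This is not the repository's actual code: all dimensions, class names, and the use of LayerNorm are illustrative assumptions.

```python
# Sketch of the four design choices: SwiGLU feed-forward, a deep-and-thin stack,
# tied input/output embeddings, and grouped-query attention. Hyperparameters are made up.
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GroupedQueryAttention(nn.Module):
    """Grouped-query attention: fewer K/V heads than query heads."""
    def __init__(self, dim, n_heads, n_kv_heads):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat K/V so each group of query heads shares one key/value head.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

class Block(nn.Module):
    def __init__(self, dim, n_heads, n_kv_heads, hidden_dim):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = GroupedQueryAttention(dim, n_heads, n_kv_heads)
        self.ffn_norm = nn.LayerNorm(dim)
        self.ffn = SwiGLU(dim, hidden_dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))
        return x + self.ffn(self.ffn_norm(x))

class TinyLM(nn.Module):
    """Deep-and-thin stack: many layers at a small width (values here are illustrative)."""
    def __init__(self, vocab=32000, dim=576, n_layers=30, n_heads=9, n_kv_heads=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList(
            [Block(dim, n_heads, n_kv_heads, hidden_dim=2 * dim) for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed.weight  # embedding sharing (tied weights)

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return self.lm_head(self.norm(x))
```

Tying `lm_head.weight` to `embed.weight` removes a separate output-projection matrix, which matters at sub-billion scale where the embedding table accounts for a sizable fraction of total parameters; grouped-query attention likewise shrinks the K/V projections and the inference-time KV cache.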