LLäMmlein 1B and 120M – German-only decoder models
We created two German-only decoder models, LLäMmlein 120M and 1B, from scratch. The project involved several key steps, including extensive data preprocessing, the creation of a custom tokenizer, and optimization of the training settings to make effective use of the available hardware. Throughout training, checkpoints were saved and analyzed to monitor the models' learning dynamics. LLäMmlein 1B performed comparably to larger models, with no significant performance difference observed.
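The post does not specify how the custom tokenizer was built; as a rough illustration only, a German-only byte-level BPE tokenizer could be trained with the Hugging Face `tokenizers` library. The corpus path, vocabulary size, and special tokens below are assumptions for the sketch, not the values used for LLäMmlein.

```python
# Hypothetical sketch: training a German-only byte-level BPE tokenizer
# with the Hugging Face `tokenizers` library. Corpus path, vocab size,
# and special tokens are illustrative assumptions.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)

trainer = BpeTrainer(
    vocab_size=32_000,                        # assumed vocabulary size
    special_tokens=["<s>", "</s>", "<pad>"],  # assumed special tokens
)

# Train on a plain-text German corpus (one document per line).
tokenizer.train(files=["german_corpus.txt"], trainer=trainer)
tokenizer.save("llammlein_tokenizer.json")
```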