SmolGPT: A minimal PyTorch implementation for training a small LLM from scratch
Repository: Om-Alve/smolGPT on GitHub.
Designed for educational purposes and simplicity, featuring efficient training, Flash Attention, and modern sampling techniques.

- Minimal Codebase: pure PyTorch implementation with no abstraction overhead
- Modern Architecture: GPT model with:
  - Flash Attention (when available)
  - RMSNorm and SwiGLU
  - Efficient top-k/top-p/min-p sampling

Note: This implementation is inspired by modern LLM training practices and adapted for educational purposes.
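To make the architecture bullet concrete, here is a minimal sketch of a Transformer block combining the pieces the README names: RMSNorm, a SwiGLU feed-forward, and causal self-attention routed through `torch.nn.functional.scaled_dot_product_attention`, which dispatches to Flash Attention kernels when they are available. Class and parameter names here are illustrative assumptions, not taken from the smolGPT source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the root-mean-square of the features, then rescale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Gated feed-forward with a SiLU-activated gate (SwiGLU).
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


class Block(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.attn_norm = RMSNorm(dim)
        self.ffn_norm = RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.ffn = SwiGLU(dim, 4 * dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        # (b, t, d) -> (b, n_heads, t, head_dim)
        q, k, v = (
            z.view(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
            for z in (q, k, v)
        )
        # Uses Flash Attention / memory-efficient kernels when available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(b, t, d)
        x = x + self.proj(y)
        return x + self.ffn(self.ffn_norm(x))
```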
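And a rough sketch of combined top-k / top-p (nucleus) / min-p filtering over next-token logits, in the spirit of the sampling the README mentions. The function name and default thresholds are assumptions for illustration, not the repo's API.

```python
import torch


def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9, min_p=0.05):
    # logits: (batch, vocab_size) for the next token.
    logits = logits / max(temperature, 1e-8)
    probs = torch.softmax(logits, dim=-1)

    # min-p: drop tokens whose probability is below min_p * max probability.
    if min_p is not None:
        floor = min_p * probs.max(dim=-1, keepdim=True).values
        probs = probs.masked_fill(probs < floor, 0.0)

    # top-k: keep only the k most likely tokens.
    if top_k is not None:
        kth = torch.topk(probs, top_k, dim=-1).values[..., -1, None]
        probs = probs.masked_fill(probs < kth, 0.0)

    # top-p: keep the smallest set of tokens whose cumulative mass covers top_p.
    if top_p is not None:
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        outside_nucleus = cumulative - sorted_probs >= top_p
        sorted_probs = sorted_probs.masked_fill(outside_nucleus, 0.0)
        probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)

    # Renormalize the surviving tokens and sample one.
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```

Applying the filters in sequence keeps only tokens that pass all three criteria, which is one common way to combine them; the repo may order or implement the filters differently.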