What's the strongest AI model you can train on a laptop in five minutes?
What’s the strongest model I can train on my MacBook Pro in five minutes? I’ll give the answer upfront: the best 5-minute model I could train was a ~1.8M-param…
I started with a dataset pulled from the Simple English Wikipedia, which seemed sensible: straightforward grammar and vocabulary mean less for the model to learn.
Interestingly, the best-performing model size more or less coincides with the well-known Chinchilla scaling laws paper, which says your optimal model size is roughly your total number of training tokens divided by 20.
I tried to understand every line of code I wrote (and have previously written transformers from scratch), but I definitely would not have tried things like the diffusion models without LLM assistance.
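To make the divide-by-20 rule of thumb concrete, here is a minimal sketch of the arithmetic. The helper names and the throughput figure of 100,000 tokens per second are assumptions chosen purely for illustration; they are not measurements from the article.

```python
def tokens_in_budget(tokens_per_second: float, seconds: float) -> int:
    """Total tokens seen during a fixed wall-clock training budget."""
    return int(tokens_per_second * seconds)


def chinchilla_optimal_params(training_tokens: int, tokens_per_param: int = 20) -> int:
    """Rule-of-thumb model size: training tokens divided by ~20."""
    return training_tokens // tokens_per_param


# Hypothetical throughput for illustration only, NOT a figure from the article.
budget_tokens = tokens_in_budget(tokens_per_second=100_000, seconds=5 * 60)

print(budget_tokens)                             # 30000000
print(chinchilla_optimal_params(budget_tokens))  # 1500000
```

Under that assumed throughput, a five-minute run sees about 30M tokens, which puts the rule-of-thumb model size at about 1.5M parameters, the same order of magnitude as the ~1.8M-parameter model mentioned above.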