Llama3 implemented from scratch
llama3 implementation, one matrix multiplication at a time - naklecha/llama3-from-scratch
normally, reading this depends on how the model classes are written and the variable names inside them. here we read the weights directly and use the config to infer details about the model: it has 32 transformer layers, each multi-head attention block has 32 heads, and so on for the vocab size and the other hyperparameters.

> when we load the query, key, value and output projection weights from the model we notice their shapes to be [4096x4096], [1024x4096], [1024x4096], [4096x4096]
> at first glance this is weird, because ideally we want a separate q, k, v and o matrix for each head individually
> the authors of the code bundled them together because it's easy and it helps parallelize the attention head multiplications.
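to make the bundled layout concrete, here is a minimal sketch of un-bundling those matrices back into per-head projections. it assumes the checkpoint has been loaded into a dict called `model` with keys like `layers.0.attention.wq.weight`, and it hard-codes config values (32 query heads, 8 key/value heads, dim 4096) that match the shapes quoted above; the file name and key names are assumptions for illustration, not taken from this excerpt.

```python
import torch

# assumed config values consistent with the shapes above
n_heads = 32          # query heads
n_kv_heads = 8        # key/value heads (1024 = 8 * 128)
dim = 4096
head_dim = dim // n_heads  # 128

# hypothetical checkpoint path; `model` is just a dict of named tensors
model = torch.load("consolidated.00.pth", map_location="cpu")

q_layer0 = model["layers.0.attention.wq.weight"]  # [4096, 4096] = [n_heads * head_dim, dim]
k_layer0 = model["layers.0.attention.wk.weight"]  # [1024, 4096] = [n_kv_heads * head_dim, dim]

# un-bundle: one [head_dim, dim] projection matrix per head
q_per_head = q_layer0.view(n_heads, head_dim, dim)     # [32, 128, 4096]
k_per_head = k_layer0.view(n_kv_heads, head_dim, dim)  # [8, 128, 4096]

print(q_per_head[0].shape)  # torch.Size([128, 4096]) -- the query projection for head 0
```

storing the heads as one big matrix means a single matmul computes all heads' projections at once, which is why the checkpoint keeps them bundled.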