Llama3 implemented from scratch
llama3 implementation, one matrix multiplication at a time - naklecha/llama3-from-scratch
normally, reading this depends on how the model classes are written and the variable names inside them. here we read the weights directly and use the config to infer details about the model: it has 32 transformer layers, each multi-head attention block has 32 heads, and so on for the vocab size and the other hyperparameters.

> when we load the query, key, value and output projection weights from the model we notice their shapes to be [4096x4096], [1024x4096], [1024x4096], [4096x4096]
> at first glance this is weird, because ideally we want a separate q, k, v and o matrix for each head individually
> the authors of the code bundled them together because it's easy and it helps parallelize the attention head multiplications.
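to make the bundled layout concrete, here is a minimal sketch of un-bundling those matrices back into per-head projections. it assumes the checkpoint has been loaded into a dict called `model` with keys like `layers.0.attention.wq.weight`, and it hard-codes config values (32 query heads, 8 key/value heads, dim 4096) that match the shapes quoted above; the file name and key names are assumptions for illustration, not taken from this excerpt.

```python
import torch

# assumed config values consistent with the shapes above
n_heads = 32          # query heads
n_kv_heads = 8        # key/value heads (1024 = 8 * 128)
dim = 4096
head_dim = dim // n_heads  # 128

# hypothetical checkpoint path; `model` is just a dict of named tensors
model = torch.load("consolidated.00.pth", map_location="cpu")

q_layer0 = model["layers.0.attention.wq.weight"]  # [4096, 4096] = [n_heads * head_dim, dim]
k_layer0 = model["layers.0.attention.wk.weight"]  # [1024, 4096] = [n_kv_heads * head_dim, dim]

# un-bundle: one [head_dim, dim] projection matrix per head
q_per_head = q_layer0.view(n_heads, head_dim, dim)     # [32, 128, 4096]
k_per_head = k_layer0.view(n_kv_heads, head_dim, dim)  # [8, 128, 4096]

print(q_per_head[0].shape)  # torch.Size([128, 4096]) -- the query projection for head 0
```

storing the heads as one big matrix means a single matmul computes all heads' projections at once, which is why the checkpoint keeps them bundled.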