Llama 3 implemented in pure NumPy
May 16, 2024 - Overview - Structure - Generation - Example - GitHub - References

For a detailed explanation written in Korean, see NumPy로 구현하는 라마 3 모델. [Korean Version]

Overview

The Llama 3 model unveiled by Meta is creating a buzz.
MQA (multi-query attention) shares key/value heads across the query heads, which makes it more compact and memory-efficient than MHA (multi-head attention), but it suffers from lower quality and unstable training. Here it is implemented by simply copying the shared key/value heads by a multiple so they line up with the query heads, as sketched below; referencing the previously computed values instead of copying them is a possible future optimization.

In the Llama model, the feed-forward block uses three linear projections with matmul only and no bias, so unlike GPT it is not a complete fully-connected layer; a sketch of this block also follows below.
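A minimal NumPy sketch of the head-copying described above. The function name `repeat_kv` and the tensor shapes are assumptions for illustration, not necessarily the repo's exact code:

```python
import numpy as np

def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    # x: (seq_len, n_kv_heads, head_dim)
    # Repeat each key/value head n_rep times along the head axis
    # so the K/V tensors line up with the n_heads query heads.
    if n_rep == 1:
        return x
    return np.repeat(x, n_rep, axis=1)

# Example: 8 query heads sharing 2 KV heads -> copy each KV head 4x.
seq_len, n_kv_heads, head_dim, n_heads = 10, 2, 64, 8
k = np.random.randn(seq_len, n_kv_heads, head_dim).astype(np.float32)
k = repeat_kv(k, n_heads // n_kv_heads)   # shape: (10, 8, 64)
```

Copying is the simplest way to reuse the standard attention code path; the cost is redundant memory, which is why reusing the shared values directly is mentioned as an optimization.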
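And a hedged sketch of the three-matmul feed-forward, in the SwiGLU form Llama uses (`w1`/`w2`/`w3` names and the 4096/14336 dimensions of Llama 3 8B are assumptions for illustration):

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def feed_forward(x, w1, w2, w3):
    # Three bias-free projections, matmul only -- unlike GPT's
    # fully-connected layers, which carry bias terms.
    return (silu(x @ w1) * (x @ w3)) @ w2

# Example with assumed dimensions.
dim, hidden = 4096, 14336
x = np.random.randn(2, dim).astype(np.float32)
w1 = np.random.randn(dim, hidden).astype(np.float32) * 0.02
w3 = np.random.randn(dim, hidden).astype(np.float32) * 0.02
w2 = np.random.randn(hidden, dim).astype(np.float32) * 0.02
y = feed_forward(x, w1, w2, w3)   # shape: (2, 4096)
```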