You could have designed state of the art positional encoding
We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE), used in the latest Llama 3.2 release and most modern transformers. To reiterate, the self-attention mechanism enables the model to weigh the importance of different elements in an input sequence and dynamically adjust their influence on the output. By artfully applying rotations to 2D chunks of $\mathbf{q}$ and $\mathbf{k}$ prior to their dot product, and switching from an additive to a multiplicative encoding, we gain a significant performance boost in evaluations [4].
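To make the pairing-and-rotation step concrete, here is a minimal PyTorch sketch. The function name `rope`, the interleaved pairing of dimensions, and the base of 10,000 are illustrative assumptions rather than the exact implementation discussed in the post:

```python
import torch

def rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Apply rotary positional encoding to x of shape (seq_len, dim).

    Each consecutive pair of dimensions is treated as a 2D point and
    rotated by an angle that depends on the token position and the
    pair index, mirroring the classic sinusoidal frequencies.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE rotates 2D pairs, so dim must be even"

    # One frequency per 2D pair: theta_i = base^(-2i / dim)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)        # (seq_len, dim // 2)

    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]              # the two halves of each pair

    # Standard 2D rotation applied pair-wise
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

# Position enters the attention score only through the relative rotation
# between q and k: (R_m q) . (R_n k) depends on m - n, which is what makes
# the encoding multiplicative rather than additive.
q, k = torch.randn(8, 64), torch.randn(8, 64)
scores = rope(q) @ rope(k).T
```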