You could have designed state of the art positional encoding
We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE), used in the latest Llama 3.2 release and most modern transformers. To reiterate, the self-attention mechanism enables the model to weigh the importance of different elements in an input sequence and dynamically adjust their influence on the output. By artfully applying rotations to 2D chunks of $\mathbf{q}$ and $\mathbf{k}$ prior to their dot product, and switching from an additive to a multiplicative encoding, we gain a significant performance boost in evaluations [4].
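To make the pairing-and-rotation step concrete, here is a minimal PyTorch sketch. The function name `rope`, the interleaved pairing of dimensions, and the base of 10,000 are illustrative assumptions rather than the exact implementation discussed in the post:

```python
import torch

def rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Apply rotary positional encoding to x of shape (seq_len, dim).

    Each consecutive pair of dimensions is treated as a 2D point and
    rotated by an angle that depends on the token position and the
    pair index, mirroring the classic sinusoidal frequencies.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE rotates 2D pairs, so dim must be even"

    # One frequency per 2D pair: theta_i = base^(-2i / dim)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)        # (seq_len, dim // 2)

    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]              # the two halves of each pair

    # Standard 2D rotation applied pair-wise
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

# Position enters the attention score only through the relative rotation
# between q and k: (R_m q) . (R_n k) depends on m - n, which is what makes
# the encoding multiplicative rather than additive.
q, k = torch.randn(8, 64), torch.randn(8, 64)
scores = rope(q) @ rope(k).T
```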