Microsoft’s Differential Transformer cancels attention noise in LLMs


A simple change to the attention mechanism can make LLMs much more effective at finding relevant information in their context window.

Improving the ability of large language models (LLMs) to retrieve information from their prompts remains an active research area, with direct impact on applications such as retrieval-augmented generation (RAG) and in-context learning (ICL). Wei and his colleagues observed that some LLM hallucinations, in which the model produces incorrect outputs despite having the relevant information in its context, correlate with spurious attention patterns. The core idea behind Diff Transformer, subtracting one attention map from another, was not an obvious one: the residual connection popularized by ResNet is an addition, in contrast with the subtraction in Diff Transformer, "so it wasn't immediately apparent for researchers to propose the idea."
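
The published description of differential attention is that the layer computes two separate softmax attention maps and subtracts one, scaled by a learnable lambda, from the other, so that attention weight both maps assign to irrelevant tokens cancels out. The PyTorch sketch below illustrates that idea in a simplified single-head form; the function name, tensor shapes, and the fixed scalar lambda are illustrative assumptions rather than the authors' actual implementation.

import torch
import torch.nn.functional as F

def differential_attention(x, w_q1, w_k1, w_q2, w_k2, w_v, lam):
    # x: (batch, seq_len, d_model); w_*: (d_model, d_head) projections; lam: scalar.
    d = w_k1.shape[1]
    q1, k1 = x @ w_q1, x @ w_k1   # queries/keys for the first attention map
    q2, k2 = x @ w_q2, x @ w_k2   # queries/keys for the second attention map
    v = x @ w_v

    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)

    # Subtracting the second map cancels attention mass that both maps place on
    # irrelevant tokens (the "attention noise" in the headline).
    return (a1 - lam * a2) @ v

# Toy usage with random projection weights (batch=2, seq_len=10).
torch.manual_seed(0)
d_model, d_head = 64, 16
x = torch.randn(2, 10, d_model)
proj = lambda: torch.randn(d_model, d_head) / d_model ** 0.5
out = differential_attention(x, proj(), proj(), proj(), proj(), proj(), lam=0.8)
print(out.shape)   # torch.Size([2, 10, 16])

In a standard transformer layer the same computation would use a single softmax attention map; the subtraction is the "simple change" the article refers to, and the noise-cancelling behavior comes from the two maps sharing much of their weight on distractor tokens.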

