
New LLM optimization technique slashes memory costs


Universal Transformer Memory uses neural networks to determine which tokens in the LLM's context window are useful or redundant.

The technique, named “Universal Transformer Memory,” uses special neural networks, called Neural Attention Memory Models (NAMMs), to optimize LLMs so they keep the bits of information that matter and discard redundant details from their context. “This new capability allows transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks requiring long-context reasoning,” the researchers write. They also report that trained NAMMs carry over to models and tasks beyond the ones they were optimized on: “Even in these out-of-distribution settings, NAMMs retain their benefits by discarding tokens such as redundant video frames and suboptimal actions, allowing their new base models to focus on the most relevant information to improve performance.”
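As a rough mental model of the idea, the sketch below prunes a toy KV cache by keeping only the tokens a scorer rates as useful. The function name, cache layout, keep ratio, and the random stand-in for the scorer network are all illustrative assumptions, not Sakana AI's actual NAMM implementation.

```python
# Illustrative sketch only: prune a toy per-head KV cache by keeping the
# highest-scoring tokens. The scorer here is random noise standing in for
# a small learned network that rates each cached token's usefulness.
import numpy as np

rng = np.random.default_rng(0)

def prune_kv_cache(keys, values, scores, keep_ratio=0.5):
    """Keep only the highest-scoring cached tokens.

    keys, values: (seq_len, dim) arrays for one attention head's KV cache.
    scores: (seq_len,) per-token usefulness scores.
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Take the indices of the top-scoring tokens, then re-sort them so the
    # surviving tokens stay in their original order.
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return keys[keep_idx], values[keep_idx]

# Toy example: a 1,000-token cache with 64-dimensional keys/values.
seq_len, dim = 1000, 64
keys = rng.normal(size=(seq_len, dim))
values = rng.normal(size=(seq_len, dim))
scores = rng.normal(size=seq_len)  # stand-in for learned per-token scores

pruned_keys, pruned_values = prune_kv_cache(keys, values, scores)
print(pruned_keys.shape)  # (500, 64): half of the cached tokens discarded
```

In this toy version the memory saving comes directly from shrinking the cached keys and values; the scored-and-pruned cache is what the model would attend over on subsequent steps.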

