Google’s new neural-net LLM architecture separates memory components to control exploding costs of capacity and compute
Titans architecture complements attention layers with neural memory modules that select bits of information worth saving in the long term.
However, the Google researchers argue that linear models fall short of classic transformers because they compress their contextual data and tend to miss important details. To address this shortcoming, the researchers propose a “neural long-term memory” module that can learn new information at inference time without the inefficiencies of the full attention mechanism. With LLMs supporting longer context windows, there is growing potential for applications where you squeeze new knowledge into your prompt instead of using techniques such as RAG.
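The core idea behind such a memory module can be illustrated with a small sketch. The code below is an assumed, simplified illustration, not the paper's exact algorithm: a linear "memory" matrix is trained at inference time, by gradient steps with momentum and weight decay (forgetting), to associate keys with values, so that new information can be written in without attending over the full context. All function names and hyperparameters here are illustrative.

```python
import numpy as np

def surprise(M, k, v):
    """Gradient of the reconstruction loss ||k @ M - v||^2 w.r.t. M
    (up to a constant factor). A large gradient means this input is
    'surprising' and therefore worth storing in long-term memory."""
    err = k @ M - v          # prediction error for this key
    return np.outer(k, err)  # gradient of the loss w.r.t. M

def write(M, k, v, momentum, lr=0.05, decay=0.01, beta=0.9):
    """One inference-time memory update: a gradient step with momentum,
    plus weight decay acting as a simple forgetting mechanism."""
    g = surprise(M, k, v)
    momentum = beta * momentum + g if momentum is not None else g
    M = (1 - decay) * M - lr * momentum
    return M, momentum

def read(M, k):
    """Retrieve the value previously associated with a key."""
    return k @ M

# Toy demo: store one key-value association by repeated writes.
rng = np.random.default_rng(0)
d = 8
M = np.zeros((d, d))
k = rng.normal(size=d)
k /= np.linalg.norm(k)       # unit-norm key keeps the update stable
v = rng.normal(size=d)

mom = None
for _ in range(200):
    M, mom = write(M, k, v, mom)
```

After the loop, `read(M, k)` returns a close approximation of `v`: the memory has learned the association at "inference time" via gradient descent, which is the mechanism the module exploits to absorb new information without full attention.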
Or read this on VentureBeat