Falcon Mamba 7B’s powerful new AI architecture offers alternative to transformer models
In several benchmarks, Falcon Mamba 7B convincingly outperformed Llama 3 8B, Llama 3.1 8B, Gemma 7B and Mistral 7B.
According to TII, the all-new Falcon model uses the Mamba state space model (SSM) architecture originally proposed by researchers at Carnegie Mellon and Princeton Universities in a paper dated December 2023. This lets the model selectively focus on or ignore particular inputs, similar to how attention works in transformers, while processing long sequences of text, such as an entire book, without requiring additional memory or computing resources. In a separate throughput test, it outperformed Mistral 7B's efficient sliding-window attention architecture, generating all tokens at a constant speed and without any increase in CUDA peak memory.
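To make the constant-memory claim concrete, here is a minimal, illustrative sketch of the kind of selective state-space recurrence behind Mamba-style models. This is not TII's implementation; the shapes, random weights, and softplus gating are assumptions chosen for clarity. The point is that the entire recurrent "memory" is a fixed-size state vector, so per-token cost and memory do not grow with sequence length, and an input-dependent step size lets the model emphasise or effectively ignore each token.

```python
import numpy as np

def selective_ssm_scan(x, d_state=16, seed=0):
    """Toy single-channel selective SSM scan over a 1-D input sequence x."""
    rng = np.random.default_rng(seed)
    A = -np.exp(rng.normal(size=d_state))   # negative rates -> stable, decaying state
    W_B = rng.normal(size=d_state)          # makes the input matrix B input-dependent
    W_C = rng.normal(size=d_state)          # makes the output matrix C input-dependent
    W_delta = rng.normal()                  # controls the per-token step size

    h = np.zeros(d_state)                   # the entire recurrent memory: fixed size
    ys = []
    for x_t in x:
        # Input-dependent step size (softplus keeps it positive). A large delta
        # writes this token strongly into the state; a near-zero delta lets the
        # model effectively ignore it -- the selection mechanism the article
        # compares to attention.
        delta = np.log1p(np.exp(W_delta * x_t))
        A_bar = np.exp(delta * A)           # discretised state transition
        B_t = W_B * x_t                     # input-dependent input projection
        C_t = W_C * x_t                     # input-dependent output projection
        h = A_bar * h + delta * B_t * x_t   # state update: O(d_state) memory, always
        ys.append(C_t @ h)                  # per-token output
    return np.array(ys)

# Memory use is the same whether the sequence has 50 tokens or an entire book:
print(selective_ssm_scan(np.sin(np.linspace(0.0, 6.0, 50))).shape)  # (50,)
```

In a transformer, by contrast, the key-value cache grows with every generated token, which is why peak memory climbs on long sequences; here the state vector `h` never changes size.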
Or read this on VentureBeat