Meta Open-Sources Megalodon LLM for Efficient Long Sequence Modeling
Researchers from Meta, University of Southern California, Carnegie Mellon University, and University of California San Diego recently open-sourced MEGALODON, a large language model (LLM) with an unlimited context length. MEGALODON has computational complexity that is linear in sequence length and outperforms a similarly sized Llama 2 model on a range of benchmarks.
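MEGALODON's linear scaling comes from computing attention within fixed-size chunks rather than across the full sequence. The sketch below is a minimal illustration of that idea in PyTorch, not the paper's implementation: it uses a single head, no learned projections, and illustrative names, but it shows why the cost grows linearly with sequence length rather than quadratically.

```python
# Minimal sketch of chunk-wise attention, assuming a fixed chunk size.
# Full self-attention costs O(n^2) in sequence length n; attending only
# within each chunk of size c costs O((n/c) * c^2) = O(n * c), linear in n.
import torch

def chunked_self_attention(x: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """x: (seq_len, d_model); seq_len assumed divisible by chunk_size."""
    seq_len, d_model = x.shape
    chunks = x.view(seq_len // chunk_size, chunk_size, d_model)
    # Attention is computed independently inside each chunk, so the
    # (chunk_size x chunk_size) score matrix never grows with seq_len.
    scores = torch.einsum("bid,bjd->bij", chunks, chunks) / d_model**0.5
    out = torch.einsum("bij,bjd->bid", scores.softmax(dim=-1), chunks)
    return out.reshape(seq_len, d_model)

x = torch.randn(1024, 64)
y = chunked_self_attention(x, chunk_size=128)  # work scales linearly with 1024
```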
Another scheme for handling unbounded context that InfoQ recently covered is the RWKV Project's attention-free Transformer model, which has no maximum input context length. MEGALODON builds on the research team's previous model, MEGA (Moving Average Equipped Gated Attention), adding several new features. MEGALODON outperformed all baseline models on the NarrativeQA subtask, and on all tasks achieved results "competitive" with Llama 2.
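The "moving average" in MEGA refers to a damped exponential moving average applied along the sequence before the gated attention, giving position-aware recurrence at linear cost. Below is a minimal single-dimension sketch of that EMA; in the actual model the decay parameters are learned per dimension, and MEGALODON extends the mechanism to complex values. The function name and constants here are illustrative.

```python
# Minimal sketch of the damped EMA at the heart of MEGA, assuming a single
# feature dimension. In the real model alpha and delta are learned per
# dimension; this loop only illustrates the O(n) recurrence.
import torch

def damped_ema(x: torch.Tensor, alpha: float, delta: float) -> torch.Tensor:
    """Computes y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1} in O(n)."""
    y = torch.zeros_like(x)
    prev = 0.0
    for t in range(len(x)):
        prev = alpha * x[t] + (1 - alpha * delta) * prev
        y[t] = prev
    return y

signal = torch.randn(16)
smoothed = damped_ema(signal, alpha=0.3, delta=0.9)
```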