Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M
Very significant new release from Alibaba's Qwen team. Their openly licensed (sometimes Apache 2, sometimes Qwen license, I've had trouble keeping up) Qwen 2.5 LLM previously had an input token limit of 128,000 tokens.
These new models increase that to 1 million, using a new technique called Dual Chunk Attention, first described in this paper from February 2024. You can still run them through any existing framework that supports Qwen2.5 inference (a sketch of that route is below), but accuracy may degrade for sequences longer than 262,144 tokens. I'll update this post when I figure out how to run longer prompts through the new Qwen model using GGUF weights on a Mac.
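Here's a minimal sketch of that standard-framework route using Hugging Face transformers and the published model ID. The prompt, input file, and generation settings are my own illustrative choices, not from the release, and per the caveat above this plain path is only trustworthy up to roughly 262K tokens; the full million-token window needs the Qwen team's dedicated long-context serving setup.

```python
# A minimal sketch: running Qwen2.5-7B-Instruct-1M through the standard
# Hugging Face transformers chat interface, same as earlier Qwen2.5 models.
# The document path and generation parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs/CPU
)

long_document = open("report.txt").read()  # hypothetical long input
messages = [
    {"role": "user", "content": f"Summarize this document:\n\n{long_document}"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```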