Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Yuxing Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025.
Abstract: We present NSA, a Natively trained Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.
Anthology ID: 2025.acl-long.1126
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 23078–23097
URL: https://aclanthology.org/2025.acl-long.1126/
Award: Best Paper
Bibkey: yuan-etal-2025-native
Cite (ACL): Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Yuxing Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, and Wangding Zeng. 2025. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23078–23097, Vienna, Austria. Association for Computational Linguistics.
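
The abstract only describes the idea at a high level. The sketch below is a minimal, generic block-sparse attention reference in PyTorch, not the NSA algorithm from the paper: the block size, mean-pooled block summaries, top-k block selection rule, and the absence of a causal mask are all illustrative assumptions.

import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, top_blocks=4):
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len must be divisible by block_size.
    # Hypothetical helper for illustration only, not the NSA kernel from the paper.
    b, h, n, d = q.shape
    nb = n // block_size
    # Coarse block summaries: mean-pool the keys inside each block.
    k_blocks = k.view(b, h, nb, block_size, d).mean(dim=3)            # (b, h, nb, d)
    # Score each query against the block summaries and keep the top-scoring blocks.
    block_scores = torch.einsum("bhqd,bhkd->bhqk", q, k_blocks)       # (b, h, n, nb)
    top = block_scores.topk(min(top_blocks, nb), dim=-1).indices      # (b, h, n, t)
    block_mask = torch.zeros(b, h, n, nb, dtype=torch.bool, device=q.device)
    block_mask.scatter_(-1, top, torch.ones_like(top, dtype=torch.bool))
    # Expand the block mask to a token-level mask and run masked softmax attention.
    token_mask = block_mask.repeat_interleave(block_size, dim=-1)     # (b, h, n, n)
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / d ** 0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)

# Example: 512-token sequence, 2 heads; each query attends to 4 of 8 key blocks.
q = k = v = torch.randn(1, 2, 512, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 2, 512, 32])

Note that this reference version still materializes the dense score matrix, so it only illustrates the block-selection logic, not the hardware-level savings the paper targets.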