
Byte Latent Transformer: Patches Scale Better Than Tokens


We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at...

Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it.

The LCM team, Loic Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussa, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk, Hu Xu, Bernie Huang, Ellen Tan, Ching-Feng Yeh, Jacob Kahn, Christine Jou, Gargi Ghosh, Omer Levy, Luke Zettlemoyer, Scott Yih, Philippe Brunet, Kim Hazelwood, Ramya Raghavendra, Daniel Li (FAIR), Saining Xie, Christoph Feichtenhofer
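To make the entropy-based patching idea in the summary concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the BLT implementation: the bigram byte model, the fixed threshold rule, and the function names `next_byte_entropies` and `segment_patches` are all hypothetical. The source only says that patch boundaries depend on the entropy of the next byte.

```python
import math
from collections import Counter, defaultdict
from typing import List


def next_byte_entropies(data: bytes) -> List[float]:
    """Estimate the entropy (in bits) of each byte given the previous byte.

    Assumption: a toy bigram model fitted on `data` itself stands in for
    whatever learned byte-level model a real system would use.
    """
    if not data:
        return []
    # Count which bytes follow each byte value.
    follow = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        follow[prev][nxt] += 1
    # The first byte has no context: assume a uniform guess over 256 values.
    entropies = [8.0]
    for prev in data[:-1]:
        counts = follow[prev]
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def segment_patches(data: bytes, threshold: float) -> List[bytes]:
    """Start a new patch wherever the next-byte entropy exceeds `threshold`.

    Assumption: a single global threshold; other boundary rules are possible.
    """
    if not data:
        return []
    entropies = next_byte_entropies(data)
    patches = []
    start = 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


if __name__ == "__main__":
    for patch in segment_patches(b"abababab xyz", threshold=0.5):
        print(patch)
```

Under this rule, predictable runs of bytes are grouped into longer patches, while hard-to-predict positions open new, shorter ones, which is one way to spend more compute where the data is more complex.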


