Block Diffusion: Interpolating Autoregressive and Diffusion Language Models
We obtain a principled learning objective \( \mathcal{L}_\text{BD}(\mathbf{x}, \theta) \) by optimizing a likelihood bound. We model the per-block likelihood under a simple discrete diffusion parameterization (Sahoo et al.).

Efficient Training & Sampling Algorithms

Naively, we would compute the logits by applying \( \mathbf{x}_\theta^b(\mathbf{x}_t^b, \mathbf{K}^{1:b\text{-}1}, \mathbf{V}^{1:b\text{-}1}) \) in a loop \( B \) times.
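The naive loop described above can be sketched in a few lines. This is only an illustration of the control flow, not the authors' implementation: `toy_kv`, `toy_denoiser`, `naive_block_logits`, and the mask id are hypothetical stand-ins, and the "attention" is a scalar toy rather than a real transformer layer. What it shows is that block \( b \) is denoised from its own noised tokens plus keys and values cached from the clean blocks \( 1, \dots, b-1 \), so the denoiser is called \( B \) times.

```python
import math


def split_blocks(tokens, block_size):
    """Split a token list into consecutive blocks of length block_size."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def toy_kv(block):
    """Hypothetical stand-in for a layer's keys/values: one scalar pair per token."""
    keys = [float(tok) for tok in block]
    values = [0.5 * float(tok) for tok in block]
    return keys, values


def toy_denoiser(noised_block, cached_keys, cached_values, vocab_size):
    """Toy x_theta^b: attends over cached K/V plus the block's own K/V.

    Returns one list of vocab_size logits per position in the block.
    """
    own_keys, own_values = toy_kv(noised_block)
    keys = cached_keys + own_keys
    values = cached_values + own_values
    block_logits = []
    for tok in noised_block:
        query = 0.1 * float(tok)
        scores = [query * k for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = sum(w * v for w, v in zip(weights, values)) / total
        block_logits.append([-(v - context) ** 2 for v in range(vocab_size)])
    return block_logits


def naive_block_logits(clean_tokens, noised_tokens, block_size, vocab_size):
    """Naive loop: one denoiser call per block, B calls in total."""
    clean_blocks = split_blocks(clean_tokens, block_size)
    noised_blocks = split_blocks(noised_tokens, block_size)
    cached_keys, cached_values = [], []  # K^{1:b-1}, V^{1:b-1}
    logits, num_calls = [], 0
    for clean_b, noised_b in zip(clean_blocks, noised_blocks):
        logits.append(toy_denoiser(noised_b, cached_keys, cached_values, vocab_size))
        num_calls += 1
        # Cache K/V of the *clean* block for use by all later blocks.
        k, v = toy_kv(clean_b)
        cached_keys.extend(k)
        cached_values.extend(v)
    return logits, num_calls


VOCAB_SIZE = 8
MASK = VOCAB_SIZE  # assumed id for the mask token, outside the vocabulary
clean = [1, 2, 3, 4, 5, 6, 7, 0]
noised = [MASK, 2, 3, MASK, MASK, MASK, 7, MASK]
logits, num_calls = naive_block_logits(clean, noised, block_size=2, vocab_size=VOCAB_SIZE)
```

With 8 tokens and a block size of 2, the loop makes one denoiser call for each of the 4 blocks. Each call is sequential because the cache grows block by block, which is the cost the passage calls naive.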