H.B. Keller Colloquium
Diffusion models are at the core of many state-of-the-art generative AI systems for content such as images, videos, and audio. These models crucially rely on estimating gradients of the data distribution (scores), and efforts to generalize score-based modeling to discrete structures have had limited success. As a result, state-of-the-art generative models for discrete data such as language are based on autoregressive modeling (i.e., next-token prediction). In this work, we bridge this gap by proposing a framework that extends score matching to discrete spaces and integrates seamlessly into the construction of discrete diffusion models. The resulting Score Entropy Discrete Diffusion models offer an alternative probabilistic modeling technique that achieves highly competitive performance at the scale of GPT-2 while introducing distinct algorithmic benefits. Our empirical results challenge the longstanding dominance of autoregressive modeling and could pave the way for an alternative class of language models built from radically different principles.
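A minimal sketch of the key objects, assuming the notation of the associated SEDD paper rather than anything stated in this abstract: in continuous spaces the score is the gradient $\nabla_x \log p(x)$, which has no discrete counterpart; score entropy instead trains a network $s_\theta$ to estimate ratios of the data distribution between a state $x$ and its neighboring states $y$, roughly

\[
  s_\theta(x)_y \approx \frac{p(y)}{p(x)}, \qquad
  \mathcal{L}_{\mathrm{SE}} = \mathbb{E}_{x \sim p} \sum_{y \neq x} w_{xy}
  \left( s_\theta(x)_y - \frac{p(y)}{p(x)} \log s_\theta(x)_y
  + K\!\left(\frac{p(y)}{p(x)}\right) \right),
\]

where $K(a) = a(\log a - 1)$ is a constant term that keeps the loss nonnegative and the weights $w_{xy}$ are induced by the discrete diffusion process; the exact weighting and its tractable denoising form follow the paper.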