Caltech

Information, Geometry, and Physics Seminar

Wednesday, October 30, 2024
4:00pm to 5:30pm
Linde Hall 310
Linguistic Structure from a Bottleneck on Sequential Information Processing
Richard Futrell, Department of Language Science, UC Irvine

Human language is a unique form of communication in the natural world. Most fundamentally, it has systematic structure: signals can be broken down into component parts that are individually meaningful (roughly, words), and these parts are combined in a regular, hierarchical way to form sentences. Furthermore, the parts are combined in a way that preserves a kind of locality: words are usually concatenated together, and they form contiguous phrases. I argue that natural-language-like systematicity arises in codes that minimize predictive information (also known as excess entropy), a measure of statistical complexity that represents the amount of information about the past of a sequence needed to predict its future (Bialek, Nemenman & Tishby, 2001). In simulations, I show that codes that minimize excess entropy factorize their source distributions into groups of approximately independent components, which are expressed systematically and locally, corresponding to words and phrases. Next, drawing on large corpora of naturalistic text, I show that human languages are structured in a way that reduces predictive information at the levels of phonology, morphology, syntax, and semantics. These results establish a link between the statistical and algebraic structure of human language.
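For readers unfamiliar with the measure, here is a minimal Python sketch of what "predictive information" quantifies. It is not the speaker's method; it is a naive plug-in estimator under stated assumptions. For a stationary sequence, excess entropy is the mutual information between the past and the future, and a common finite-L approximation is E_L = H(L) - L·h_L, where H(L) is the entropy of length-L blocks and h_L = H(L) - H(L-1) estimates the entropy rate. Given enough data, this is exact for a Markov chain whose order is below L. The function names `block_entropy` and `excess_entropy_estimate` are hypothetical, and the estimator is biased for short sequences or large L.

```python
from collections import Counter
import math
import random


def block_entropy(seq, L):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of sliding length-L blocks."""
    if L == 0:
        return 0.0
    blocks = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(blocks.values())
    return -sum((c / n) * math.log2(c / n) for c in blocks.values())


def excess_entropy_estimate(seq, L):
    """Hypothetical helper: finite-L estimate E_L = H(L) - L * h_L, with h_L = H(L) - H(L-1)."""
    H_L = block_entropy(seq, L)
    h_L = H_L - block_entropy(seq, L - 1)  # entropy-rate estimate from block entropies
    return H_L - L * h_L


# Deterministic period-3 sequence: the past pins down the future, so E is about log2(3) bits.
periodic = list("abc") * 1000
# Independent fair coin flips: the past says nothing about the future, so E is close to 0.
rng = random.Random(0)
iid = [rng.randint(0, 1) for _ in range(100_000)]

print(round(excess_entropy_estimate(periodic, 3), 2))  # 1.58
print(excess_entropy_estimate(iid, 3))                 # a small number near 0
```

The contrast is the point. In the periodic sequence, knowing the past tells you where you are in the cycle, which is worth log2 3 bits about the future. The coin flips carry no such information, so a code that minimizes this quantity keeps the information needed for prediction as small as possible.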

For more information, please contact the Mathematics Department by phone at 626-395-4335 or by email at [email protected].