Electrical Engineering Systems Seminar
Shannon famously declared that, for the purpose of communication, the semantic aspect of information is irrelevant. Not so in learning: what you use information for defines what it is. Unfortunately, most attempts to build a theory of information for deep learning have not led to concepts that are well defined or computable beyond toy problems. The very definition of information in a trained model, which is a deterministic function, is still a subject of controversy. Most bounds are vacuous, and few, if any, information quantities are computable for models with millions, let alone billions, of parameters. In this talk, I will describe a notion of information in a learned representation that is well defined, can be computed for large-scale real-world models, and yields non-vacuous generalization bounds. I will then show how such a notion of information can be used to compute the complexity of a learning task and to define a topology on the space of tasks, so that we can compute how "far" apart two tasks are and whether it is possible to "reach" one from another (transfer learning). Once we know how to compute it, measuring information during the training process sheds light on phenomena that have been observed in both biological and artificial systems, such as irreversibility (critical learning periods) and forgetting, pointing to fundamental information processes that are independent of the medium, whether biological or artificial.