H.B. Keller Colloquium
Recent literature has witnessed much progress on the algorithmic and theoretical foundations of Reinforcement Learning (RL), particularly for single-agent problems with small state/action spaces. Our understanding of, and algorithmic toolbox for, RL in complex environments remain relatively limited, however. In this talk, I will discuss some of my work on scalable and provably efficient RL for the challenging settings of large state/action spaces and multiple strategic agents.
First, I will focus on simulation-based methods, as exemplified by Monte-Carlo Tree Search (MCTS). MCTS is a powerful paradigm for online planning that enjoys remarkable empirical success, but it has lacked rigorous theoretical understanding. We provide a complete, non-asymptotic analysis of MCTS. Our analysis develops a general framework based on a hierarchy of bandits and highlights the importance of using a non-standard confidence bound (also used by AlphaGo) for convergence. I will further discuss combining MCTS with supervised learning and its generalization to continuous action spaces.
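
To make the bandit-based selection rule concrete, below is a minimal Python sketch of the per-node action-selection step, contrasting the standard logarithmic (UCB1) bonus with a polynomial confidence bound of the kind referred to above. The constant c and the exponents alpha and beta are illustrative assumptions, not the values from the talk or the underlying analysis.

    import math

    # Minimal sketch of the action-selection step at a single MCTS node.
    # It contrasts the standard UCB1 logarithmic bonus with a polynomial
    # confidence bound; c, alpha, and beta are illustrative assumptions.
    def select_action(values, counts, bonus="polynomial", c=2.0, alpha=0.5, beta=0.5):
        """Pick the action maximizing empirical mean value plus exploration bonus.

        values[a] : running mean of returns observed after taking action a
        counts[a] : number of times action a has been tried at this node
        """
        t = sum(counts)  # total visits to this node
        best_a, best_score = None, float("-inf")
        for a, (v, n) in enumerate(zip(values, counts)):
            if n == 0:
                return a  # try every action at least once
            if bonus == "ucb1":
                b = c * math.sqrt(math.log(t) / n)  # standard logarithmic bonus
            else:
                b = c * t ** alpha / n ** beta      # polynomial (non-standard) bonus
            if v + b > best_score:
                best_a, best_score = a, v + b
        return best_a

    # Example: with identical empirical means, the less-visited action
    # receives the larger bonus and is selected.
    print(select_action(values=[0.5, 0.5], counts=[10, 3]))
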
In the second part of the talk, I will discuss on-policy RL for zero-sum Markov games, which generalize Markov decision processes to the multi-agent setting. We consider function approximation to deal with continuous and unbounded state spaces. Building on a fruitful marriage with algorithmic game theory, we develop the first computationally efficient algorithm for this setting, with a provable regret bound that is independent of the cardinality and ambient dimension of the state space.
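
As a concrete illustration of the game-theoretic ingredient, the sketch below computes a minimax strategy of a one-shot zero-sum matrix game by linear programming, the kind of per-state subroutine that algorithms for zero-sum Markov games typically build on. This is a generic illustration rather than the algorithm from the talk, and the payoff matrix in the example is made up.

    import numpy as np
    from scipy.optimize import linprog

    def solve_zero_sum_game(A):
        """Return the row player's minimax mixed strategy and the game value.

        A[i, j] is the payoff to the row (maximizing) player when the row
        player plays i and the column (minimizing) player plays j.
        """
        m, n = A.shape
        # Decision variables: (x_1, ..., x_m, v). Maximize v  <=>  minimize -v.
        c = np.zeros(m + 1)
        c[-1] = -1.0
        # For every column j:  v - sum_i A[i, j] * x_i <= 0.
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
        b_ub = np.zeros(n)
        # x must be a probability distribution over the rows.
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
        b_eq = np.array([1.0])
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:m], res.x[-1]

    # Example: matching pennies has value 0 and uniform minimax strategies.
    x, v = solve_zero_sum_game(np.array([[1.0, -1.0], [-1.0, 1.0]]))
    print("row strategy:", x, "value:", v)
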