Rigorous Systems Research Group (RSRG) Seminar
Reinforcement learning (RL) has shown inspiring successes in game domains. The current algorithms, however, are too sample-intensive for many important applications. One major obstacle to sample-efficient RL is the lack of systematic exploration: popular exploration heuristics provably incur sample complexity that is exponential in the planning horizon, even in toy problems. While mature theory on exploration exists for small state spaces, those methods cannot handle practical scenarios with rich sensory inputs that require function approximation.
In this talk I will describe recent theoretical progress on this long-standing problem. I will introduce Bellman rank, a complexity measure for exploration that is naturally small in many important RL settings, ranging from classical control problems to popular empirical benchmarks. A single unified algorithm, OLIVE, applies to all of these settings and in some cases obtains polynomial sample complexity for the first time. Joint work with Akshay Krishnamurthy, Alekh Agarwal, John Langford, Rob Schapire, and Christoph Dann.
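As a rough sketch of the underlying definition (following the joint work cited above; the notation below, including $\mathcal{F}$, $\mathcal{E}_h$, and $\pi_f$, is introduced here for illustration and is not taken from the abstract): for a class $\mathcal{F}$ of candidate Q-value functions in an episodic problem with horizon $H$, where $\pi_f$ denotes the greedy policy of $f \in \mathcal{F}$, the average Bellman error of $f$ at level $h$ under the roll-in policy of another candidate $g$ is

% Notation sketch (illustrative only; not part of the original abstract).
\[
  \mathcal{E}_h(g, f) \;=\;
  \mathbb{E}\Big[\, f(x_h, a_h) - r_h - f\big(x_{h+1}, \pi_f(x_{h+1})\big)
    \;\Big|\; a_{1:h-1} \sim \pi_g,\; a_h \sim \pi_f \,\Big],
  \qquad g, f \in \mathcal{F}.
\]

The Bellman rank is then the largest rank, over levels $h = 1, \dots, H$, of the matrix whose $(g, f)$ entry is $\mathcal{E}_h(g, f)$. Roughly speaking, OLIVE repeatedly executes the greedy policy of an optimistically chosen surviving candidate, measures its average Bellman errors, and eliminates inconsistent candidates; the low rank of this matrix is what keeps the number of such rounds, and hence the sample complexity, polynomial.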