DOLCIT Seminar
Annenberg 213
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for TD Learning
R. Srikant,
Fredric G. and Elizabeth H. Nearing Endowed Professor of Electrical and Computer Engineering and the Coordinated Science Lab,
University of Illinois at Urbana-Champaign,
Temporal difference (TD) learning is a widely used algorithm for estimating the value function of a Markov decision process (MDP) under a given policy. Here, we consider TD learning with linear function approximation and a constant learning rate, and obtain bounds on its finite-time performance. Motivated by these bounds, we present a heuristic for adapting the learning rate to achieve fast convergence. Joint work with Lei Ying and Harsh Gupta.
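For readers unfamiliar with the setting, here is a minimal sketch of the standard textbook TD(0) update with linear features and a constant learning rate `alpha`. It is not the speaker's code, and it does not implement the adaptive learning-rate heuristic from the talk. The function name `linear_td0`, the 5-state ring chain, the rewards, and the one-hot (tabular) features are all hypothetical choices made for illustration. With one-hot features the true value function is exactly representable, so the estimate can be checked against the closed form $V = (I - \gamma P)^{-1} r$.

```python
import numpy as np

def linear_td0(phi, P, r, gamma, alpha, num_steps, seed=0):
    """Linear TD(0) with a constant step size alpha (illustrative sketch).

    phi: (n_states, d) feature matrix; row s is the feature vector of state s.
    P:   (n_states, n_states) transition matrix of the chain induced by the fixed policy.
    r:   (n_states,) one-step reward received in each state.
    Returns theta, so that V(s) is approximated by phi[s] @ theta.
    """
    rng = np.random.default_rng(seed)
    n_states, d = phi.shape
    theta = np.zeros(d)
    s = 0
    for _ in range(num_steps):
        # Sample the next state along a single trajectory of the Markov chain
        s_next = rng.choice(n_states, p=P[s])
        # TD error: bootstrapped target minus the current estimate at s
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        # Semi-gradient step with a constant learning rate
        theta = theta + alpha * delta * phi[s]
        s = s_next
    return theta

# Hypothetical example: random walk on a ring of 5 states
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, (s - 1) % n] = 0.5  # step left with probability 1/2
    P[s, (s + 1) % n] = 0.5  # step right with probability 1/2
r = np.linspace(0.0, 1.0, n)
phi = np.eye(n)  # one-hot (tabular) features, a special case of linear features

theta = linear_td0(phi, P, r, gamma, alpha=0.01, num_steps=50_000)
v_true = np.linalg.solve(np.eye(n) - gamma * P, r)
max_err = np.max(np.abs(phi @ theta - v_true))
print(f"max |V_hat - V| = {max_err:.3f}")
```

Because the step size is held constant, the iterates typically do not converge to a single point. Instead they settle into a neighborhood of the solution whose size shrinks with `alpha`, while a larger `alpha` reaches that neighborhood faster. This trade-off is what makes the choice of learning rate matter.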
For more information, please contact Kamyar Azizzadenesheli by email.
Event Series
RSRG/DOLCIT Seminar Series