CMX Lunch Seminar
In this talk, we address the problem of continuous-time reinforcement learning in scenarios where the dynamics follow a stochastic differential equation. When the underlying dynamics are unknown and we have access only to discrete-time observations, how can we effectively conduct policy evaluation? We begin by highlighting that the solution of the commonly used discrete-time Bellman equation is not always a reliable approximation of the true value function. We then introduce PhiBE, a PDE-based Bellman equation whose solution offers a more accurate approximation of the true value function, especially when the underlying dynamics change slowly. We further extend PhiBE to higher orders, providing increasingly accurate approximations. Finally, we present a numerical algorithm based on the Galerkin method, tailored to solving PhiBE when only discrete-time trajectory data are available. Numerical experiments are provided to validate the theoretical guarantees we establish.
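For concreteness, here is a rough sketch of the setting in standard notation (the notation is ours and may differ from the speaker's exact formulation). For dynamics \( ds_t = b(s_t)\,dt + \sigma(s_t)\,dW_t \) with discount rate \( \beta \), the value function \( V(s) = \mathbb{E}\big[\int_0^\infty e^{-\beta t} r(s_t)\,dt \,\big|\, s_0 = s\big] \) satisfies the PDE
\[ \beta V(s) = r(s) + b(s)\cdot\nabla V(s) + \tfrac{1}{2}\,\mathrm{Tr}\!\big(\sigma\sigma^\top(s)\,\nabla^2 V(s)\big). \]
A first-order PhiBE-type equation keeps this PDE form but replaces the unknown drift and diffusion with moments of the discrete-time increments,
\[ \hat b(s) = \tfrac{1}{\Delta t}\,\mathbb{E}[s_{\Delta t} - s_0 \mid s_0 = s], \qquad \hat\Sigma(s) = \tfrac{1}{\Delta t}\,\mathbb{E}\big[(s_{\Delta t} - s_0)(s_{\Delta t} - s_0)^\top \mid s_0 = s\big], \]
both of which can be estimated from trajectories observed at time step \( \Delta t \).

Below is a minimal Python sketch of how a Galerkin-type projection of such a PDE onto a finite basis could be assembled from discrete-time transition pairs (s, s', r). All function names, the polynomial basis, and the toy Ornstein-Uhlenbeck data are illustrative assumptions, not the algorithm presented in the talk.

    import numpy as np

    def galerkin_phibe_first_order(states, next_states, rewards, basis, beta, dt):
        """Sketch of a first-order Galerkin-style projection for a PDE-based
        Bellman equation, built from discrete-time transition pairs (s, s', r).

        basis: callable mapping states of shape (n, d) to features of shape (n, K).
        Returns coefficients theta so that V(s) ~ basis(s) @ theta.
        """
        Phi = basis(states)            # (n, K) features at current states
        Phi_next = basis(next_states)  # (n, K) features at next states
        n = Phi.shape[0]

        # Finite-difference surrogate for the generator applied to each basis
        # function: (phi(s') - phi(s)) / dt approximates (L phi)(s) as dt -> 0.
        gen_Phi = (Phi_next - Phi) / dt

        # Galerkin system: project beta*V - L V = r onto the span of the basis.
        A = Phi.T @ (beta * Phi - gen_Phi) / n   # (K, K)
        b = Phi.T @ rewards / n                  # (K,)
        return np.linalg.solve(A, b)

    if __name__ == "__main__":
        # Toy 1D example: Ornstein-Uhlenbeck-like transitions, quadratic reward.
        rng = np.random.default_rng(0)
        dt, beta = 0.1, 1.0
        s = rng.uniform(-2, 2, size=(5000, 1))
        s_next = s - 0.5 * s * dt + 0.3 * np.sqrt(dt) * rng.standard_normal(s.shape)
        r = (s ** 2).ravel()

        def poly_basis(x):
            return np.hstack([np.ones_like(x), x, x ** 2])

        theta = galerkin_phibe_first_order(s, s_next, r, poly_basis, beta, dt)
        print("estimated coefficients:", theta)

In this sketch, the finite difference (phi(s') - phi(s)) / dt stands in for the generator applied to each basis function, so the linear system targets the continuous-time PDE rather than its discrete-time Bellman surrogate; this is only one plausible first-order construction, and the talk's higher-order PhiBE variants would refine this approximation.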