IST Lunch Bunch
Bandit algorithms tackle the fundamental challenge of balancing exploration (collecting data to learn better models) and exploitation (using current estimates to make good decisions). In this talk, I will formalize bandit problems with preference feedback, with structured decision spaces, and with safety constraints (where bad samples are not allowed). Such constraints arise in many applications; in particular, we are motivated by online decision-making for clinical treatment and robotic control. I will present several algorithms for these constrained optimization problems, along with their theoretical guarantees and empirical performance. I will also discuss our clinical applications of online decision-making for neuromodulation.
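The exploration/exploitation tradeoff described above can be illustrated with a minimal epsilon-greedy bandit sketch (not an algorithm from the talk; the arm probabilities and parameters below are purely illustrative):

```python
import random

def epsilon_greedy(rewards_fn, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit: explore with prob. epsilon, else exploit."""
    rng = random.Random(seed)
    counts = [0] * n_arms   # number of pulls per arm
    means = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit: best estimate
        r = rewards_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return total, means

# Illustrative Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
probs = [0.2, 0.5, 0.8]
total, means = epsilon_greedy(
    lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
    n_arms=3, n_rounds=5000)
```

Safe or preference-based variants, as in the talk, would further restrict which arms may be pulled or replace the numeric reward with pairwise comparisons.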