Online Learning in Control Theory
Akbari Varnousfaderani, Mohammad
Online Learning, Optimal Control, Linear Quadratic Gaussian, Regret Minimization
In this thesis, we study two classes of problems in optimal control theory involving unknown parameters, with a focus on Linear-Quadratic-Gaussian systems.

In the first problem, the control system is known and linear, and a sequence of quadratic cost functions is revealed to the controller sequentially, each cost being observed only after the corresponding control input is applied. The controller aims to achieve sublinear regret, as in the online optimization setting. We introduce a modified online Riccati algorithm that attains a logarithmic regret bound: in the scalar case without any extra condition, and in the vector case under a uniform boundedness assumption. Beyond the improved regret bound, the proposed algorithm has lower computational complexity than existing methods in the literature, which mainly rely on solving a semi-definite program at each time step.

In the second problem, the true system transition parameters (the matrices A and B) are unknown, and the objective is to design and analyze algorithms that generate control policies with sublinear regret. Recent studies show that when the system parameters are fully unknown, for any algorithm that uses only data from the past system trajectory, there exists a choice of the unknown parameters for which the algorithm achieves at best a square-root regret bound; this establishes a hard fundamental limit on the achievable regret in general. However, it is known that (poly-)logarithmic regret is achievable when only matrix A or only matrix B is unknown. We prove that (poly-)logarithmic regret is also achievable when both matrices are unknown, provided a hint about one of them is given to the learner periodically. This result represents a first attempt at achieving poly-logarithmic regret in the case where both A and B are unknown, under milder assumptions.
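To fix ideas, the flavor of a Riccati-based online controller for the first problem can be sketched in a few lines. This is an illustrative sketch only, not the thesis's modified online Riccati algorithm: it handles the scalar case x_{t+1} = a x_t + b u_t with online costs q_t x^2 + r_t u^2, the function names `riccati_update` and `online_lq_control` are hypothetical, and the rollout is noiseless for simplicity.

```python
def riccati_update(p, a, b, q, r):
    """One step of the scalar discrete-time Riccati recursion.

    Standard scalar DARE update: p' = q + a^2 p - (a b p)^2 / (r + b^2 p),
    with the corresponding certainty-equivalent gain k = a b p / (r + b^2 p).
    """
    k = a * b * p / (r + b * b * p)
    p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p_next, k

def online_lq_control(a, b, costs, x0=1.0):
    """Run the Riccati update forward as costs (q_t, r_t) are revealed online.

    Applies the linear feedback u_t = -k_t x_t with the current gain and
    returns the total incurred cost over the horizon (noiseless dynamics,
    for illustration only).
    """
    p, x, total = 0.0, x0, 0.0
    for q, r in costs:          # each (q_t, r_t) revealed sequentially
        p, k = riccati_update(p, a, b, q, r)
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u       # noiseless state transition
    return total
```

For a stabilizable scalar pair (a, b) and bounded costs, the gain converges toward the steady-state LQR gain and the accumulated cost stays bounded; the thesis's regret analysis quantifies the gap to the best fixed policy in hindsight, which this sketch does not do.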