Online Learning in Control Theory

Loading...
Thumbnail Image

Authors

Akbari Varnousfaderani, Mohammad

Date

Type

thesis

Language

eng

Keyword

Online Learning , Optimal Control , Linear Quadratic Gaussian , Regret Minimization

Research Projects

Organizational Units

Journal Issue

Alternative Title

Abstract

In this thesis, we study two classes of problems in optimal control theory involving unknown parameters, with focus on Linear-Quadratic-Gaussian systems. In the first problem, the control system is known and linear, and a sequence of quadratic cost functions is revealed to the controller in hindsight. The controller aims to achieve a sublinear regret, similar to the online optimization setting. A modified online Riccati algorithm is introduced that under some uniform boundedness assumptions results in a logarithmic regret bound. In particular, the logarithmic regret for the scalar case is achieved without any extra condition, while in the vector case we impose a uniform boundedness assumption. In addition to having a better regret bound, the proposed algorithm has reduced complexity compared to the ones in the literature which mainly rely on solving semi-definite programs in each time step. In the second problem, the true system transition parameters (matrices A and B) are unknown, and the objective is to design and analyze algorithms that generate control policies with sublinear regret. Recent studies show that when the system parameters are fully unknown, for any algorithm that only uses data from the past system trajectory there exists a choice of the unknown parameters such that the algorithm at best achieves a square-root regret bound, providing a hard fundamental limit on the achievable regret in general. However, it is known that (poly)-logarithmic regret is achievable when only matrix A, or only matrix B is unknown. We prove a result, showing that (poly)-logarithmic regret is achievable when both of these matrices are unknown, but a hint about one of them is given to the learner periodically. This result is the first attempt to achieve poly-logarithmic regret for the case that both A and B are unknown under milder assumptions.

Description

Citation

Publisher

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.

Journal

Volume

Issue

PubMed ID

External DOI

ISSN

EISSN