Deep Model-Based Reinforcement Learning for Sample Efficient Predictive Control

Authors

Antonyshyn, Luka

Type

thesis

Language

eng

Keyword

Reinforcement Learning, Control, Robotics

Abstract

Deep reinforcement learning algorithms provide a framework for learning high-performance solutions to complex problems without explicit knowledge of the systems with which the algorithms interact. These algorithms are of particular interest for the fields of control and robotics, which traditionally rely on exact knowledge of the system dynamics and of the desired solutions. However, most state-of-the-art deep reinforcement learning algorithms require a large amount of interaction with a system before learning an effective solution, and they struggle to learn when the effectiveness of a solution is expressed only through sparse signals given upon success or failure. These problems limit the applicability of deep reinforcement learning methods to real-world systems, where data is difficult to collect and problems are naturally expressed in the form of binary successes or failures. Additionally, deep reinforcement learning algorithms are difficult to analyse because artificial neural networks are themselves difficult to interpret. In this work, we propose and evaluate a deep model-based reinforcement learning algorithm for control that learns effective solutions in far fewer interactions with the environment than state-of-the-art model-free algorithms, with both sparse and dense rewards. The first proposed algorithm learns a policy competitive with those of model-free algorithms using about 47% as many episodes on a dense-reward inverted pendulum on a cart task, and matches the maximal performance of the best-performing model-free algorithm in less than 6% of the number of episodes on a sparse-reward reach-avoid game. We verify the algorithm on a real-robot experiment with no fine-tuning. We then propose two variants of the algorithm that use quadratic neural networks to extract analysable predictive controllers, improving the interpretability of the method and allowing us to use nonlinear optimization methods to select actions. The variant that uses dense neural networks as an intermediate during training reduces the number of episodes needed to learn a solution to the inverted pendulum on a cart to 8.1% of that required by the first algorithm, or 3.8% of that required by the model-free algorithms. We also provide an analysis of policy oscillation, a known phenomenon that affects approximate dynamic programming algorithms as well as one of the proposed methods.
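
To make the general recipe described above concrete, the following is a minimal sketch (Python with PyTorch) of deep model-based reinforcement learning used for predictive control: a neural-network dynamics model is fit from a small amount of interaction data, and actions are then selected by optimizing predicted returns over the learned model. This illustrates the class of methods discussed, not the thesis's algorithm; the ToyPointMass environment, the random-shooting planner, and all hyperparameters are assumptions made for the example.

# A minimal sketch of model-based RL with a learned dynamics model used for
# predictive control. Illustrative only; not the algorithm proposed in the thesis.
import numpy as np
import torch
import torch.nn as nn

class ToyPointMass:
    """Hypothetical 1-D point mass: state = (position, velocity), goal at the origin."""
    def reset(self):
        self.s = np.array([1.0, 0.0])
        return self.s.copy()
    def step(self, a):
        pos, vel = self.s
        vel = vel + 0.05 * float(np.clip(a, -1, 1))
        pos = pos + 0.05 * vel
        self.s = np.array([pos, vel])
        reward = -(pos ** 2 + 0.1 * vel ** 2)  # dense quadratic cost
        return self.s.copy(), reward

# Dynamics model: predicts the change in state from (state, action).
dynamics = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

def plan(state, horizon=15, candidates=256):
    """Random-shooting predictive control over the learned dynamics model."""
    actions = torch.rand(candidates, horizon, 1) * 2 - 1
    s = torch.tensor(state, dtype=torch.float32).repeat(candidates, 1)
    returns = torch.zeros(candidates)
    with torch.no_grad():
        for t in range(horizon):
            s = s + dynamics(torch.cat([s, actions[:, t]], dim=1))  # predicted next state
            returns -= s[:, 0] ** 2 + 0.1 * s[:, 1] ** 2
    return float(actions[returns.argmax(), 0, 0])  # first action of the best sequence

env, data = ToyPointMass(), []
for episode in range(20):
    s = env.reset()
    for t in range(50):
        a = np.random.uniform(-1, 1) if episode < 2 else plan(s)  # warm-up with random actions
        s_next, r = env.step(a)
        data.append((s, [a], s_next - s))
        s = s_next
    # Refit the dynamics model on all interaction data collected so far.
    S, A, dS = (torch.tensor(np.array(x), dtype=torch.float32) for x in zip(*data))
    for _ in range(200):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(dynamics(torch.cat([S, A], dim=1)), dS)
        loss.backward()
        optimizer.step()
    print(f"episode {episode}: model loss {loss.item():.4f}")

Because the controller only queries the learned model when planning, the environment itself is visited far less often than in a model-free setup, which is the sample-efficiency argument the abstract makes.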

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
