Optimal Zero-Delay Transmission of Markov Sources: Reinforcement Learning and Approximations
Authors
Cregg, Liam Barry
Date
2024-07-15
Type
thesis
Language
eng
Keyword
Zero-delay coding, Reinforcement learning, Stochastic control, Markov decision processes, Source-channel coding
Abstract
We study the problem of zero-delay coding for the transmission of a Markov source over a noisy channel with feedback, and present rigorous finite model approximations and reinforcement learning solutions which are guaranteed to achieve near-optimality. To this end, we formulate the problem as a Markov decision process (MDP) where the state is a probability-measure valued predictor/belief and the actions are quantizer maps. This MDP formulation has been used to show the optimality of certain classes of encoder policies in prior work. While this analytical approach characterizes optimal policies, computing them is prohibitively complex due to the uncountable nature of the constructed state space and the lack of minorization or strong ergodicity results commonly assumed in average-cost optimal stochastic control. These challenges invite rigorous reinforcement learning methods, which entail several open questions addressed in this thesis. We present two complementary approaches to this problem. In the first approach, we approximate the set of all beliefs by a finite set and use nearest-neighbor quantization to obtain a finite state MDP, whose optimal policies become near-optimal for the original MDP as the quantization becomes arbitrarily fine. In the second approach, a sliding finite window of channel outputs and quantizers together with a prior belief state serve as the state of the MDP. We then approximate this state by marginalizing over all possible beliefs, so that our policies only use the sliding finite window term to encode the source. Under an appropriate notion of predictor stability, we show that such policies are near-optimal for the zero-delay coding problem as the window length increases, and we give sufficient conditions for predictor stability to hold. For each scheme, we propose a reinforcement learning algorithm to compute near-optimal policies. We provide a detailed comparison of the two coding policies in terms of their approximation bounds, their reinforcement learning implementations and performance, and the conditions for reinforcement learning convergence to near-optimality. We also highlight key differences between the noisy and noiseless channel cases, and include supporting simulation results.
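As a rough illustration of the first approximation scheme described in the abstract, the sketch below shows a nearest-neighbor quantization of a probability-measure valued belief onto a finite grid of the simplex, the step that turns the uncountable belief state space into a finite one. This is not code from the thesis; the uniform type-lattice grid and the Euclidean nearest-neighbor rule are assumptions made here purely for illustration.

```python
# Minimal sketch (illustrative, not the thesis implementation): quantize a
# belief over a finite source alphabet to the nearest point of a finite grid
# in the probability simplex.
import itertools
import numpy as np

def simplex_grid(dim, n):
    """All probability vectors on the (dim-1)-simplex whose components are
    multiples of 1/n (a uniform "type lattice" grid, assumed here)."""
    points = []
    for combo in itertools.combinations_with_replacement(range(dim), n):
        counts = np.bincount(combo, minlength=dim)
        points.append(counts / n)
    return np.array(points)

def nearest_neighbor(belief, grid):
    """Map a belief (a point in the simplex) to its nearest grid point,
    here in Euclidean distance."""
    idx = np.argmin(np.linalg.norm(grid - belief, axis=1))
    return idx, grid[idx]

if __name__ == "__main__":
    grid = simplex_grid(dim=3, n=10)        # finite approximation of the belief space
    belief = np.array([0.62, 0.27, 0.11])   # an example predictor/belief
    idx, q_belief = nearest_neighbor(belief, grid)
    print(f"{len(grid)} grid points; belief {belief} -> grid point {q_belief}")
```

In this sketch, increasing n refines the grid; the abstract's near-optimality guarantee is of this flavor: optimal policies of the finite (quantized) MDP approach the optimal performance of the original belief-MDP as the quantization becomes arbitrarily fine.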
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution-ShareAlike 4.0 International