A Formal Environment for Task Assignment in a Multiple Traveling Repairmen Problem

Bubis, Victor
Task Assignment, Reinforcement Learning, Multiple Traveling Repairmen Problem, Semi-Markov Decision Processes, Planning Under Uncertainty, Multi-Agent Planning
Multi-Robot Task Allocation is a fundamental problem in any application where robots cooperate to achieve a shared goal. Previous work applying model-free reinforcement learning to this problem has been limited to homogeneous fleets or to a centralized single-agent assigner. We introduce a formal environment for single-task, single-robot, time-extended task allocation, in which tasks are allocated online to a heterogeneous fleet over an arbitrarily large horizon, approximating non-Markovian uncertainties as a semi-Markov decision process. The environment is designed to decouple motion dynamics from task planning, and it distinguishes between actions that instantaneously modify the existing allocation and actions that advance the process to its next epoch. We prove that total execution times in this environment can be optimized by optimizing the value and quality functions. We then provide the considerations necessary for applying model-free reinforcement learning to this environment, and perform a comparative study across algorithms, modes of centralization/decentralization, and modes of evaluation. We find that trust regions are necessary to prevent policy destruction late in training, and that multiple agents tend to explore more efficiently than a single agent. However, we also find that on-policy exploration yields policies that depend on random actions, and that nonstationarity is a pervasive shortcoming of the multi-agent approach.
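The distinction the abstract draws, between actions that instantaneously modify the allocation and actions that advance the process to its next epoch, can be illustrated with a minimal sketch. Everything below is a hypothetical toy model (the class name, robot speeds, and stochastic travel noise are assumptions for illustration), not the environment defined in the thesis itself:

```python
import random

class TaskAllocationSMDP:
    """Toy semi-Markov task-allocation environment (illustrative only).

    Two kinds of actions, mirroring the abstract's distinction:
      - assign(): instantaneously modifies the current allocation,
        consuming no simulated time;
      - advance(): executes the fleet until the next decision epoch,
        whose random sojourn time makes transitions semi-Markov.
    """

    def __init__(self, robot_speeds, task_distances, seed=0):
        self.rng = random.Random(seed)
        self.robot_speeds = list(robot_speeds)   # heterogeneous fleet
        self.tasks = list(task_distances)        # distance to each task
        self.allocation = {}                     # task index -> robot index
        self.clock = 0.0                         # total simulated time

    def assign(self, task, robot):
        # Instantaneous action: reallocate a task; time does not advance.
        self.allocation[task] = robot

    def advance(self):
        # Epoch-advancing action: run the current allocation until the
        # earliest task completion; multiplicative noise stands in for
        # non-Markovian execution-time uncertainty.
        if not self.allocation:
            return 0.0
        durations = {
            t: self.tasks[t] / self.robot_speeds[r] * self.rng.uniform(0.9, 1.1)
            for t, r in self.allocation.items()
        }
        done = min(durations, key=durations.get)
        dt = durations[done]
        self.clock += dt
        del self.allocation[done]   # task completed at this epoch
        return dt
```

A planner interleaves the two action types: any number of zero-time `assign` calls, then one `advance` to reach the next epoch, so that task planning is decoupled from the underlying motion dynamics.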