A MARL Approach for Finding Optimal Positions for VANET Aerial Base-stations on a Sparse Highway

Jiang, Bote
Reinforcement Learning, Automated Systems, UAV, Connected Vehicles, Artificial Intelligence, VANET
A Vehicular Ad-Hoc Network (VANET) helps connected vehicles send and receive environmental and traffic information, making it a crucial component of fully autonomous roads. For VANETs to serve their purpose, there has to be sufficient coverage, even in areas where demand is low. Moreover, much of the safety information is time-sensitive; excessive outage time in a vehicular network can increase the risk of fatal accidents. Unmanned Aerial Vehicles (UAVs) can be used as mobile base-stations to fill gaps in coverage. My work focuses on the placement of mobile base-stations for rural highways with sparse traffic, as this represents the worst-case scenario for vehicular communication. The goal is to maximize the number of road segments that satisfy a given communication outage time constraint. I use Multi-Agent Reinforcement Learning (MARL) to learn the optimal placement strategy. The main benefit of MARL is that it allows the agents to learn complex strategies through experience. I propose a variation of traditional Deep Independent Q-Learning. The modifications include an observation function augmented with information shared directly between neighbouring agents, as well as a shared policy scheme. I also implement a lightweight custom sparse highway simulator that is used for training and testing my algorithm. The experiments show that the proposed MARL algorithm learns placement policies that produce the maximum rewards for different scenarios while adapting to the dynamic road densities along the highway segment. The experiments also show that the model is scalable, allowing the number of agents to increase without any modifications to the code. The model is also robust: it continues to function after multiple single-point and dual-point failures. Finally, I show that the model generalizes, as the algorithm can be used directly, with similar performance, on an industry-standard simulator.
Future experiments can be performed to improve the realism and complexity of the highway models as well as to test the method on real-world data.
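As a rough illustration of the two modifications named in the abstract (neighbour-augmented observations and a shared policy), the sketch below shows independent Q-learning in which every agent reads from and writes to one common value function. It is not the thesis implementation: the deep Q-network is swapped for a plain lookup table to keep the example short, and the cell-based highway, the `coverage_reward` placeholder (a simple coverage fraction standing in for the outage time constraint), and all names and parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict

N_CELLS = 10          # assumption: highway discretised into 10 cells
ACTIONS = (-1, 0, 1)  # move one cell back, hover, move one cell forward


def observe(positions, i):
    """Own cell plus the cells shared by the adjacent agents (None at the ends)."""
    left = positions[i - 1] if i > 0 else None
    right = positions[i + 1] if i < len(positions) - 1 else None
    return (positions[i], left, right)


def step(positions, actions):
    """Apply each agent's move, clamped to the highway bounds."""
    return [min(N_CELLS - 1, max(0, p + a)) for p, a in zip(positions, actions)]


def coverage_reward(positions, radius=1):
    """Placeholder reward: fraction of cells within `radius` of any agent."""
    covered = set()
    for p in positions:
        covered.update(c for c in range(p - radius, p + radius + 1)
                       if 0 <= c < N_CELLS)
    return len(covered) / N_CELLS


class SharedQ:
    """One Q-table used by all agents (tabular stand-in for a shared network)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs):
        # Epsilon-greedy action selection on the shared table.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_obs, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (td_target - self.q[(obs, action)])


def train(n_agents=3, episodes=200, horizon=20, seed=0):
    rng = random.Random(seed)
    learner = SharedQ(seed=seed)
    for _ in range(episodes):
        positions = [rng.randrange(N_CELLS) for _ in range(n_agents)]
        for _ in range(horizon):
            obs = [observe(positions, i) for i in range(n_agents)]
            acts = [learner.act(o) for o in obs]
            positions = step(positions, acts)
            reward = coverage_reward(positions)
            next_obs = [observe(positions, i) for i in range(n_agents)]
            # Each agent learns independently, but all updates hit the same table.
            for o, a, o2 in zip(obs, acts, next_obs):
                learner.update(o, a, reward, o2)
    return learner
```

Because the table is keyed only on the local observation and not on an agent index, `train(n_agents=5)` runs without any other change, which mirrors the kind of scalability the abstract describes, though it says nothing about the performance of the actual method.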