Pedestrian Data Generation Through Simulation, Diffusion, and Conditional Image Synthesis

Farley, Andrew
Supervised Learning, Simulation, Pedestrian Detection, Sim2Real Transfer, Conditional Image Synthesis, Diffusion, Conditional Diffusion, Dataset Generation, CARLA, Cross-Dataset Evaluation
Simulated data, generated with platforms such as CARLA, has been proposed as a way to avoid the costly process of creating the large datasets that deep learning models require. However, synthetic datasets introduce the sim2real gap: models trained on simulated data do not perform well when transferred to the real world. To close this gap, many works have proposed sim2real transfer, which transforms simulated data to resemble its real-world counterpart. We explore simulation and sim2real transfer using pedestrian detection as the reference task, evaluated with the MR⁻² score (log-average miss rate, where lower is better). After implementing and iterating on simulated datasets with sim2real transfer, we found that models trained on this data still fell short of our real-world benchmark in cross-dataset evaluation. To improve this result, we turn to other methods of synthetic data creation. Specifically, diffusion models and pretrained pedestrian detection models are used to generate additional data, which then augments the simulated dataset. Diffusion can create arbitrarily large datasets more quickly than simulation and represents the real-world target dataset more effectively. This thesis presents a pipeline for generating pedestrian detection data using a simulation platform, our sim2real transfer method, and our method of generating data through diffusion. We show that using these methods together yields the cross-dataset evaluation result closest to the real-world benchmark. The best CARLA-trained pedestrian detection model achieved a reasonable MR⁻² score of 59.08%. When combined with diffusion-generated data, the model achieved a reasonable MR⁻² score of 42.97%, an improvement of 16.11 percentage points over the CARLA-only result.
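For readers unfamiliar with the metric, the following is a minimal sketch of how a log-average miss rate of this kind is commonly computed in pedestrian detection benchmarks: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10⁻² and 10⁰, and the geometric mean of those samples is reported. The function name, argument layout, and sampling rule are assumptions based on that common convention; the thesis's own evaluation code is not shown here and may differ in detail.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled over FPPI in [1e-2, 1e0].

    `fppi` and `miss_rate` are parallel sequences describing a detector's
    miss-rate vs. false-positives-per-image curve, sorted by increasing FPPI.
    (Hypothetical helper for illustration, not the thesis's implementation.)
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")

    # Reference FPPI values evenly spaced in log space: 10^-2 ... 10^0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Use the miss rate at the last curve point whose FPPI does not
        # exceed the reference. If the curve never reaches an FPPI that low,
        # treat the detector as missing everything (miss rate 1.0).
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
            else:
                break
        # Clamp to avoid log(0) for a perfect detector.
        sampled.append(max(mr, 1e-10))

    # Geometric mean, computed as exp of the mean of the logs.
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))
```

Because the score is a miss rate, a drop from 59.08% to 42.97% means the detector misses fewer pedestrians at comparable false-positive rates.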