Estimating Human Pose from Pressure and Vision Data
Authors
Davoodnia, Vandad
Date
2024-06-17
Type
thesis
Language
eng
Keyword
Human Pose Estimation, Pressure Maps, Computer Vision, Deep Learning, Transformers, Markerless Motion Capture, 3D Body Modeling, Multi-view Pose Estimation
Abstract
Estimating human pose has numerous applications, ranging from healthcare and sports analysis to virtual reality and human-computer interaction. In this dissertation, we study two input domains for pose estimation: vision and pressure. While vision-based pose estimation is often more robust than pressure-based estimation due to its higher resolution and less noisy signals, it raises privacy concerns. Hence, each modality is typically considered for different applications: vision-based systems are used for animation, entertainment, and sports, among others, while pressure-based systems are often deployed in smart clinics, homes, and vehicles. In this dissertation, we study and propose solutions to address the unique challenges of each domain.
First, we address the domain gap between image-based pose estimators and pressure data by proposing a learnable pre-processing module called PolishNetU. Our experiments on two publicly available datasets show that combining PolishNetU with re-training of pre-existing image-based pose estimators overcomes the problem of highly ambiguous pressure readings. Next, we tackle the challenge of body-part occlusion in pressure maps, which arises when a limb is not in direct contact with the pressure sensors. To this end, we propose T-ViTPose, a temporal pose estimator based on vision transformers, to capture subtle movements registered by the pressure sensors. Furthermore, we show that self-supervised pre-training with a masked auto-encoder approach improves results.
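To make the pre-processing idea concrete, the following is a minimal sketch assuming a PyTorch setting; PressurePreprocessor and pose_backbone are hypothetical stand-ins for illustration, not the PolishNetU implementation from the thesis.

```python
# Hypothetical sketch: a learnable pre-processing module feeding a
# pre-existing image-based pose estimator. All names here are
# illustrative, not from the thesis.
import torch
import torch.nn as nn

class PressurePreprocessor(nn.Module):
    """Maps a 1-channel pressure map to a 3-channel image-like tensor."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 3, 3, padding=1),
        )

    def forward(self, pressure):          # (B, 1, H, W) pressure map
        return self.net(pressure)         # (B, 3, H, W) image-like output

preproc = PressurePreprocessor()
pose_backbone = nn.Conv2d(3, 17, 1)       # stand-in for an image-based pose estimator
heatmaps = pose_backbone(preproc(torch.randn(2, 1, 64, 32)))  # (B, 17, H, W)
```

Because such a pre-processor is differentiable, it can be trained jointly with a re-trained pose estimator on pressure data, which is the general pattern the paragraph above describes.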
In the latter part of this dissertation, we focus on enhancing the robustness, generalization, and scalability of multi-view systems. We introduce UPose3D, a 3D keypoint estimation pipeline that scales to any number of cameras. We propose a training routine based on synthetic data generation to ensure generalization across different poses and viewpoints. By leveraging uncertainty and our novel cross-view fusion strategy, we improve the model's robustness to outliers and noise and achieve state-of-the-art performance in out-of-distribution experiments. Next, we present SkelFormer, an inverse-kinematics model that obtains the rotational pose and shape parameters of a body model from 3D keypoints. By training this module to reconstruct correct poses from corrupted 3D keypoints, we improve our pipeline's out-of-distribution generalization, as well as its robustness to noise and occlusions. Our research significantly enhances the performance of existing systems and paves the way for future advancements in the field.
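To illustrate how per-keypoint uncertainty can enter cross-view fusion, here is a hedged sketch of classical uncertainty-weighted DLT triangulation; the function name and weighting scheme are illustrative assumptions, not the UPose3D fusion strategy itself.

```python
# Hypothetical sketch of uncertainty-weighted multi-view triangulation,
# showing how per-view confidence can down-weight noisy or occluded
# observations. Names are illustrative; this is not the UPose3D code.
import numpy as np

def triangulate_weighted(proj_mats, points_2d, sigmas):
    """DLT triangulation of one joint from N camera views.

    proj_mats: (N, 3, 4) camera projection matrices
    points_2d: (N, 2) observed pixel coordinates of the joint
    sigmas:    (N,) predicted keypoint std-devs (higher = less reliable)
    """
    rows = []
    for P, (u, v), s in zip(proj_mats, points_2d, sigmas):
        w = 1.0 / max(s, 1e-6)            # inverse-uncertainty weight
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)                     # (2N, 4) linear system
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                             # null vector of A
    return X[:3] / X[3]                    # 3D point in world coordinates
```

Down-weighting the rows contributed by high-uncertainty views is one simple way a fusion step can gain robustness to the outliers and occlusions discussed above.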
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
Attribution-NonCommercial-ShareAlike 4.0 International