Swin PoseFormer, Efficient Skeleton Based Human Activity Recognition
Authors
Qi, Haoran
Date
Type
thesis
Language
eng
Keyword
Skeleton Based Human Activity Recognition , Human Pose Estimation , Video Understanding
Alternative Title
Abstract
Human Activity Recognition (HAR) has undergone significant advancements in recent years. HAR involves identifying a person's movements by analyzing video or sensor data. Deep learning techniques, including convolutional neural networks, recurrent neural networks, and graph convolutional networks, have demonstrated the ability to learn features automatically and achieve state-of-the-art results. Dynamic skeletal data, represented as the 2D/3D coordinates of human joints, has been widely studied for human action recognition due to its high-level semantic information and robustness to environmental variation. Many skeleton-based action recognition methods adopt graph convolutional networks (GCNs) to extract features from human skeleton data. Despite the positive results reported in the literature, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. Meanwhile, Transformer models have shown great success in modeling long-range interactions, and existing research indicates that the self-attention mechanism is promising for video processing tasks. In this thesis, we propose a novel network, "Swin PoseFormer", which relies on a Swin Transformer backbone to process a 3D heatmap stack, instead of a graph sequence, as the base representation of human skeletons. Moreover, we propose a novel human pose generation pipeline, "Skeletrack", which takes RGB video as input and produces 3-dimensional skeleton data. The Skeletrack pipeline handles fast-moving subjects affected by motion blur by filling in missing detections with bounding boxes estimated from the target's spatio-temporal information. We evaluate Swin PoseFormer on the FineGym dataset and demonstrate that it outperforms the state-of-the-art TSM model, with top-1 and top-5 accuracies of 84.32% and 98.45%, respectively.
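The 3D heatmap stack representation mentioned in the abstract can be illustrated with a minimal sketch: each joint in each frame is rendered as a 2D Gaussian centred on its coordinate, and the per-frame heatmaps are stacked over time. This is an assumed, simplified reconstruction for illustration only; the function name, image size, and Gaussian width below are not taken from the thesis.

```python
import numpy as np

def keypoints_to_heatmap_volume(keypoints, img_size=(64, 64), sigma=2.0):
    """Stack per-frame joint heatmaps into a volume of shape (T, K, H, W).

    keypoints: array of shape (T, K, 2) holding (x, y) joint coordinates
    for T frames and K joints. Each joint is rendered as a 2D Gaussian
    centred on its coordinate (a common heatmap encoding; illustrative,
    not the thesis's exact formulation).
    """
    T, K, _ = keypoints.shape
    H, W = img_size
    ys, xs = np.mgrid[0:H, 0:W]  # pixel coordinate grids
    volume = np.zeros((T, K, H, W), dtype=np.float32)
    for t in range(T):
        for k in range(K):
            x, y = keypoints[t, k]
            # Gaussian peak of 1.0 at the joint location, decaying with sigma
            volume[t, k] = np.exp(
                -((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)
            )
    return volume
```

A volume like this (rather than a joint-coordinate graph sequence) is what a video backbone such as a Swin Transformer can consume directly as dense spatio-temporal input.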
Description
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
