Skeleton-Based Human Activity Recognition using Temporal Cycle-Consistency Learning
machine learning, activity recognition, video alignment, few-shot learning, video classification
Human skeleton data is an effective compression of video data for activity recognition. However, there has been only limited research into applying few-shot techniques to skeleton activity data. In these techniques, a clip is classified by comparing it to samples from a subset of classes. Meanwhile, temporal alignment techniques have shown promise for creating semantically relevant embeddings for video data. In this thesis, we explore the use of temporal alignment, specifically temporal cycle-consistency learning, for constructing sample-similarity-based classifiers and few-shot classifiers. These models aim to inherit the advantages of existing few-shot methods while exploiting the characteristics of skeleton data and the consistency of alignment within activity classes. Overall, we found that the alignment losses are not effective for strict classification when compared with more direct techniques. However, we demonstrate that the few-shot paradigm can leverage intra-class alignment for classification, achieving a 1-shot 5-way accuracy of 55.84% and a 5-shot 5-way accuracy of 73.83% on the NTU-RGB+D 120 activity recognition dataset.
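For readers unfamiliar with temporal cycle-consistency, the following is a minimal sketch of the cycle-back classification loss from the original TCC formulation (Dwibedi et al., 2019). It is not the implementation used in this thesis. It assumes that per-frame embeddings for two clips of the same activity have already been computed. The function names and the `temperature` parameter are illustrative. Frame `i` of clip U is mapped to a soft nearest neighbour in clip V, which is then mapped back to a distribution over the frames of U. The loss is the cross-entropy of that distribution against the starting index `i`.

```python
import numpy as np


def _softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def cycle_back_loss(u: np.ndarray, v: np.ndarray, temperature: float = 1.0) -> float:
    """Mean cycle-back classification loss over all frames of u.

    u: (N, D) per-frame embeddings of clip U (assumed precomputed).
    v: (M, D) per-frame embeddings of clip V (assumed precomputed).
    temperature: illustrative softmax temperature (hypothetical choice).
    """
    losses = []
    for i in range(u.shape[0]):
        # Soft nearest neighbour of u_i in V: weights from negative squared distances.
        alpha = _softmax(-np.sum((v - u[i]) ** 2, axis=1) / temperature)
        v_tilde = alpha @ v
        # Cycle back: distribution over the frames of U given v_tilde.
        beta = _softmax(-np.sum((u - v_tilde) ** 2, axis=1) / temperature)
        # Cross-entropy against the index we started from.
        losses.append(-np.log(beta[i] + 1e-12))
    return float(np.mean(losses))
```

When every frame cycles back to itself, the loss approaches zero. When the frames of U are indistinguishable, the cycled-back distribution is uniform and the loss equals log N.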