Model-Based Segmentation and Recognition of Continuous Gestures
Automatic human gesture recognition is one of the most active research topics in computer vision, receiving increasing attention driven by its promising applications, which range from surveillance and human monitoring to human-computer interaction (HCI) and motion analysis. Segmenting and recognizing dynamic human gestures in continuous video streams is a highly challenging task owing to spatio-temporal variation and the difficulty of endpoint localization. In this thesis, we propose a Motion Signature, a 3D spatio-temporal surface built from the evolution of a contour over time, to reliably represent dynamic motion. A Gesture Model is then constructed from a set of mean and variance images of Motion Signatures in a multi-scale manner; it not only accommodates a wide range of spatio-temporal variation but also requires only a small amount of training data.

Three approaches are proposed to simultaneously segment and recognize gestures from continuous streams; they differ mainly in how the endpoints of gestures are located. The first approach adopts an explicit multi-scale search strategy to find the endpoints of the gestures, while the other two employ Dynamic Programming (DP) for this purpose. All three methods are rooted in the idea that segmentation and recognition are two aspects of the same problem, so that solving either one leads to the solution of the other. This contrasts with most methods in the literature, which separate segmentation and recognition into two phases and perform segmentation before recognition by looking for abrupt changes in motion features. The performance of the methods has been evaluated and compared on two types of gestures: two-arm movements and single-hand movements.
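The mean/variance Gesture Model described above can be illustrated with a minimal sketch. The function and variable names below are hypothetical, and the exact form of the thesis's Motion Signature surfaces and multi-scale decomposition is assumed: here each training sample is represented as a pre-aligned 3D array (time × height × width), the multiple scales are obtained by averaging blocks of frames along the time axis, and a candidate segment is scored by a variance-normalized distance to the model.

```python
import numpy as np

def build_gesture_model(signatures, scales=(1, 2, 4)):
    """Build a multi-scale mean/variance model from motion-signature volumes.

    signatures: list of 3D arrays (T x H x W), one per training sample,
    assumed to be pre-aligned to a common duration and frame size.
    Returns {scale: (mean_volume, variance_volume)}.
    """
    stack = np.stack(signatures, axis=0)  # N x T x H x W
    model = {}
    for s in scales:
        # Coarsen the time axis by averaging non-overlapping blocks of s frames.
        T = stack.shape[1] - stack.shape[1] % s
        coarse = stack[:, :T].reshape(
            stack.shape[0], T // s, s, *stack.shape[2:]
        ).mean(axis=2)
        # Small floor on the variance avoids division by zero when scoring.
        model[s] = (coarse.mean(axis=0), coarse.var(axis=0) + 1e-6)
    return model

def match_score(model, candidate, scale=1):
    """Variance-normalized squared distance between a candidate volume
    and the model at the given scale (lower means a better match)."""
    mean, var = model[scale]
    T = min(mean.shape[0], candidate.shape[0])
    return float(((candidate[:T] - mean[:T]) ** 2 / var[:T]).mean())
```

In a continuous-stream setting, a segmentation-and-recognition loop would evaluate `match_score` over many candidate endpoint pairs per model; the multi-scale structure is what makes a coarse-to-fine (or DP-based) search over those endpoints affordable.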
Experimental results show that all three methods achieve high recognition rates, ranging from 88% to 96% for upper-body gestures, with the third method outperforming the other two. The single-hand experiment also suggests that the proposed method has the potential to be applied to continuous sign language recognition.