|Title: ||Recognition of Human Emotion in Speech Using Modulation Spectral Features and Support Vector Machines|
|Authors: ||Wu, Siqing|
|Keywords: ||Emotion recognition|
|Issue Date: ||2009|
|Series/Report no.: ||Canadian theses|
|Abstract: ||Automatic recognition of human emotion in speech aims to identify the underlying emotional state of a speaker from the speech signal. The area has received rapidly increasing research interest over the past few years. However, designing powerful spectral features for high-performance speech emotion recognition (SER) remains an open challenge. Most spectral features employed in current SER techniques convey only short-term spectral properties and omit useful long-term temporal modulation information.
In this thesis, modulation spectral features (MSFs) are proposed for SER, with support vector machines used for machine learning. By employing an auditory filterbank and a modulation filterbank for speech analysis, an auditory-inspired long-term spectro-temporal (ST) representation is obtained, which captures both acoustic frequency and temporal modulation frequency components. The MSFs are then extracted from the ST representation, thereby conveying information important for human speech perception but missing from conventional short-term spectral features (STSFs).
Experiments show that the proposed features outperform features based on mel-frequency cepstral coefficients and perceptual linear predictive coefficients, two commonly used STSFs. The MSFs also yield a substantial improvement in recognition performance when used to augment the widely used prosodic features, and recognition accuracy above 90% is achieved for classifying seven emotion categories. Moreover, the proposed features in combination with prosodic features attain estimation performance comparable to human evaluation for recognizing continuous emotions.|
|Description: ||Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-09-08 13:01:54.941|
|Appears in Collections:||Queen's Graduate Theses and Dissertations; Department of Electrical and Computer Engineering Graduate Theses|
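The abstract describes a spectro-temporal (ST) representation indexed by acoustic frequency and temporal modulation frequency. The following is a minimal sketch of that idea only, not the thesis's implementation: single DFT bins stand in for both the auditory filterbank and the modulation filterbank, the input is a synthetic amplitude-modulated tone rather than speech, and every name and parameter value (`band_envelope`, `modulation_energies`, `st_representation`, the band and modulation frequencies, the frame length) is an assumption made for illustration.

```python
import cmath
import math


def band_envelope(signal, fs, center_hz, frame_len):
    """Crude band envelope: magnitude of one DFT bin per non-overlapping frame.

    Assumption: a single DFT bin replaces a proper auditory (e.g. gammatone) filter.
    """
    env = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        acc = 0j
        for n in range(frame_len):
            acc += signal[start + n] * cmath.exp(-2j * math.pi * center_hz * n / fs)
        env.append(abs(acc) / frame_len)
    return env


def modulation_energies(env, env_rate, mod_freqs_hz):
    """Energy of the mean-removed envelope at each modulation frequency.

    Assumption: single DFT bins replace a modulation filterbank.
    """
    mean = sum(env) / len(env)
    energies = []
    for fm in mod_freqs_hz:
        acc = 0j
        for k, value in enumerate(env):
            acc += (value - mean) * cmath.exp(-2j * math.pi * fm * k / env_rate)
        energies.append(abs(acc) ** 2 / len(env))
    return energies


def st_representation(signal, fs, acoustic_hz, mod_hz, frame_len):
    """Return E[i][j]: energy in acoustic band i at modulation frequency j."""
    env_rate = fs / frame_len  # one envelope sample per frame
    return [
        modulation_energies(band_envelope(signal, fs, fc, frame_len), env_rate, mod_hz)
        for fc in acoustic_hz
    ]


# Toy input (hypothetical): a 1 kHz tone whose amplitude is modulated at 4 Hz.
FS = 8000
toy = [
    (1 + 0.8 * math.cos(2 * math.pi * 4 * n / FS)) * math.sin(2 * math.pi * 1000 * n / FS)
    for n in range(FS)
]

ACOUSTIC_HZ = [500, 1000, 2000]  # illustrative acoustic band centres
MOD_HZ = [2, 4, 8, 16]           # illustrative modulation frequencies
E = st_representation(toy, FS, ACOUSTIC_HZ, MOD_HZ, frame_len=80)

for fc, row in zip(ACOUSTIC_HZ, E):
    print(fc, [round(v, 4) for v in row])
```

For this toy signal the energy concentrates in the 1000 Hz acoustic band at the 4 Hz modulation frequency. Features computed from such a matrix over many utterances could then be passed to a support vector machine classifier, the learning method named in the abstract.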