
Please use this identifier to cite or link to this item: http://hdl.handle.net/1974/5144

Title: Recognition of Human Emotion in Speech Using Modulation Spectral Features and Support Vector Machines
Authors: Wu, Siqing

Files in This Item:

File: Wu_Siqing_200909_MSc.pdf (2.55 MB, Adobe PDF)
Keywords: Emotion recognition
Speech modulation
Spectro-temporal representation
Affective computing
Issue Date: 2009
Series/Report no.: Canadian theses
Abstract: Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. The area has received rapidly increasing research interest over the past few years. However, designing powerful spectral features for high-performance speech emotion recognition (SER) remains an open challenge. Most spectral features employed in current SER techniques convey short-term spectral properties only while omitting useful long-term temporal modulation information. In this thesis, modulation spectral features (MSFs) are proposed for SER, with support vector machines used for machine learning. By employing an auditory filterbank and a modulation filterbank for speech analysis, an auditory-inspired long-term spectro-temporal (ST) representation is obtained, which captures both acoustic frequency and temporal modulation frequency components. The MSFs are then extracted from the ST representation, thereby conveying information important for human speech perception but missing from conventional short-term spectral features (STSFs). Experiments show that the proposed features outperform features based on mel-frequency cepstral coefficients and perceptual linear predictive coefficients, two commonly used STSFs. The MSFs further render a substantial improvement in recognition performance when used to augment the extensively used prosodic features, and recognition accuracy above 90% is accomplished for classifying seven emotion categories. Moreover, the proposed features in combination with prosodic features attain estimation performance comparable to human evaluation for recognizing continuous emotions.
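The pipeline the abstract describes — pass the speech signal through an auditory filterbank, take each band's temporal envelope, and measure energy across temporal modulation frequencies — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis method: the thesis uses a gammatone auditory filterbank and a dedicated modulation filterbank, which are approximated here by crude FFT-mask bandpass filters and FFT energies of the envelope; the band edges are made-up examples.

```python
import numpy as np

def _bandpass_fft(x, fs, lo, hi):
    """Zero-phase bandpass via FFT bin masking — a crude stand-in for
    one channel of a gammatone auditory filterbank."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def _envelope(x):
    """Temporal envelope as the magnitude of the analytic signal
    (Hilbert transform computed directly with the FFT)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

def modulation_spectral_features(x, fs, acoustic_bands, mod_bands):
    """Return an (n_acoustic, n_modulation) matrix of band energies:
    one row per acoustic-frequency band, one column per temporal
    modulation-frequency band."""
    feats = np.zeros((len(acoustic_bands), len(mod_bands)))
    for i, (lo, hi) in enumerate(acoustic_bands):
        env = _envelope(_bandpass_fft(x, fs, lo, hi))
        env = env - env.mean()                 # drop the DC component
        E = np.abs(np.fft.rfft(env)) ** 2      # modulation power spectrum
        mfreqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
        for j, (mlo, mhi) in enumerate(mod_bands):
            feats[i, j] = E[(mfreqs >= mlo) & (mfreqs < mhi)].sum()
    return feats

# Example: a 1 kHz carrier amplitude-modulated at 4 Hz should place
# most energy in the acoustic band around 1 kHz and the low
# modulation-frequency band.
fs = 8000
t = np.arange(2 * fs) / fs
sig = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
F = modulation_spectral_features(
    sig, fs,
    acoustic_bands=[(100, 500), (800, 1200), (2000, 3000)],
    mod_bands=[(2, 6), (10, 20)],
)
```

The flattened feature matrix would then be fed to a classifier (the thesis uses support vector machines), optionally concatenated with prosodic features as the abstract describes.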
Description: Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-09-08
URI: http://hdl.handle.net/1974/5144
Appears in Collections:Queen's Graduate Theses and Dissertations
Department of Electrical and Computer Engineering Graduate Theses

Items in QSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

