Please use this identifier to cite or link to this item: http://hdl.handle.net/1974/1642

Title: Blind Estimation of Perceptual Quality for Modern Speech Communications
Authors: Falk, Tiago

Files in This Item:

File: Falk_Tiago_H_200812_PhD.pdf
Size: 1.38 MB
Format: Adobe PDF
Keywords: Quality estimation
Gaussian mixture models
hidden Markov model
modulation spectrum
wireless communications
wireless-VoIP
reverberation
text-to-speech
Issue Date: 2009
Series/Report no.: Canadian theses
Abstract: Modern speech communication technologies expose users to perceptual quality degradations that were not experienced earlier with conventional telephone systems. Since perceived speech quality is a major contributor to the end user's perception of quality of service, speech quality estimation has become an important research field. In this dissertation, perceptual quality estimators are proposed for several emerging speech communication applications, in particular for i) wireless communications with noise suppression capabilities, ii) wireless-VoIP communications, iii) far-field hands-free speech communications, and iv) text-to-speech systems. First, a general-purpose speech quality estimator is proposed based on statistical models of normative speech behaviour and on innovative techniques to detect multiple signal distortions. The estimators do not depend on a clean reference signal and are hence termed "blind." Quality meters are then distributed along the network chain to allow for both quality degradations and quality enhancements to be handled. In order to improve estimation performance for wireless communications, statistical models of noise-suppressed speech are also incorporated. Next, a hybrid signal-and-link-parametric quality estimation paradigm is proposed for emerging wireless-VoIP communications. The algorithm uses VoIP connection parameters to estimate a base quality representative of the packet switching network. Signal-based distortions are then detected and quantified in order to adjust the base quality accordingly. The proposed hybrid methodology is shown to overcome the limitations of existing pure signal-based and pure link-parametric algorithms. Temporal dynamics information is then investigated for quality diagnosis for hands-free speech communications. A spectro-temporal signal representation, where speech and reverberation tail components are shown to be separable, is used for blind characterization of room acoustics. In particular, estimators of reverberation time, direct-to-reverberation energy ratio, and reverberant speech quality are developed. Lastly, perceptual quality estimation for text-to-speech systems is addressed. Text- and speaker-independent hidden Markov models, trained on naturally produced speech, are used to capture normative spectral-temporal information. Deviations from the models, computed by means of a log-likelihood measure, are shown to be reliable indicators of multiple quality attributes including naturalness, fluency, and intelligibility.
Description: Thesis (Ph.D., Electrical & Computer Engineering) -- Queen's University, 2008-12-22 14:54:49.28
URI: http://hdl.handle.net/1974/1642
Appears in Collections: Queen's Theses & Dissertations
Electrical and Computer Engineering Graduate Theses

Items in QSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
