Blind Estimation of Perceptual Quality for Modern Speech Communications

Authors

Falk, Tiago

Date

2009-01-05T22:08:01Z

Type

thesis

Language

eng

Keyword

Quality Estimation, Gaussian Mixture Models, Hidden Markov Model, Modulation Spectrum, Wireless Communications, Wireless-VoIP, Reverberation, Text-to-Speech

Abstract

Modern speech communication technologies expose users to perceptual quality degradations that were not experienced with conventional telephone systems. Since perceived speech quality is a major contributor to the end user's perception of quality of service, speech quality estimation has become an important research field. In this dissertation, perceptual quality estimators are proposed for several emerging speech communication applications, in particular for i) wireless communications with noise suppression capabilities, ii) wireless-VoIP communications, iii) far-field hands-free speech communications, and iv) text-to-speech systems. First, a general-purpose speech quality estimator is proposed based on statistical models of normative speech behaviour and on innovative techniques to detect multiple signal distortions. The estimators do not depend on a clean reference signal and are hence termed "blind." Quality meters are then distributed along the network chain so that both quality degradations and quality enhancements can be handled. To improve estimation performance for wireless communications, statistical models of noise-suppressed speech are also incorporated. Next, a hybrid signal-and-link-parametric quality estimation paradigm is proposed for emerging wireless-VoIP communications. The algorithm uses VoIP connection parameters to estimate a base quality representative of the packet-switching network. Signal-based distortions are then detected and quantified in order to adjust the base quality accordingly. The proposed hybrid methodology is shown to overcome the limitations of existing pure signal-based and pure link-parametric algorithms. Temporal dynamics information is then investigated for quality diagnosis in hands-free speech communications. A spectro-temporal signal representation, in which speech and reverberation tail components are shown to be separable, is used for blind characterization of room acoustics.
In particular, estimators of reverberation time, direct-to-reverberation energy ratio, and reverberant speech quality are developed. Lastly, perceptual quality estimation for text-to-speech systems is addressed. Text- and speaker-independent hidden Markov models, trained on naturally produced speech, are used to capture normative spectral-temporal information. Deviations from the models, computed by means of a log-likelihood measure, are shown to be reliable indicators of multiple quality attributes including naturalness, fluency, and intelligibility.
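The central idea shared by several of these estimators, scoring how far degraded speech deviates from a statistical model of normative speech via a log-likelihood measure, can be illustrated with a small sketch. This is not the thesis's implementation: it swaps in a simple diagonal-covariance Gaussian mixture (which, unlike the hidden Markov models described above, ignores temporal dynamics), and the model parameters, the 2-D toy features, and the helper names `log_gaussian_diag`, `gmm_log_likelihood`, and `consistency_score` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical "normative" model: a 2-component diagonal-covariance GMM over
# toy 2-D feature vectors. A real system would instead train such a model
# (e.g. with EM) on features extracted from clean, naturally produced speech.
WEIGHTS = [0.6, 0.4]
MEANS = [[0.0, 0.0], [3.0, 1.0]]
VARIANCES = [[1.0, 1.0], [0.5, 2.0]]


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x):
    """Log-likelihood of one frame under the GMM, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(WEIGHTS, MEANS, VARIANCES)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def consistency_score(frames):
    """Average per-frame log-likelihood; lower values indicate a larger
    deviation from the normative model, i.e. a likely quality degradation."""
    return sum(gmm_log_likelihood(f) for f in frames) / len(frames)


if __name__ == "__main__":
    clean_like = [[0.1, -0.2], [2.9, 1.2], [0.3, 0.4]]  # near the model's modes
    distorted = [[6.0, -4.0], [-5.0, 5.0], [8.0, 7.0]]  # far from both modes
    print(f"clean-like: {consistency_score(clean_like):.2f}")
    print(f"distorted:  {consistency_score(distorted):.2f}")
```

In a complete estimator, such a deviation score would still have to be mapped to a perceptual scale (for example, mean opinion scores) through a separately trained regression stage; that mapping is omitted here.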

Description

Thesis (Ph.D., Electrical & Computer Engineering) -- Queen's University, 2008-12-22

License

This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.