
Please use this identifier to cite or link to this item: http://hdl.handle.net/1974/1229

Title: Auditory domain speech enhancement
Authors: Yang, Xiaofeng

Files in This Item:

File: Thesis_xiaofeng_yang_2008_05_28.pdf
Size: 2.24 MB
Format: Adobe PDF
Keywords: Speech enhancement; Musical noise; Gammatone filter; Meddis inner hair cell model; Cochleagram; Auditory grouping; Perception
Issue Date: 2008
Series/Report no.: Canadian theses
Abstract: Many speech enhancement algorithms suffer from musical noise - an estimation residue consisting of music-like, randomly varying tones. To reduce this annoying artifact, some speech enhancement algorithms require post-processing. However, the lack of auditory perception theories about musical noise limits the effectiveness of musical noise reduction methods. Scientists now have some understanding of the human auditory system, thanks to advances in hearing research across multiple disciplines - anatomy, physiology, psychology, and neurophysiology. Auditory models, such as the gammatone filter bank and the Meddis inner hair cell model, have been developed to simulate the acoustic-to-neural transduction process. These models generate simulated neuron firing signals, a representation called the cochleagram, and cochleagram analysis is a powerful tool for investigating musical noise.
We apply auditory perception theories in our musical noise investigations. Some of these theories (e.g., volley theory and auditory scene analysis) suggest that speech perception is an auditory grouping process, in which temporal properties of neuron firing signals, such as period and rhythm, play important roles. The grouping process produces a foreground speech stream, a background noise stream, and possibly additional streams. We hypothesize that musical noise results from grouping into the background stream those neuron firing signals whose temporal properties differ from the ones grouped into the foreground stream. Based on this hypothesis, we argue that a musical noise reduction method should increase the probability of grouping the enhanced neuron firing signals into the foreground speech stream, or decrease the probability of grouping them into the background stream.
We propose a post-processing musical noise reduction method for the auditory Wiener filter speech enhancement method, in which we employ a proposed complex gammatone filter bank for the cochlear decomposition. The results of a subjective listening test of our speech enhancement system show that the proposed musical noise reduction method is effective.
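For illustration only (this sketch is not taken from the thesis): a standard fourth-order gammatone filter has the impulse response g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t), with the bandwidth b commonly set from the Glasberg-Moore equivalent rectangular bandwidth (ERB) at centre frequency fc. The minimal Python sketch below builds a small real-valued gammatone bank and filters a noise burst to produce a crude cochleagram-like representation; all function names, centre frequencies, and parameter values are illustrative assumptions, and the thesis's complex gammatone filter bank and Meddis hair cell stage are not reproduced here.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of an order-n gammatone filter centred at fc Hz.

    Illustrative sketch: bandwidth follows the common 1.019 * ERB(fc)
    convention (Glasberg & Moore ERB formula), not the thesis's design.
    """
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB in Hz
    b = 1.019 * erb                            # filter bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))               # peak-normalize

# Filter a 100 ms noise burst (stand-in for speech) through a 5-channel bank;
# each row of the result is one cochlear channel of a crude "cochleagram".
fs = 16000
signal = np.random.randn(fs // 10)
centres = [250, 500, 1000, 2000, 4000]         # illustrative centre frequencies
cochleagram = np.array([np.convolve(signal, gammatone_ir(fc, fs), mode="same")
                        for fc in centres])
print(cochleagram.shape)
```

In a full auditory front end, each channel's output would then be passed through an inner hair cell model (such as the Meddis model mentioned above) before any grouping analysis.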
Description: Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2008-05-28
URI: http://hdl.handle.net/1974/1229
Appears in Collections: Queen's Theses & Dissertations
Computing Graduate Theses

Items in QSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
