A Speech Enhancement System Based on Statistical and Acoustic-Phonetic Knowledge
MetadataShow full item record
Noise reduction aims to improve the quality of noisy speech by suppressing the background noise in the signal. However, there is always a tradeoff between noise reduction and signal distortion--more noise reduction is always accompanied by more signal distortion. An evaluation of the intelligibility of speech processed by several noise reduction algorithms in  showed that most noise reduction algorithms were not successful in improving the intelligibility of noisy speech. In this thesis, we aim to utilize acoustic-phonetic knowledge to enhance the intelligibility of noise-reduced speech. Acoustic-phonetics studies the characteristics of speech and the acoustic cues that are important for speech intelligibility. We considered the following questions: what is the noise reduction algorithm that we should use, what are the acoustic cues that should be targeted, and how to incorporate this information into the design of the noise reduction system. A Bayesian noise reduction method similar to the one proposed by Ephraim and Malah in  is employed. We first evaluate the goodness-of-fit of several parametric PDF models to the empirical speech data. For classified speech, we find that the Rayleigh and Gamma. with a fixed shape parameter of 5, model the speech spectral amplitude equally well. The Gamma-MAP and Gamma-MMSE estimators are derived. The subjective and objective performances of these estimators are then compared. We also propose to apply a class-based cue-enhancement, similar to those performed in . The processing directly manipulates the acoustic cues known to be important for speech intelligibility. We assume that the system has the sound class information of the input speech. The scheme aims to enhance the interclass and intraclass distinction of speech sounds. The intelligibility of speech processed by the proposed system is then compared to the intelligibility of speech processed by the Rayleigh-MMSE estimator  The intelligibility evaluation shows that the proposed scheme enhances the detection of plosive and fricative sounds. However, it does not help in the intraclass discrimination of plosive sounds, and more tests need to be done to evaluate whether intraclass discrimination of fricatives is improved. The proposed scheme deteriorates the detection of nasal and affricate sounds.