Single-Microphone Speech Dereverberation: Modulation Domain Processing and Quality Assessment

Loading...
Thumbnail Image
Date
2011-07-25
Authors
Zheng, Chenxi
Keyword
Speech enhancement , Dereverberation , Speech signal processing
Abstract
In a reverberant enclosure, acoustic speech signals are degraded by reflections from walls, ceilings, and objects. Restoring speech quality and intelligibility from reverberated speech has received increasing interest over the past few years. Although multiple channel dereverberation methods provide some improvements in speech quality/ intelligibility, single-channel dereverberation remains an open challenge. Two types of advanced single-channel dereverberation methods, namely acoustic domain spectral subtraction and modulation domain filtering, provide small improvement in speech quality and intelligibility. In this thesis, we study single-channel dereverberation algorithms. Firstly, an upper bound of time-frequency masking (TFM) performance for dereverberation is obtained using ideal time-frequency masking (ITFM). ITFM has access to both the clean and reverberated speech signals in estimating the binary-mask matrix. ITFM implements binary masking in the short time Fourier transform (STFT) domain, preserving only those spectral components less corrupted by reverberation. The experiment results show that single-channel ITFM outperforms four existing multi-channel dereverberation methods and suggest that large potential improvements could be obtained using TFM for speech dereverberation. Secondly, a novel modulation domain spectral subtraction method is proposed for dereverberation. This method estimates modulation domain long reverberation spectral variance (LRSV) from time domain LRSV using a statistical room impulse response (RIR) model and implements spectral subtraction in the modulation domain. On one hand, different from acoustic domain spectral subtraction, our method implements spectral subtraction in the modulation domain, which has been shown to play an important role in speech perception. On the other hand, different from modulation domain filtering which uses a time-invariant filter, our method takes the changes of reverberated speech spectral variance along time into account and implements spectral subtraction adaptively. Objective and informal subjective tests show that our proposed method outperforms two existing state-of-the-art single-channel dereverberation algorithms.
External DOI