The Role of Facial Gestural Information in Supporting Perceptual Learning of Degraded Speech

Wayne, Rachel
Perceptual Learning, Degraded Speech, Cochlear Implants, Noise-vocoded Speech
Everyday speech perception frequently occurs in degraded listening conditions, against a background of noise, interruptions and intermingling voices. Despite these challenges, speech perception is remarkably successful, due in part to perceptual learning. Previous research has demonstrated more rapid perceptual learning of acoustically degraded speech when listeners are given the opportunity to map the linguistic content of utterances, presented in clear auditory form, onto the degraded auditory utterance. Here, I investigate whether learning is further enhanced by the provision of naturalistic facial gestural information, presented concurrently with either the clear auditory sentence (Experiment I), or with the degraded utterance (Experiment II). Recorded materials were noise-vocoded (4 frequency channels; 50-8000 Hz). Noise-vocoding (NV) is a popular simulation of speech transduced through a cochlear implant, and 4-channel NV speech is difficult for naïve listeners to understand, but can be learned over several sentences of practice. In Experiment I, each trial began with an auditory-alone presentation of a degraded stimulus for report (D). In two conditions, this was followed by passive listening to either the clear spoken form and then the degraded form again (condition DCD), or the reverse (DDC); the former format of presentation (DCD) results in more efficient learning (Davis et al., 2005). Condition DCvD was similar to DCD, except that the clear spoken form was accompanied by facial gestural information (a talking face). The results indicate that presenting clear audiovisual feedback (DCvD) does not confer any advantage over clear auditory feedback (DCD). In Experiment II, two groups received a degraded sentence presentation with corresponding facial movements (Dv); the second group also received a second degraded (auditory-alone) presentation (DvD). Two control conditions and a baseline DCvD condition were also tested.
Although they never received clear speech feedback, performance in the DvD group was significantly greater than in all others, indicating that perceptual learning mechanisms can capitalize on visual concomitants of speech. The DvD group outperformed the Dv group, suggesting that the second degraded presentation in the DvD condition further facilitates generalization of learning. These findings have important implications for improving comprehension of speech in an unfamiliar accent or following cochlear implantation.
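
As an illustration of the noise-vocoding parameters mentioned in the abstract (4 frequency channels spanning 50-8000 Hz), the sketch below computes one possible set of channel band edges. The abstract does not state how the channels were spaced, so the logarithmic spacing used here is an assumption, and the function name `log_spaced_band_edges` is hypothetical rather than taken from the thesis.

```python
def log_spaced_band_edges(low_hz, high_hz, n_channels):
    """Return n_channels + 1 edge frequencies equally spaced on a log scale.

    Assumption: logarithmic spacing. The thesis may have used a different
    scheme (for example, one based on cochlear frequency-position maps).
    """
    if low_hz <= 0 or high_hz <= low_hz or n_channels < 1:
        raise ValueError("need 0 < low_hz < high_hz and n_channels >= 1")
    ratio = high_hz / low_hz
    # Each edge is the previous edge multiplied by a constant factor.
    return [low_hz * ratio ** (i / n_channels) for i in range(n_channels + 1)]


# Parameters taken from the abstract: 4 channels, 50-8000 Hz.
edges = log_spaced_band_edges(50.0, 8000.0, 4)

# Pair adjacent edges into (lower, upper) limits for each channel.
bands = list(zip(edges[:-1], edges[1:]))

for index, (lower, upper) in enumerate(bands, start=1):
    print(f"channel {index}: {lower:.0f}-{upper:.0f} Hz")
```

In a full vocoder, the amplitude envelope extracted from each of these bands would modulate noise filtered to the same band, and the modulated bands would then be summed. That step is omitted here because the abstract gives no filter or envelope settings.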