Sources and Correlates of Performance Enhancement in Audiovisual Speech Perception

Nahanni, Celina
Speech-in-noise, Confusions, McGurk illusion, Audiovisual speech, Audiovisual integration, Integration enhancement, Word identification, Open-set
In a noisy environment, speech intelligibility is greatly enhanced by seeing the speaker’s face. This enhancement results from the integration of auditory and visual signals, but the underlying mechanisms remain largely unknown. This thesis describes the results from four studies investigating the enhancement of speech comprehension in an open-set, word-in-noise identification task. In the first study, we examined how the auditory signal-to-noise ratio impacts audiovisual performance in a large participant sample. Consistent with the majority of previous studies, we found high inter-individual variability, whose magnitude increased monotonically with increasing signal-to-noise ratio. We also found that audiovisual performance was highly variable across studies, even after normalizing by auditory-only performance, suggesting that experimental factors strongly affect performance. In the second study, we replicated the findings from the first study and developed a measure of audiovisual ‘integration enhancement’ that captures how much of the audiovisual performance cannot be accounted for by performance in visual-only and auditory-only tasks. Contrary to the principle of inverse effectiveness, this integration enhancement was found not to decrease with auditory signal-to-noise ratio but to peak at an intermediate ratio. In the third study, we developed a model based on the congruency of response errors (confusions) observed in auditory-only and visual-only task conditions to predict audiovisual performance enhancement. This relatively simple categorical model was found to account for audiovisual performance enhancement in 60% of our stimuli; lower-level processing was likely responsible for the enhancement in the remaining words.
In the fourth study, we investigated how much of the audiovisual performance enhancement can be predicted by the integration of auditory and visual signals, a process that we estimated from the susceptibility of one’s auditory perception to be altered by incongruent (McGurk) visual stimuli. Surprisingly, we found no correlation between the McGurk illusion susceptibility for nonsense syllables and our measure of audiovisual integration enhancement for words. Altogether, these findings suggest that a large amount of performance enhancement in speech comprehension is attributable to categorical constraints rather than to a discrete integration process. This work also highlights the importance of developing standardized approaches for future investigations of audiovisual speech perception.