
Audio-visual speech perception by humans

Research with human subjects has shown that visual information from the talker's face substantially improves speech recognition in difficult listening conditions [330,21]. Studies have also shown that visual information from a speaker's face is integrated with auditory information during phonetic perception. The McGurk effect demonstrates that an auditory /ba/ presented with a visual /ga/ produces the perception of /da/ [220]. This indicates that the perceived place of articulation can be influenced by visual cues. Other researchers have shown that bilabials and dentals are more easily perceived visually than alveolars and palatals [240]. Together, these experiments demonstrate that speech perception is bimodal for all normal hearers, not just for the profoundly deaf, as already noted by Cotton in 1935 [73]. For a more extensive review of the intelligibility of audio-visual speech to humans, see the ICP-MIAMI 94-1 report by Benoit. For detailed information on how humans integrate auditory and visual information, see the ICP-MIAMI 94-2 report by Robert-Ribes.

Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995