
Temporal coherence

The second problem arising from the bimodal aspect of speech perception is due to the inherent synchrony between acoustically and optically transmitted information. Dixon and Spitz experimentally observed that subjects were unable to detect asynchrony between the visual and auditory presentations of speech when the acoustic signal was presented less than 130 ms before or 260 ms after the continuous video display of the speaker's face [83]. Mainly motivated by the applied problem of speech perception through the videophone (where the unavoidable image coding/decoding process delays the transmission of optical information), recent studies have tried to quantify the loss of intelligibility due to delayed visual information.

For example, Smeele and Sittig measured the intelligibility of phonetically balanced lists of nonsense CVC words acoustically degraded by interfering background prose [315]. They measured a mean intelligibility of 20% in the auditory-alone presentation condition and of 65% in the audio-visual condition. However, if the facial presentation was delayed by more than 160 ms after the corresponding audio signal, audio-visual presentation offered no significant improvement over audio alone. In the opposite direction, Smeele (personal communication) more recently observed a rather constant intelligibility of around 40% when the audio was presented between 320 and 1500 ms after the video. In a similar experiment, Campbell and Dodd had previously found that the disambiguating effect of speech-reading on noisy isolated words persisted with desynchronies of up to 1.5 s between seen and heard speech, but they indicated that this benefit occurred whichever modality was leading [54]. On the other hand, Reisberg et al. failed to observe any visual benefit in a shadowing task using the above-mentioned text by Kant with the modalities desynchronized by 500 ms [286].

These somewhat divergent findings strongly support the idea that audition and vision influence each other in speech perception, even if the extent of the phenomenon is still unclear (i.e., does it operate at the level of the acoustic feature, the phoneme, the word, the sentence, etc.?) and even if the role of auditory and visual short-term memory in their integration remains a mystery. I would simply suggest that the benefit of speech-reading is a function not only of the acoustic degradation, but also of the linguistic complexity of the speech material. The greater the redundancy (from nonsense words through isolated words to running speech), the more the listener's high-level linguistic competence is solicited (to take advantage of the lexicon, syntax, semantics, etc., in a top-down process), and the more this cognitive strategy dominates the low-level, bottom-up decoding process of speech-reading.





