It is well known that lip-reading is necessary for the hearing impaired to (partially) understand speech, using the information recoverable from visual speech. But as early as 1935, Cotton stated that "there is an important element of visual hearing in all normal individuals". Even if the auditory modality is the most important for speech perception by normal hearers, the visual modality may allow subjects to understand speech better. Note that visual information, provided by movements of the lips, chin, teeth, cheeks, etc., cannot by itself provide normal speech intelligibility. However, a view of the talker's face helps compensate for acoustic information, particularly spectral information, that is degraded by background noise. A number of investigators have studied how noise affects speech intelligibility depending on whether the message is heard only, or heard while the speaker's face is also visible [330,241,21,93,95,331].
Figure 2.1: Improved intelligibility of degraded speech through vision of the speaker's face. The box indicates the mean, and the whiskers the standard deviation.
It is well known that information is more easily retained by an audience when transmitted over television than over radio. Consistent with this, Reisberg et al. reported that passages read from Kant's Critique of Pure Reason were better understood by listeners (measured by the proportion of words correctly repeated in a shadowing task) when the speaker's face was visible to them. Although people do not usually speak the way Immanuel Kant wrote, this finding is a clear argument for a general improvement of linguistic comprehension through vision. It also underlines the potential advantage of TtAVS synthesis for understanding automatically read messages, on the assumption that human-machine dialogue will be much more efficient when spoken information is presented to the user bimodally.

MacLeod and Summerfield found an average 11 dB "benefit of lip-reading". This is the average difference between the lowest signal-to-noise ratios (SNRs) at which test sentences are understood with and without visual information. This finding must, of course, be qualified by the conditions of visual presentation. Ostberg et al. tested the effects of six videophone display sizes on the intelligibility of noisy speech. They presented running speech to subjects who were asked to adjust the noise level so that the individual words in the story were at the borderline of intelligibility; the mean benefit of lip-reading increased from 0.4 to 1.8 dB as display size increased. This observation confirms the intuitive idea that the better the visual information, the greater the improvement in intelligibility.
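The "benefit of lip-reading" described above is a simple difference of thresholds: if a listener's lowest intelligible SNR is S_A in the audio-only condition and S_AV in the audiovisual condition, the benefit is S_A - S_AV (positive when vision lets the listener tolerate more noise). The minimal Python sketch below only illustrates that arithmetic and averaging over listeners; the function names and the threshold values are hypothetical and are not data from MacLeod and Summerfield or any other study.

```python
def lipreading_benefit(snr_audio_only_db: float, snr_audiovisual_db: float) -> float:
    """Benefit in dB: how much lower the SNR threshold is when the face is visible."""
    return snr_audio_only_db - snr_audiovisual_db


def mean_benefit(thresholds: list[tuple[float, float]]) -> float:
    """Average benefit over listeners.

    `thresholds` holds one (audio_only_db, audiovisual_db) pair per listener.
    """
    benefits = [lipreading_benefit(a, av) for a, av in thresholds]
    return sum(benefits) / len(benefits)


# Hypothetical thresholds for three listeners (illustrative values only).
listeners = [(-4.0, -15.0), (-2.0, -12.0), (-3.0, -15.0)]
print(mean_benefit(listeners))  # individual benefits 11, 10 and 12 dB
```

Averaging per-listener differences, rather than differencing the two group means, gives the same result for complete data but keeps the individual benefits available for computing a spread such as the standard deviation shown in Figure 2.1.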