
Audio-visual speech synthesis

Experiments on natural speech (see ICP-MIAMI report 94-1) lead us to expect similar effects with a TtAVS synthesizer: even if the current quality of (most) TtS systems is not as bad as highly degraded speech, it is clear that, even under very quiet conditions, synthesizers are much less intelligible than humans. Moreover, it is realistic to predict that in the near future speech synthesizers will come into wide use in noisy environments, such as railway stations. Such adverse conditions will call for a synchronized presentation of the information in another modality, for instance an orthographic display of the text or the animation of a synthetic face (especially for foreigners and illiterate people). There are hence several reasons for the study and use of Audio-Visual Speech Synthesis.

Unfortunately, most authors have reported only informal impressions from colleagues about the quality of their systems; as far as I am aware, none of them has ever quantified the improvement in intelligibility gained by adding visual synthesis to the acoustic waveform. I strongly support the idea that assessment methodologies should be standardized so that the various approaches can be compared with one another. The next report will present results of intelligibility tests run at the ICP with various visual displays (natural and synthetic) of the lips, the jaw and the face, under different conditions of background noise added to the original acoustic signal.






Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995