
Forecast for future work

The first results are an encouraging step towards designing an audio-visual speech recognition system, but several problems remain to be solved. First, our model neglects the weighting factors of the visual and auditory parameters, so the system still relies on unbalanced information. Second, other models for audio-visual integration have to be implemented in order to select the best way of combining audio and visual information in our case. For complete on-line audio-visual speech recognition, we plan to design a video analysis board capable of extracting the lip contours and computing the four necessary visual parameters within a short delay (< 40 ms). It will then be possible to collect a much larger amount of audio-visual data within a reasonable period of time, which should allow the audio-visual speech recognizer to run in real time. This would make it an ideal interface for bimodal speech dialogue on a multimedia platform.
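As an illustration of the weighting issue mentioned above, one common scheme is weighted log-linear fusion of the per-class scores produced by the two modalities. The following sketch is not the model described here; it only shows how hypothetical weights (named w_audio and w_visual for illustration) would rebalance the two information streams, where equal weights correspond to the unweighted combination currently used.

```python
def fuse_scores(audio_logprobs, visual_logprobs, w_audio=0.5, w_visual=0.5):
    """Weighted log-linear fusion of per-class modality scores.

    audio_logprobs / visual_logprobs: dicts mapping each candidate
    class (e.g. a viseme/phoneme label) to its log-likelihood under
    the corresponding single-modality recognizer. The weight names
    are hypothetical; equal weights reproduce unweighted fusion.
    """
    assert audio_logprobs.keys() == visual_logprobs.keys()
    fused = {c: w_audio * audio_logprobs[c] + w_visual * visual_logprobs[c]
             for c in audio_logprobs}
    # Decision: the class with the highest fused score wins.
    return max(fused, key=fused.get)
```

For example, with audio scores favouring one class and visual scores another, shifting the weights towards the more reliable modality changes the decision; this is exactly the rebalancing that a noise-adaptive weighting factor would provide.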


Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995