Automatic speech recognition (ASR) promises to be of great importance in human-machine interfaces, but despite decades of extensive effort, acoustic-only recognition systems remain too inaccurate for the vast majority of conceivable applications, especially in noisy environments (automobiles, factory floors, crowded offices, etc.). While incremental advances can be expected within the current ASR paradigm, additional, novel approaches --- in particular those that also exploit visual information --- deserve serious study. Such hybrid (acoustic and visual) ASR systems have already been shown to achieve superior recognition accuracy, especially in noisy conditions, just as humans understand speech better when they can see the speaker's face --- for example, in the ``cocktail party'' situation of many competing voices.