
Handwriting Recognition/Speech Recognition: Improved text entry

Recognition systems that use speech or handwriting in isolation become more reliable when the two modalities are combined, because the two human output channels have complementary (orthogonal) properties.

Human output channels:
  (a) Hand/arm musculature: position and compliance control
  (b) Speech musculature, vocal cords, respiratory system: vocal sound production
Computer input modalities:
  (a) XY digitizer, handwriting recognition algorithm
  (b) Microphone, speech recognition algorithm
Computer output modality:
  Text (character fonts) is presented on the CRT
The user gets:
  (a) Immediate feedback on speech and handwriting through the intrinsic feedback loop (Figure 1.1)
  (b) Mostly visual feedback in the form of text

An excellent overview of on-line handwriting recognition is given in [335]. Two approaches to on-line cursive script recognition are compared in [301].

The algorithmic architectures for integrating handwriting and speech recognition face problems similar to those that occur when merging speech recognition with facial speech movements. Several forms of merging recognizer output may be considered, of which two are given here:

  1. Merging of the final output word lists on the basis of rank order
  2. Merging of an intermediate character search space
Technically, (1) is the easiest to implement, but it does not exploit the fact that the other modality may fill in ambiguous fragments in the character search space of a given modality. Merging the hypothesis search spaces, as in (2), is therefore the more powerful method. However, since the mapping from phonemes to characters is not one-to-one, this is not a trivial task. Within MIAMI, we will try to solve the problem by searching for common singularities in both modalities. As an example, the silence preceding the "/ba/" sound is a relatively easy speech feature to detect, as is the large ascending stroke of the written <b>.
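
To make option (1) concrete, the following is a minimal sketch of rank-order merging, not the project's actual implementation. It assumes each recognizer returns a word list ordered best first, and it combines the lists by summing rank positions (a Borda-style rule chosen here for illustration). The function name merge_by_rank, the penalty rank for words missing from one list, and the example word lists are all hypothetical.

```python
from typing import Dict, List


def merge_by_rank(handwriting: List[str], speech: List[str]) -> List[str]:
    """Merge two ranked word lists (best first) by summed rank position."""

    def ranks(words: List[str]) -> Dict[str, int]:
        # Map each word to its position; keep the best rank on duplicates.
        table: Dict[str, int] = {}
        for position, word in enumerate(words):
            table.setdefault(word, position)
        return table

    hw_rank = ranks(handwriting)
    sp_rank = ranks(speech)
    candidates = set(hw_rank) | set(sp_rank)

    def combined(word: str) -> int:
        # Assumption: a word absent from one list is penalized with a rank
        # one past the end of that list.
        return (hw_rank.get(word, len(handwriting))
                + sp_rank.get(word, len(speech)))

    # Lower combined rank is better; ties are broken alphabetically so the
    # result is deterministic.
    return sorted(candidates, key=lambda w: (combined(w), w))


# Hypothetical recognizer outputs: the two top choices disagree,
# but "bad" is ranked second by both recognizers.
handwriting_list = ["had", "bad", "bed"]
speech_list = ["pad", "bad", "bat"]
merged = merge_by_rank(handwriting_list, speech_list)
print(merged)
```

In this example the merged list starts with "bad", a word that neither recognizer ranked first. The sketch also shows the limitation noted above: it only reorders complete word hypotheses and cannot use one modality to resolve an ambiguous fragment inside the other's character search space.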

Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995