Handwriting-speech control

In bimodal handwriting & speech control, the user combines the Human Output Channels (HOCs) of speech and handwriting to achieve a specific goal. A distinction must be made between textual input and command input (see Appendix E). In textual input, the goal is to enter linguistic data into a computer system, either as digitized signals or as (ASCII-)coded strings. In command input, the user selects a specific command to be executed and adds arguments and qualifiers to it. For the present purpose, the term handwriting in the title includes pen-gesture control. For textual input, handwriting & speech bimodality offers a potentially increased bandwidth and reliability, provided that the user is able to deal with the combined speech and pen control. For command input, it allows a flexible choice of which modality carries the command and which carries its arguments [99]. As an example, the user may say /erase/ and circle or tap an object with the pen (i.e., erase "this"). Alternatively, the user may draw a deletion gesture and say the name of the object to be deleted.
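
The two command-input variants in the example above can be illustrated with a minimal sketch. It assumes that recognition has already happened and that each modality delivers a time-stamped event. The names SpeechEvent, PenEvent, fuse and the max_gap parameter are hypothetical and are not taken from the MIAMI project or from [99].

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class SpeechEvent:
    t: float    # time of the utterance in seconds
    word: str   # recognized word: a command ("erase") or an object name


@dataclass
class PenEvent:
    t: float                 # time of the pen action in seconds
    gesture: str             # "tap", "circle" or "delete" (hypothetical labels)
    target: Optional[str]    # object under the pen, if any


def fuse(speech: SpeechEvent, pen: PenEvent, known_objects: Set[str],
         max_gap: float = 2.0) -> Optional[Tuple[str, str]]:
    """Combine one speech event and one pen event into (command, object).

    Returns None if the events are too far apart in time or do not
    form one of the two patterns described in the text.
    """
    # Assumption: events belong together only if they are close in time.
    if abs(speech.t - pen.t) > max_gap:
        return None
    # Variant 1: the command is spoken, the pen designates the object.
    if (speech.word == "erase" and pen.gesture in ("tap", "circle")
            and pen.target is not None):
        return ("erase", pen.target)
    # Variant 2: the pen gesture is the command, speech names the object.
    if pen.gesture == "delete" and speech.word in known_objects:
        return ("erase", speech.word)
    return None
```

Both variants yield the same command, which is the flexibility referred to above: the user decides which channel carries the command and which carries its argument.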

In the remainder of this section we will consider bimodality in speech and handwriting from two viewpoints: (i) the automatic recognition and artificial synthesis of these HOC data; and (ii) the mere storage and replay of these HOC data. The emphasis will be on "Control", but we have added some information on computer output media (COM), because the concepts of recognition and synthesis are often confused. Furthermore, by speech we mean the audio signal representing spoken text, and by ink we mean the XY-trajectory representing written text. Both signals are functions of time.
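
As a minimal sketch of viewpoint (ii), the two signals can be stored as time-indexed data so that they can be replayed together. The class names, the uniform audio sampling rate and the (t, x, y) point format are assumptions made for illustration; the text does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeechSignal:
    sample_rate: float       # samples per second (assumed uniform sampling)
    samples: List[float]     # audio amplitude values

    def time_of(self, index: int) -> float:
        """Time in seconds of the sample at the given index."""
        return index / self.sample_rate


@dataclass
class InkSignal:
    points: List[Tuple[float, float, float]]   # (t, x, y) pen samples


def ink_during(ink: InkSignal, t0: float, t1: float) -> List[Tuple[float, float]]:
    """Return the (x, y) pen positions recorded between t0 and t1, inclusive."""
    return [(x, y) for (t, x, y) in ink.points if t0 <= t <= t1]


def audio_during(speech: SpeechSignal, t0: float, t1: float) -> List[float]:
    """Return the audio samples whose time stamps lie between t0 and t1, inclusive."""
    return [s for i, s in enumerate(speech.samples)
            if t0 <= speech.time_of(i) <= t1]
```

Because both signals share a time axis, a replay of any interval can retrieve the audio and the ink that were produced during that interval.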

Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995