Multimodal vs. multimedia vs. virtual reality system


Consider the following paragraph from Nigay and Coutaz [243]:

`` Multimodal System: A Definition
In the general sense, a multimodal system supports communication with the user through different modalities such as voice, gesture, and typing. Literally, `multi' refers to `more than one' and the term `modal' may cover the notion of `modality' as well as that of `mode'. In a communication act, whether it be between humans or between a computer system and a user, both the modality and the mode come into play. The modality defines the type of data exchanged whereas the mode determines the context in which the data is interpreted. Thus, if we take a system-centered view, multimodality is the capacity of the system to communicate with a user along different types of communication channels and to extract and convey meaning automatically. We observe that both multimedia and multimodal systems use multiple communication channels. But in addition, a multimodal system is able to automatically model the content of the information at a high level of abstraction. A multimodal system strives for meaning.''
We adopt this perspective. In particular, the common notion of a multimedia system, which is often merely a PC with a sound card, a loudspeaker, and a CD-ROM drive (in terms of hardware) or a hypertext/hypermedia environment combining text, video, and sound (in terms of software), is not what we want to address in this project. Steinmetz explains the term multimedia along these lines:
`` Multimedia, from the user's point of view, means that information can also be represented as audio signals or moved images.''
[322, page 1] (translation by the authors)
A definition given by Buxton states the following:
``[...] 'Multimedia' focuses on the medium or technology rather than the application or user.''
[195, page 2]
Buxton contrasts the medium with two other aspects, the application and the user. The second, concentration on the user, is what we pursue in this project. The first, concentration on the application, is characteristic of research in the VR domain. Therefore, a distinction between a multimodal system and a Virtual Reality (VR) system has to be made. A 'minimal' definition of a VR system is provided by Gigante:
``The illusion of participation in a synthetic environment rather than external observation of such an environment. VR relies on three-dimensional (3D), stereoscopic, head-tracked displays, hand/body tracking and binaural sound. VR is an immersive, multisensory experience.''
[86, page 3]
In this project, we take the main difference to lie in the intention behind the two research directions: VR aims at imitating reality in order to establish immersive audio-visual illusions, whereas multimodality attempts to enhance the throughput and the naturalness of man-machine communication. The audio-visual illusion of VR is just one trick for triggering the natural synergy among the sensory-motor channels that is apparent in our brain. It is not the only way to exploit efficiently the parallelism and associative nature of human perceptual-motor processes. From this point of view, VR research is a subset of multimodality research. For example, in a following section we suggest two views of multimodal systems, computer-as-tool and computer-as-dialogue-partner, which are further alternatives to the computer-as-audiovisual-illusion view typical of VR systems. A discussion of VR follows in section 4.4.

Bearing in mind our basic model as well as Nigay and Coutaz's statement that ``A multimodal system strives for meaning'', one aspect that distinguishes our research from both multimedia and VR research is the concentration on the internal processing steps, known as cognition. Whereas the synchronization of different modalities (or media) plays a key role in a multimedia system, it is not sufficient for a multimodal system. The fusion of modalities as well as of cognitive processes has to be considered in order to find the meaning or the intention of the user's actions (see also sections 1.1.2 and 1.2.2). This aspect of multimodality will be further investigated in section 5.
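The contrast between synchronization and fusion can be made concrete with a small sketch. It is an illustration only and not a method proposed in this report: the `Event` record, the `synchronize` and `fuse` functions, the one-second window, and the rule that the spoken word "that" is resolved by a pointing gesture are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One input event (hypothetical record, for illustration only)."""
    modality: str     # type of data exchanged, e.g. "speech" or "gesture"
    timestamp: float  # seconds since the start of the interaction
    payload: str      # recognized words, or the name of the pointed-at object


def synchronize(events: List[Event], window: float = 1.0) -> List[List[Event]]:
    """Multimedia-style step: group events that are close in time.

    Nothing is interpreted here; the events are only aligned.
    """
    groups: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # Extend the current group while the event lies within `window`
        # seconds of the group's first event; otherwise start a new group.
        if groups and event.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


def fuse(group: List[Event]) -> Optional[Dict[str, str]]:
    """Multimodal-style step: combine one synchronized group into an intention.

    Assumed rule: the first spoken word is the action, and the word "that"
    refers to whatever the accompanying gesture points at.
    """
    speech = next((e for e in group if e.modality == "speech"), None)
    gesture = next((e for e in group if e.modality == "gesture"), None)
    if speech is None or not speech.payload.split():
        return None
    action, *rest = speech.payload.split()
    if "that" in rest:
        if gesture is None:
            return None  # the reference cannot be resolved without a gesture
        target = gesture.payload
    else:
        target = " ".join(rest)
    return {"action": action, "target": target}


events = [
    Event("speech", 0.2, "delete that"),
    Event("gesture", 0.5, "circle_3"),
    Event("speech", 4.0, "open report"),
]
groups = synchronize(events)
intentions = [fuse(group) for group in groups]
```

In this sketch, `synchronize` alone only reports which events belong together in time. The user's intention becomes available only after `fuse` has combined the spoken command with the gesture, which is the step the text above regards as characteristic of a multimodal system.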

