The Multimodal Orchestra


One possible application scenario, relevant to entertainment, education, and clinical evaluation, is one in which the user is immersed in a multimodal experience that is not a conventional virtual reality perceptible to the user alone. Rather, we can think of an audio-visual environment that can be shared with other people, either other actors participating in the same event or external spectators of the action. For example, we can think of a sound/music/light/image/speech synthesis system that is driven or tuned by the movements of the actors, using specific metaphors for reaching, grasping, turning, pushing, navigating, etc. Specific scenarios could include:

Multimodal technology requires an instrumented environment, instrumented clothing, and instrumented tools that allow natural human movements to be captured. The learning problems involved have two sides, and at least some of them can be cast in a dimensionality-reduction paradigm:
  1. on the human side, as when learning to drive a car, the user needs to perceive the complex but smooth and continuous associations between movements and actions, avoiding the cognitive bottlenecks that may arise with discontinuous and/or symbolic feedback;
  2. on the computer side, there is the dimensionality-reduction problem of extracting "principal components" from a high-dimensional space of redundant degrees of freedom, which may or may not imply the recognition of gestures.
In any case, learning is needed because in multimodal systems there is, in general, no simple one-to-one translation of signals and events as there is in VR systems.
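
To make the computer-side problem concrete, the following is a minimal sketch of extracting a first "principal component" from redundant movement channels. It is an illustration under stated assumptions, not a description of any particular system. The data are synthetic: four hypothetical sensor channels driven by a single hidden movement variable plus noise. The gains, the noise level, and all function names are invented for the example. The principal direction is approximated by power iteration on the covariance matrix, which is one simple way among several to obtain it.

```python
import math
import random


def center(samples):
    """Subtract the per-channel mean so that variation is measured around zero."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    return [[s[d] - means[d] for d in range(dims)] for s in samples]


def covariance(centered):
    """Sample covariance matrix of already-centered data (rows are samples)."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_principal_component(cov, iterations=200):
    """Power iteration: approximate the unit eigenvector with the largest eigenvalue.

    The uniform start vector is an arbitrary choice; it fails only if it is
    exactly orthogonal to the dominant direction.
    """
    dims = len(cov)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance at all
            break
        v = [x / norm for x in w]
    return v


def explained_fraction(cov, v):
    """Share of the total variance captured along the unit vector v."""
    dims = len(cov)
    along = sum(v[i] * cov[i][j] * v[j] for i in range(dims) for j in range(dims))
    total = sum(cov[i][i] for i in range(dims))
    return along / total


def synthetic_gesture(n_samples=200, seed=0):
    """Hypothetical data: four redundant channels driven by one hidden variable."""
    rng = random.Random(seed)
    gains = [1.0, 2.0, -1.0, 0.5]  # made-up sensor gains, not from a real device
    samples = []
    for _ in range(n_samples):
        t = rng.uniform(-1.0, 1.0)  # the hidden movement degree of freedom
        samples.append([g * t + rng.gauss(0.0, 0.01) for g in gains])
    return samples


centered = center(synthetic_gesture())
cov = covariance(centered)
pc = first_principal_component(cov)
# One continuous control value per frame, e.g. to drive a synthesis parameter.
control = [sum(x * p for x, p in zip(row, pc)) for row in centered]
print("principal direction:", [round(p, 2) for p in pc])
print("variance explained:", round(explained_fraction(cov, pc), 4))
```

Projecting each frame onto the principal direction yields a single smooth, continuous control value rather than a discrete gesture label, which matches the remark that dimensionality reduction may or may not imply gesture recognition.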

