We need formalisms able to manage the different structures and levels of abstraction of multimedia objects, from symbolic, abstract representations to subsymbolic representations (e.g., the signal perceived by the human ear).
Moreover, different views of the same object are often necessary: depending on the reasoning perspective and the goal to be fulfilled, the representation of a multimedia object can range from an atom in a high-level symbolic representation to a stream of low-level signals in the deepest view of the same material.
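As an illustration only, the idea of several views over the same material can be sketched as a small data structure. The class name MultimediaObject, its three fields, and the level names "symbolic", "structural" and "signal" are all hypothetical choices made for this sketch; they are not part of any formalism discussed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultimediaObject:
    """Hypothetical container holding several views of the same material."""
    symbol: str                                               # atom used in high-level symbolic reasoning
    attributes: Dict[str, float] = field(default_factory=dict)  # intermediate, structured description
    samples: List[float] = field(default_factory=list)          # low-level signal (e.g., audio samples)

    def view(self, level: str):
        """Return the representation that matches the requested reasoning level."""
        if level == "symbolic":
            return self.symbol
        if level == "structural":
            return dict(self.attributes)
        if level == "signal":
            return list(self.samples)
        raise ValueError(f"unknown level: {level!r}")


# Usage: the same note seen as an atom, as a structured description, or as a signal.
note = MultimediaObject("note_A4", {"pitch_hz": 440.0}, [0.0, 0.5, 0.0, -0.5])
atom = note.view("symbolic")
stream = note.view("signal")
```

A real system would presumably derive one view from another (for instance by analysis or synthesis) rather than store them side by side; the sketch only shows how the choice of view can follow the reasoning goal.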
Metaphors are a crucial issue for the representation and reasoning capabilities of multimedia systems. Humans use metaphors widely in reasoning, and metaphors are at the basis of languages for integrating different kinds of multimedia knowledge. For example, in the problem of a robot navigating in a three-dimensional space (e.g., a ``museal robot'' or a ``robotic actor'' on a theater stage), a bipolar force field can be a useful metaphor: in the mental model, the moving robot corresponds to an electric charge, the target to be reached corresponds to a charge of opposite sign, which attracts the robot, and obstacles correspond to charges of the same sign, which repel it. Music languages are rich in metaphors derived from real-world dynamics; see for example . In general, the terms and descriptions of one modality can be used to express intuitively ``similar'' concepts in other modalities. We believe that metaphors are the basic ``glue'' for integrating different modalities, e.g., sound/music and movement/dance representations. Reasoning based on metaphors has been widely studied from different points of view in AI, psychology and philosophy. Steps toward an approach to modeling metaphors can be found, for example, in : his theory analyzes metaphors in terms of similarities of topological structures between dimensions in a conceptual space.
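The force-field metaphor for robot navigation can be made concrete with a minimal sketch. It assumes point charges and a Coulomb-like inverse-square law; the function names (coulomb_force, net_force, step), the unit charges, the constant k and the fixed step size are hypothetical choices for illustration, not a description of any particular system.

```python
import math


def coulomb_force(q_self, pos_self, q_other, pos_other, k=1.0, eps=1e-9):
    """Force on the charge at pos_self due to the charge at pos_other.

    Like signs give a force pointing away from pos_other (repulsion),
    opposite signs a force pointing toward it (attraction).
    """
    d = [a - b for a, b in zip(pos_self, pos_other)]  # vector from other to self
    r = max(math.sqrt(sum(c * c for c in d)), eps)    # avoid division by zero
    scale = k * q_self * q_other / r ** 3
    return [scale * c for c in d]


def net_force(robot, target, obstacles, q_robot=1.0, q_target=-1.0, q_obstacle=1.0):
    """Sum the attraction of the target and the repulsion of every obstacle."""
    total = coulomb_force(q_robot, robot, q_target, target)
    for obs in obstacles:
        f = coulomb_force(q_robot, robot, q_obstacle, obs)
        total = [t + c for t, c in zip(total, f)]
    return total


def step(robot, target, obstacles, step_size=0.1):
    """Move the robot a fixed distance along the direction of the net force."""
    f = net_force(robot, target, obstacles)
    norm = math.sqrt(sum(c * c for c in f))
    if norm == 0.0:
        return list(robot)  # forces cancel exactly: the robot does not move
    return [p + step_size * c / norm for p, c in zip(robot, f)]


# Usage: robot at the origin, target on the x axis, one obstacle off to the side.
robot = [0.0, 0.0, 0.0]
target = [5.0, 0.0, 0.0]
obstacles = [[2.0, 1.0, 0.0]]
for _ in range(10):
    robot = step(robot, target, obstacles)
```

Following the net force greedily in this way can leave the robot stuck where attraction and repulsion cancel, so the sketch shows the metaphor itself rather than a complete planner.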
Furthermore, formalisms able to support users should provide mechanisms for reasoning about actions and plans and for analyzing alternatives and strategies, starting from user requirements and goals. They should provide both formal and informal analysis capabilities for inspecting the objects represented.
Another point is learning, i.e., how to automatically update the system's knowledge (new analysis data, new planning strategies), for example by means of generalization processes starting from examples presented by the user. The solutions proposed in the AI literature, such as purely symbolic approaches and learning systems based on neural networks, are preliminary attempts in this direction.
Lastly, an emerging aspect concerns the modeling and communication of emotions in multimedia systems. Preliminary work is available in the literature [252,288], and experiments are currently under way at Carnegie Mellon in the framework of interactive agent architectures .