
Cognition in Humans

It is essential to be aware that the "raw data" coming from the senses are not at the epistemological level of interest in MIAMI. Taking for granted the existing and well-known "unimodal" psychophysical constraints at this level, it is what happens at the higher levels of "information integration in bimodality" that needs our focused attention. The whole issue of sensory-motor channel capacities, which originates in psychophysics (Weber's law, Fechner's law and Stevens's law, from which the channel-capacity data are obtained), is virtually useless within MIAMI: it considers only one channel at a time, in isolation, and therefore tends to greatly underestimate the global, Gestalt-like features of the human perceptual-motor system. The main rationale of MIAMI is to exploit the (hidden) synergies between different channels in a "natural" environment (see the concept of "ecological perception" advocated by J.J. Gibson [122]).
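For reference, the three single-channel laws named above are usually written in the following standard textbook forms (they are not stated in this report). Here I is the stimulus intensity, \Delta I the just-noticeable difference, I_0 the detection threshold, S the sensation magnitude, k an empirical constant (a different one in each law) and a the modality-specific exponent of the power law:

```latex
\begin{align}
  \frac{\Delta I}{I} &= k                    && \text{(Weber)}   \\
  S &= k \ln \frac{I}{I_0}                   && \text{(Fechner)} \\
  S &= k \, I^{a}                            && \text{(Stevens)}
\end{align}
```

Each law relates one stimulus dimension on one channel to one response; no term couples one channel to another, which is the limitation argued above.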

In the neurobiological literature, modality implies the adjective "sensory", so it is not possible to speak of motor modalities. As mentioned earlier (Table 1.1), modalities are classified mainly by the forms of energy that are transduced and detected. A similar classification can be applied to motor control, which produces different kinds of work (mechanical work, acoustic work, chemical energy, etc.). However, these distinctions are physical and do not capture the informational aspect, which is tied to task and context, i.e., to the meaning of an event or action. For example, a sound (emitted and/or detected) can be an utterance (relevant only for its timing), a musical tune (with its pitch, duration, intensity and timbre), or a spoken word. It can be man-generated or machine-generated and, dually, man-perceived or machine-perceived. Similarly, the term "gesture" is too vague and strongly task-dependent. On the other hand, the notion of channel is too restrictive and inappropriate for multimodal interaction, because the whole point of exploring multimodality is that in biology the ensemble is much more than the sum of its parts: new properties and functionalities can emerge if the parts are carefully matched.
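As a hypothetical illustration of why a purely physical key is insufficient, the sketch below tags three of the sound events mentioned above with both their energy form and their task-dependent attributes. The type names (Energy, Agent, Event) and the group_by helper are assumptions made for this example only; they are not part of any MIAMI taxonomy.

```python
from dataclasses import dataclass
from enum import Enum

class Energy(Enum):
    """Physical classification: the form of energy transduced or produced (hypothetical)."""
    ACOUSTIC = "acoustic"
    MECHANICAL = "mechanical"
    CHEMICAL = "chemical"

class Agent(Enum):
    HUMAN = "human"
    MACHINE = "machine"

@dataclass(frozen=True)
class Event:
    energy: Energy        # what a purely physical taxonomy records
    interpretation: str   # informational role, fixed only by task and context
    producer: Agent
    perceiver: Agent

events = [
    Event(Energy.ACOUSTIC, "utterance (timing only)", Agent.HUMAN, Agent.MACHINE),
    Event(Energy.ACOUSTIC, "musical tune (pitch, duration, intensity, timbre)",
          Agent.MACHINE, Agent.HUMAN),
    Event(Energy.ACOUSTIC, "spoken word", Agent.HUMAN, Agent.HUMAN),
]

def group_by(evts, key):
    """Group event interpretations under an arbitrary classification key."""
    groups = {}
    for e in evts:
        groups.setdefault(key(e), []).append(e.interpretation)
    return groups

by_energy = group_by(events, lambda e: e.energy)
by_context = group_by(events, lambda e: (e.producer, e.perceiver))
print(len(by_energy), len(by_context))  # 1 3
```

Grouped by energy form, all three events collapse into a single class; only the context-dependent fields tell them apart.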

As a consequence, classifying input/output devices is an exercise in futility unless it is grounded in a specific context, i.e., made task-dependent. Therefore, the taxonomy document should conclude with a preliminary sketch of the different application paradigms.

In order to structure the concepts in this area, the following representation levels are proposed [183], in increasing order of abstraction:

Signals
A signal is the N-dimensional waveform representation of a modality. It is characterized by its spectral content, from which a required sampling frequency and resolution can be determined. The signal corresponds directly, and quantitatively, to a physical entity. In the sound modality, signals are the acoustical or waveform representation of a sound. In computer models, signals are represented digitally by an array of numbers; for CD-quality audio, a sampling rate of 44,100 samples/s and a resolution of 16 bits are commonly used. In music research it is sometimes necessary to apply classical digital signal processing operations, such as the Fourier transform or the wavelet transform, to musical signals (see for example [80]).
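A minimal sketch of a digitized Signal in the sense above: a pure tone is sampled at 44,100 samples/s, each sample is quantized to a signed 16-bit integer, and a discrete Fourier transform exposes its spectral content. The helper names sine_signal and dft_magnitudes are hypothetical, and the naive O(N^2) DFT is used only for clarity; practical systems would use an FFT.

```python
import cmath
import math

SAMPLE_RATE = 44100               # samples per second (CD quality, as in the text)
BITS = 16                         # resolution in bits per sample
FULL_SCALE = 2 ** (BITS - 1) - 1  # 32767, the largest positive 16-bit value

def sine_signal(freq_hz, n_samples, amplitude=0.5):
    """Sample a sine tone and quantize each sample to a signed 16-bit integer."""
    return [
        round(amplitude * FULL_SCALE * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

def dft_magnitudes(x):
    """Naive discrete Fourier transform; returns |X[k]| for k = 0 .. N/2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

signal = sine_signal(1000, 441)            # 10 ms of a 1 kHz tone
mags = dft_magnitudes(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
bin_width_hz = SAMPLE_RATE / len(signal)   # 100 Hz per frequency bin here
print(peak_bin * bin_width_hz)             # 1000.0

# Raw data rate of one uncompressed CD-quality channel, in bits per second
print(SAMPLE_RATE * BITS)                  # 705600
```

The window length of 441 samples is chosen so that 1 kHz falls exactly on a frequency bin; other lengths would spread the peak over neighbouring bins.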
Perceptual Mappings
A perceptual mapping represents the relevant aspects of a Signal, transformed into a condensed representation. In this framework, a Mapping is assumed to be a state or snapshot of the neural activity in a brain region during a defined time interval. It is modelled as an ordered array of numbers (a vector). For example, the most complete auditory mapping (i.e., the one closest to the Signal) is assumed to occur at the level of the auditory nerve, and all other mappings can be derived from it. At the cochlear nucleus, for instance, auditory processing becomes differentiated and more specialized, and some neurons perform onset detection. According to [182], Schemata and Mental Representations should also be taken into account when classifying auditory mappings (indeed, this classification scheme can be generalized to other modalities).
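To make the idea of a map as a vector snapshot concrete, the sketch below slices a sampled signal into fixed time intervals and computes one energy value per interval; the resulting vector plays the role of a very crude map. A second map, an onset map, is then derived from the first, loosely echoing the onset-detecting neurons mentioned above. The frame length, the jump ratio, the 8 kHz sample rate and the function names are all hypothetical, and nothing here models the auditory nerve.

```python
import math

def energy_map(signal, frame_len):
    """Map: mean energy of each non-overlapping frame (one value per time interval)."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def onset_map(energies, ratio=4.0, floor=1e-9):
    """Derived map: 1 where frame energy jumps by at least `ratio` over the previous frame."""
    onsets = [0]  # the first frame has no predecessor to compare with
    for prev, cur in zip(energies, energies[1:]):
        onsets.append(1 if cur >= ratio * max(prev, floor) else 0)
    return onsets

# Toy signal: 400 samples of silence, then a 440 Hz tone (8 kHz sample rate, assumed)
fs = 8000
sig = [0.0] * 400 + [math.sin(2 * math.pi * 440 * n / fs) for n in range(400)]
e = energy_map(sig, frame_len=100)
print(onset_map(e))  # [0, 0, 0, 0, 1, 0, 0, 0]
```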
Schemata
A schema is a categorical information structure that reflects the learned functional organization of neurons as a response structure. As a control structure, it adapts itself and guides perception. The basic features of schemata are presented in [57].

Schemata are multifunctional. In the present framework, adaptation to the environment is seen as a long-term process taking several years. It is data-driven, because no schema is needed to adapt other schemata to the environment. This long-term, data-driven adaptation is distinguished from short-term, schema-driven control: a short-term activity (e.g., 3 to 5 s) responsible for recognition and interpretation, which relies on the schema and is therefore called schema-driven.

The response of a schema to a signal-based auditory map is itself considered a map. The schema, however, has an underlying response and control structure that is more persistent than any map: the structure contained in a schema is long-term, whereas the information contained in a map is short-term, merely a snapshot of an information flow.

As observed by Leman [182], neurophysiological studies provide evidence for the existence of different kinds of schemata (e.g., [328]). An example of a schema in auditory modeling is Leman's two-dimensional array of artificial neurons in a Kohonen-type network [182]. This schema has been trained with short pieces of music and is able to classify tone centers and chords in input signals.
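The two timescales described above can be sketched with a generic self-organizing map in the spirit of the Kohonen-type network just mentioned. This is a minimal sketch under stated assumptions, not Leman's model: the class name KohonenSchema, the grid size, the learning-rate and neighbourhood schedules and the 3-D toy inputs are all hypothetical. train() stands for long-term, data-driven adaptation; respond() stands for the short-term response of the schema to an incoming map, which is again a map, while the weight grid persists between calls.

```python
import math
import random

class KohonenSchema:
    """Toy 2-D self-organizing (Kohonen) map used as a stand-in for a schema.

    The weight grid is the persistent, long-term structure; respond() turns an
    input map into a short-term output map with one activation per neuron.
    """

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.w = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
                  for _ in range(rows)]

    @staticmethod
    def _dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def best_unit(self, x):
        """Grid position (row, col) of the neuron whose weights are closest to x."""
        cells = [(r, c) for r in range(self.rows) for c in range(self.cols)]
        return min(cells, key=lambda rc: self._dist2(self.w[rc[0]][rc[1]], x))

    def train(self, data, epochs=50, lr0=0.5, sigma0=None, seed=0):
        """Long-term, data-driven adaptation: only the inputs shape the weights."""
        rng = random.Random(seed)
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2
        total, t = epochs * len(data), 0
        for _ in range(epochs):
            for x in rng.sample(data, len(data)):   # shuffled pass over the data
                frac = t / total
                lr = lr0 * (1 - frac)               # linearly decaying learning rate
                sigma = sigma0 * (1 - frac) + 0.5   # shrinking Gaussian neighbourhood
                br, bc = self.best_unit(x)
                for r in range(self.rows):
                    for c in range(self.cols):
                        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                        wv = self.w[r][c]
                        for i in range(len(wv)):
                            wv[i] += lr * h * (x[i] - wv[i])
                t += 1

    def respond(self, x):
        """Schema response to an input map; the result is itself a map."""
        return [[math.exp(-self._dist2(self.w[r][c], x)) for c in range(self.cols)]
                for r in range(self.rows)]


# Hypothetical toy data: two well-separated clusters of 3-D input maps
cluster_a = [[1.0, 0.05 * i, 0.0] for i in range(5)]
cluster_b = [[0.0, 0.05 * i, 1.0] for i in range(5)]
schema = KohonenSchema(rows=3, cols=3, dim=3)
schema.train(cluster_a + cluster_b)
print(schema.best_unit([1.0, 0.1, 0.0]), schema.best_unit([0.0, 0.1, 1.0]))
```

After training, inputs from the two clusters should activate different regions of the grid; the weights stay fixed during respond(), mirroring the persistence of a schema relative to the maps it processes.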

Mental Representations
Mental representations are knowledge structures that refer to a "mental" world. They are used in solving specific tasks. Techniques of multidimensional scaling depict the data of psychological tests as mental spaces with both analogical and topological properties. Schemata are computer implementations of mental representations, and mental representations can in turn serve as metaphors for schemata. One aim of modelling perception and cognition is to show that the representational categories are causally related to each other: signals are transformed into maps, and maps organize into schemata and are controlled by these schemata. By looking for correlations between the model and psychological data, one may try to relate the cognitive brain maps to mental representations. In this way, the notion of mental representation can be incorporated into the knowledge base. Unlike the static world of mental representations, this approach offers a dynamic point of view, introduced at two levels: (i) how the cognitive map comes into existence, and (ii) how it is used in a perception task. These categories have proved useful in a study on tone semantics [183,57] and are part of the present framework of a hybrid AI-based signal manipulation system.
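A minimal sketch of classical (Torgerson) multidimensional scaling, one standard MDS technique; the report does not say which variant was used. An n x n matrix of dissimilarities, for instance pairwise ratings from a listening test, is double-centred, and its leading eigenvectors give coordinates in a low-dimensional "mental space". The function name classical_mds, the power-iteration eigen-solver and the toy data are assumptions, and the sketch assumes the dissimilarities are close to Euclidean (negative eigenvalues are simply clipped to zero).

```python
import math

def classical_mds(d, dims=2, iters=500):
    """Classical (Torgerson) MDS: place n items in `dims` dimensions so that
    their Euclidean distances approximate the n x n dissimilarity matrix `d`."""
    n = len(d)
    # Double-centre the squared dissimilarities: B = -1/2 * J D^2 J
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in sq]        # row means (= column means, d symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of the (deflated) B
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))       # negative eigenvalues are clipped
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so that the next pass finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical dissimilarities, generated from known 2-D points so the result can be checked
pts = [(0, 0), (4, 0), (0, 1), (4, 1)]
d = [[math.dist(p, q) for q in pts] for p in pts]
space = classical_mds(d)
```

The recovered configuration may be rotated or reflected relative to the original points; only the inter-item distances, which is what the dissimilarity data constrain, are reproduced.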

The remainder of this section describes in more detail the basic requirements of representation and reasoning systems able to model these high-level aspects of cognition. This viewpoint reflects the fact that the taxonomy should not be reduced to the I/O channels alone.


Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995