It is essential to be aware that the ``raw data'' coming from
the senses are not at the epistemological level of interest in MIAMI.
Taking for granted the existing and well-known ``unimodal'' psychophysical
constraints at this level, it is what happens at the higher levels of
``information integration in bimodality'' which needs our focused
attention. The whole issue of sensory-motor channel capacities which
originates in the field of psychophysics (Weber's law, Fechner's law,
Stevens's law from which the channel capacity data are obtained) is
virtually useless within MIAMI, because it considers only a single channel
at a time, in an isolated manner which tends to greatly underestimate the
global or Gestalt-like features of the human perceptual-motor system. The
main rationale of MIAMI is to exploit the (hidden) synergies between
different channels in a ``natural'' environment (see the concept of
``ecological perception'' advocated by J.J. Gibson [122]).
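For concreteness, the classical single-channel laws mentioned above can be written down explicitly. The sketch below is purely illustrative: the constants and the exponent value are made-up placeholders, not measured data.

```python
import numpy as np

# Single-channel psychophysics, as referenced in the text:
# Weber's law:   the just-noticeable difference dI is proportional to I.
# Fechner's law: perceived sensation S = k * ln(I / I0).
# Stevens's law: S = c * I**a, with a modality-dependent exponent a.

I0 = 1.0                 # hypothetical absolute threshold intensity
k, c, a = 1.0, 1.0, 0.3  # hypothetical constants (a ~ 0.3 is indicative for loudness)

I = np.array([1.0, 10.0, 100.0, 1000.0])  # stimulus intensities
fechner = k * np.log(I / I0)              # logarithmic growth
stevens = c * I ** a                      # power-law growth

print(fechner)
print(stevens)
```

Both laws compress a wide intensity range into a narrow sensation range, which is exactly the kind of single-channel description that, as argued above, misses cross-channel synergies.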
In the neurobiological literature, the term ``modality'' implies the
adjective ``sensory'', so it is not possible to speak of motor modalities. As
mentioned earlier (Table 1.1 ), the classification among
different modalities is mainly in terms of different forms of energies,
which are transduced and detected. A similar classification can be applied
to motor control aspects, which produce different kinds of work (mechanical
work, acoustic work, chemical energy, etc.). However, these distinctions
are physical and do not capture the informational aspect, which is tied to
task and context, i.e., the meaning of an event or action. For example, a
sound (emitted and/or detected) can be an utterance (only relevant for its
timing), a musical tune (with its pitch/duration/intensity/timbre), or a
spoken word. It can be man-generated or machine-generated and, dually,
machine-perceived or man-perceived. Similarly, the term ``gesture'' is too
vague and strongly task-dependent. On the other hand, the notion of channel
is too restrictive and inappropriate when dealing with multimodal
interaction: the whole point of exploring multimodality is that, in
biology, the ensemble is much more than the pure sum of its parts.
Emergent properties and functionalities can arise if the parts are
carefully matched.
As a consequence, the attempt to classify input/output devices is an
exercise in futility if it is not grounded in a specific context, i.e., if
it is not made task-dependent. Therefore, the taxonomy document should conclude
with a preliminary sketch of the different application paradigms.
In order to structure the concepts in this area, the following
representation levels are proposed [183], in increasing order of
abstraction:
- Signals
- A signal refers to the N-dimensional waveform representation
of a modality. It is characterized by spectral content, and a required
sampling frequency and resolution can be identified. The signal directly
corresponds to a physical entity in a quantitative fashion. In the sound
modality, signals refer to the acoustical or waveform representation of a
sound. In computer models, signals are digitally represented by an
array of numbers. In audio, a sampling rate of 44100 samples/sec with
16-bit resolution is often used for CD quality. In music research, it
is sometimes necessary to perform classical digital signal processing
operations on musical signals, such as the Fourier transform or the wavelet
transform (see for example [80]).
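As a minimal sketch of the Signal level, the following example builds a digitally represented signal at the quoted CD-quality rate and applies one of the classical DSP operations named above, the discrete Fourier transform. The 440 Hz sine is a synthetic stand-in, not an example from the taxonomy itself.

```python
import numpy as np

fs = 44100                    # CD-quality sampling rate (samples/sec)
t = np.arange(fs) / fs        # one second of sample times
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz sine as the "signal"

# Classical signal-level operation: the (real) discrete Fourier transform.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

# The spectral content of the signal: the dominant bin sits at 440 Hz.
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)
```

The array of numbers *is* the signal in this representation; everything at higher levels (mappings, schemata) is derived from it.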
- Perceptual Mappings
- A perceptual mapping represents transformed
relevant aspects of a Signal in a condensed representation. In this
framework, a Mapping is assumed to be a state or snapshot of the neural
activity in a brain region during a defined time interval. It is modelled
as an ordered array of numbers (a vector). For example, the most
complete auditory mapping (i.e., the one closest to the Signal) is
assumed to occur at the level of the auditory nerve. From this mapping
all other mappings can be derived. For example, at the cochlear nucleus,
auditory processing becomes differentiated and more specialized, and some
neurons perform onset detection. According to [182], Schemata and
Mental Representations should be taken into account for a classification
of auditory mappings (indeed, this classification scheme can be
generalized also to other modalities).
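A toy sketch of such a mapping is given below. The frame length and the energy-based ``onset neuron'' are illustrative assumptions, not a model of the auditory nerve: the point is only that a mapping condenses a signal into an ordered array of numbers, and that a more specialized mapping (onset detection) can be derived from a more complete one.

```python
import numpy as np

def energy_map(signal, frame_len=512):
    """Condense a signal into a mapping: one short-time energy
    value per frame, i.e. an ordered array of numbers (a vector)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)

def onset_map(emap):
    """Crude stand-in for onset-detecting neurons: respond only to
    increases in energy between successive frames (half-wave rectified)."""
    return np.maximum(np.diff(emap), 0.0)

# A silent stretch followed by a burst of "sound".
sig = np.concatenate([np.zeros(2048),
                      0.5 * np.sin(2 * np.pi * 220 * np.arange(2048) / 8000)])
e = energy_map(sig)   # the more complete mapping
o = onset_map(e)      # a derived, specialized mapping
print(int(np.argmax(o)))  # frame index where the onset response peaks
```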
- Schemata
- A schema is a categorical information structure which
reflects the learned functional organization of neurons as a response
structure. As a control structure, it actively adapts itself and guides
perception. Basic schemata features are presented in [57].
Schemata are multifunctional. In the present framework, adaptation to the
environment is seen as a long-term process taking several years. It is
data-driven because no prior schema is needed to adapt other schemata to
the environment. Long-term data-driven adaptation is distinguished from
short-term schema-driven control. The latter is a short-term activity
(e.g. 3 to 5 sec) responsible for recognition and interpretation. It
relies on the schema and is therefore called schema-driven.
Schema responses to signal-based auditory maps are themselves considered
maps, but the schema has an underlying response and control structure which
is more persistent than the maps. The structure contained in a schema is
long-term, while the information contained in a map is short-term: the
latter is just a snapshot of an information flow.
As observed by Leman [182], neurophysiological studies provide
evidence for the existence of different kinds of schemata (e.g.,
[328]). An example of schemata in auditory modeling is Leman's
two-dimensional array of artificial neurons in a Kohonen-type
network [182]. This schema has been trained with short pieces of
music and is able to classify tone centers and chords in input signals.
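A minimal sketch of such a Kohonen-type schema follows. The 1-D ring topology, the chroma-like chord vectors, and all parameters are illustrative assumptions, not Leman's actual network or training data; it shows only how self-organized weights come to classify chord-like inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def chord_vector(root, intervals=(0, 4, 7)):
    """Toy 12-dimensional pitch-class vector for a major triad."""
    v = np.zeros(12)
    v[[(root + i) % 12 for i in intervals]] = 1.0
    return v

data = np.array([chord_vector(r) for r in range(12)])  # 12 major triads

# Kohonen map: a ring of 24 units, each with a 12-dim weight vector.
n_units = 24
weights = rng.random((n_units, 12))

for epoch in range(200):
    lr = 0.5 * (1 - epoch / 200)                 # decaying learning rate
    radius = max(1, int(4 * (1 - epoch / 200)))  # shrinking neighborhood
    for x in data[rng.permutation(len(data))]:
        bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))
        for d in range(-radius, radius + 1):     # update BMU and neighbors
            u = (bmu + d) % n_units
            weights[u] += lr * np.exp(-d * d / 2) * (x - weights[u])

# After training, each chord activates its own region of the map.
bmus = [int(np.argmin(((weights - x) ** 2).sum(axis=1))) for x in data]
print(bmus)
```

The trained weight array is the long-term ``structure'' of the schema; each response to an input vector is a short-term map, in the sense described above.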
- Mental Representations
- Mental representations are knowledge
structures that refer to a ``mental'' world. They are used in solving
specific tasks. Techniques of multi-dimensional scaling depict the data
of the tests as mental spaces --- with both analogical and topological
properties. Schemata are computer implementations of mental
representations. The latter can serve as metaphors for the schemata. One
of the aims of modelling perception and cognition is to show that the
representational categories are causally related to each other. Signals
are transformed into maps, and maps organize into schemata and are
controlled by these schemata. By looking for correlations between the
model and psychological data, one may try to relate the cognitive brain
maps to mental representations. As such, the notion of mental
representation can be incorporated in the knowledge-base. It is
important to mention that this approach offers, contrary to the static
world of mental representations, a dynamic point of view. The dynamics
enters at two levels: (i) how the cognitive map comes into
existence, and (ii) how it is used in a perception task. The categories have
been useful in a study on tone semantics [183,57] and are
part of the present framework of a hybrid AI-based signal manipulation
system.
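As an illustration of how multi-dimensional scaling yields such a mental space, the sketch below applies classical (Torgerson) MDS to a small made-up dissimilarity matrix; the numbers are not data from the cited tone-semantics studies.

```python
import numpy as np

# Made-up pairwise dissimilarities between three stimuli.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix

# Coordinates in the "mental space" from the top eigenpairs.
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1]       # largest eigenvalues first
coords = evecs[:, order[:2]] * np.sqrt(np.maximum(evals[order[:2]], 0))

# Pairwise distances of the recovered points approximate D.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.round(recovered, 2))
```

The recovered configuration has the analogical and topological properties mentioned above: distances in the space mirror the judged dissimilarities.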
The remainder of this section describes in more detail the basic
requirements of representation and reasoning systems able to model these
high-level aspects of cognition. This viewpoint reflects the fact that the
taxonomy should not be reduced to the I/O channels alone.
Esprit Project 8579/MIAMI (Schomaker et al., '95)
Thu May 18 16:00:17 MET DST 1995