Back to CPSP-theory
written by: Peter van Hengel
date: May 10 2001
Coppyright: Sound Intelligence
The cochlea model performs a frequency analysis of the incoming sound (as given by the resampler). This frequency analysis could be compared to a short term FFT or wavelet analysis, but has substantial advantages over these. The most important advantage is that in the cochlea analysis continuity in both time and frequency can be guaranteed.
The code implemented here is based on a model of the human inner ear (=cochlea) developed in the early 1980's in Groningen. In the real cochlea the incoming sound is split into its contributing frequency components by a membrane (the basilar membrane or BM) which has varying mechanical properties along its length. Figure 1 shows the position of the cochlea and various related structures in the human head, as well as a cross-section of the cochlea, showing the position of the BM.
Figure 1a: Overview of the human hearing organ.
Figure 1b: Cross-section of the cochlea (at the plane indicated in the inset). 1: scala media, 2: scala vestibuli, 3: scala tympani, 4: spiral ganglion, 5: auditory nerve. The BM separates the scala media from scala tympani.
The variation in mechanical properties along the BM causes high frequencies
to be expressed at the start of the cochlea (base) and lower frequencies
further towards the end (apex).
This leads to a place-frequency-relation where every frequency (in the
range of hearing) has a corresponding position in the cochlea that will
react maximally to that frequency. Figure 2 shows the BM with some
frequencies at the corresponding locations. Figure 2: Relation between BM position and frequency. For further details
on the human cochlea see any book on hearing (e.g. , , or the www-site
round the Cochlea"). Transmission line
The variation in mechanical properties along the BM causes high frequencies to be expressed at the start of the cochlea (base) and lower frequencies further towards the end (apex). This leads to a place-frequency-relation where every frequency (in the range of hearing) has a corresponding position in the cochlea that will react maximally to that frequency. Figure 2 shows the BM with some frequencies at the corresponding locations.
Figure 2: Relation between BM position and frequency.
For further details on the human cochlea see any book on hearing (e.g. , , or the www-site "Promenade round the Cochlea").
The "Groningen" model is a so-called transmission line model of the cochlea, as opposed to the more common filterbank model. In a filterbank model the incoming sound is processed by a (large) number of separate filters, each representing a section of the real cochlea. These filters will in general have overlapping frequency responses, but the motion in one filter does not affect the motion in any of the other filters. This implies that continuity in the spatial (=frequency) direction can not be guaranteed.
In the transmission line version a numerical implementation is constructed of the equations of motion governing the BM. This leads to a set of coupled oscillators (from now on the term section will be used).
If the mechanics of the cochlea are assumed linear, a transmission line model can be converted into a filterbank. This is done by using the impulse response of each of the sections of the transmission line as filter coefficients in a set of non-recursive filters. The filterbank is then implemented as a set of overlap-and-add convolution filters (In  this implementation is used).
In order to be able to introduce nonlinearity in later stages, and to keep modeling as close to the human system as possible, the transmission line model is used.
Details about the transmission line model can be found in e.g. , . For details about the filterbank implementation see .
Equations governing the transmission line model
The equation of motion of each of the sections of the transmission line model is:
In this equation y denotes the displacement of the section and p the local driving pressure. These values do, of course, depend on the position x, but this dependence is omitted from the formulae in order to improve readability. Dots are used to denote time-derivatives.
In equation C.1 m is the mass (per unit area) of the BM, d(x) is the damping (per unit area), which depends on the position x on the BM. The stiffness (per unit area) s(x) also depends on the position. This dependence is used to introduce the place-frequency relation mentioned before. A formula describing the place-frequency relation was given by Greenwood (1991) , which is used here. This place-frequency relation can be used to compute the stiffness as a function of position from .
The equation given by Greenwood  gives the characteristic frequency for every position (the frequency that reacts maximally). It is used in the model as a (free) resonance frequency, which is slightly higher, depending on other settings of the model. With the standard settings of the model this difference is small (less than 1%). The assumption that the Greenwood map gives the resonance frequency allows easy computation of the stiffness from . See  for more details about the difference between characteristic and resonance frequency.
The damping as function of position is computed from the stiffness by assuming a quality factor of resonance Q, using . In the original model the Q-factor was set to a constant value of 20 (based on experimental data). In later years van den Raadt and Duifhuis discovered that this leads to unwanted reflection of energy at the apex of the model . Duifhuis suggested a reduction of Q to 0.5 at the apex to remove all unwanted reflections . In this implementation Q is reduced (or its reverse, d, increased), not only to reduce reflections, but also to limit group delay (see section on group delay in the documentation on initialisation).
A second equation is needed to describe the coupling of the sections in longitudinal (=x) direction. This equation describes the mechanics of the incompressible (non-viscous) fluid inside the cochlea, surrounding the BM:
where a'^2 is a constant determined by the density of the cochlear fluid, the cross-sectional area of the cochlear channels and the width of the BM.
These equations can be combined to one single equation by eliminating p:
Although it is possible to solve this equation analytically under certain conditions, in general a numerical approach is taken to arrive at a solution. More details about the derivation of these equations and the possible solution methods can be found in .
The standard way to solve equation C.3 numerically involves the separation of the time dependence and the spatial dependence. This allows the use of a separate method for the spatial integration and the time integration. Therefore the equation is rewritten as proposed by Prof. dr ir H.W. Hoogstraten (RuG, Mathematics, unpublished) by defining j, g and v as:
with which we can rewrite equation C.3 to the set:
Equations C.5b and C.5c directly follow from C4.c and C.4b. C.5a can be obtained from equation C.3, by using C.4a and C.4b, and =/m.
The resulting set of equations contains a number of variables for which units of measure need to be chosen. Although it would make sense to choose SI units for all quantities, this does not always lead to efficient code. Therefore scaling units are introduced that fit the range of expected values for each of the variables. These scaling units have to be incorporated into the equations, a process known as dimensionalisation of the equations. As unit of time the millisecond was chosen. For the unit of distance the situation is somewhat complicated. The geometry of the cochlea can be scaled with its length, but motions of the BM are of a much smaller scale. Therefore the nm is chosen as a "dynamic" unit of distance, the cochlear length as "static" unit of distance.
The process of dimensionalisation is described in more detail in the work of Marc van den Raadt .
Equation C.5a only contains spatial derivatives and needs to be integrated in the spatial direction. Therefore it needs to be discretised. An equidistant numerical grid is used, leading to BM sections with equal size . The spatial derivative is discretised as . Equation C.5a thus becomes:
This forms a system of equations that can be
written as a matrix equation: Aj=r, where A is a tridiagonal matrix. The matrix A is
known (and constant over time). The righthand-side r contains the velocity and displacement that have to be computed using equations C.5b and C.5c. This equation can be solved, provided that A and r
are known, by a method named Gauss-elimination. This method can be found in any
book on numerical mathematics, e.g. , . (In the case of the matrix A no pivoting is required and
a unique solution is guaranteed, because the matrix is diagonally dominant, see 
pp 28). The solution procedure consists of eliminating the upper
subdiagonal, starting at the bottom, moving up. This elimination is
done by subtracting row n+1, multiplied by the proper fraction, from
the row at n. The same procedure could be repeated for the lower
subdiagonal, moving down, but a simpler method can be used. Moving
down, the elements on the lower subdiagonal all refer to known values,
so the solution can be computed directly.
This forms a system of equations that can be written as a matrix equation: Aj=r, where A is a tridiagonal matrix. The matrix A is known (and constant over time). The righthand-side r contains the velocity and displacement that have to be computed using equations C.5b and C.5c.
This equation can be solved, provided that A and r are known, by a method named Gauss-elimination. This method can be found in any book on numerical mathematics, e.g. , . (In the case of the matrix A no pivoting is required and a unique solution is guaranteed, because the matrix is diagonally dominant, see  pp 28).
The solution procedure consists of eliminating the upper subdiagonal, starting at the bottom, moving up. This elimination is done by subtracting row n+1, multiplied by the proper fraction, from the row at n. The same procedure could be repeated for the lower subdiagonal, moving down, but a simpler method can be used. Moving down, the elements on the lower subdiagonal all refer to known values, so the solution can be computed directly.
In 2000 Olof Schuring used a method developed at the Mathematics Department of the RuG for parallellization of a Gauss-elimination method, in order to distribute the cochlea code over various computer processors. Although this method can be used effectively for the cochlea model, the gain in computational speed will not be much higher than a factor 5, because of the overhead due to the parallelization scheme. Details can be found in .
Since the matrix A is invariant in time, the coefficients needed to elemininate the upper subdiagonal elements, as well as the new diagonal elements need to be computed only once. Therefore the computation of these coefficients can be performed in the initialisation (see documentation on initialisation).
Equation C.6 holds for sections in the interior of the cochlea. The situation at both ends, the helicotrema at the apex and the connection to the outside world through the middel ear at the base, is somewhat different.
The helicotrema is modelled as a hole in the BM, which means that the BM does not cover the entire length of the cochlea, but stops at a certain distance from the apex. This distance is taken to be equal to the height of a cochlear channel (which is half the diameter of the entire cochlea). The fluid flow through this hole is modelled as the fluid flow in the rest of the cochlea (mass only, due to incompressibility and non-viscous nature of the cochlear fluid), with the addition of a damping. This damping is introduced to represent resistance the fluid experiences due to the flow through a narrow hole.
The middle ear is modelled as an ideal transformer, scaling up pressures and scaling down velocities by a factor . This transformer is coupled to the outside world, which is represented by the acoustic impedance Z_a of air (a damping), through a mass and stiffness. The combination of stiffness, damping and mass of the middle ear models the transfer characteristic of the real middle ear in a crude fashion. This transfer is maximal around 2 kHz, with a very low Q-value (based on experimental data on middle ear transfer functions).
The exact shape of the equations at these boundaries is derived in . Here a so-called electric analogon is used to derive the equations of motion. No special attention will be paid to this technique here, but because cochlea models are often represented as electric schematics the schematics of the "Groningen" model are given in figure 2. The resulting equations for the two boundaries are:
In the case of the middle ear connection equation C.5c also changes, to:
see [vdR pp 29-32] for more details. Note that the middle ear is modelled with a separate section (section number 0). For the helicotrema such an extra section is not required, so sections 1 to N are BM sections.
In order to obtain values for velocity
and displacement at each time(step) a time integration technique is required.
The values of displacement and velocity are required for the righthand-side of equation C.5a (and C.6). But the value of phi is required for the solution of equation C.5c. Therefore the solution of the system of equations can only be found by simultaneous application of spatial and temporal integration.
The easiest and most computationally efficient method for time integration would be a direct (or explicit or Euler's) method (see  pp300). In such a direct method the first order derivative of y (between timesteps n and n+1) is approximated as (identically for v)(superscripts are used for time-indices, in order to see the difference with spatial indices). Equations C.5b and C.5c are written as:
However, computational efficiency is not the only requirement to be met by a time integration method. A very important consideration is numerical stability. Numerical stability is often defined as a region of the complex plane. Solutions of the original set of differential equations of the shape are stable (that is converge to 0 as t goes to infinity) if l has a real part that is negative. The difference equation (the numerical approximation of the differential equation) also has solutions of the form , which are stable if falls within a so-called stability region in the complex plane. Each numerical (time integration) method has its own stability region.
We know that in the cochlea sections move sinuso´dally, which means l has a considerable imaginary part for these solutions. Therefore the stability of solutions with a (sizeable) imaginary part is of great importance. Direct methods provide poor stability for solutions with a sizeable imaginary part and little effective damping (negative real part of l). Therefore a different (more complex) time integration method was originally chosen for the cochlea model: a four-step Runge Kutta method. This four-step method combines good stability with a high accuracy (small numerical error per time step). Its stability region is shown in figure 3, along with that of a direct method.
Runge-Kutta, however, has the disadvantage that it requires four evaluations per time step: one at the start to obtain a prediction of the situation half-way between the start and the finish of the time step, one to correct this prediction, one to obtain a prediction of the situation at the finish based on the corrected prediction half-way, and finally a correction on this prediction. (The Runge Kutta method is a prediction-correction scheme).
A full explanation of the Runge-Kutta method falls outside the scope of this document. For further details the reader is referred to a book on numerical mathematics, see e.g.  pp 366-380. Here only the resulting equations are given.
The four-step Runge Kutta has the general form:
for the first order differential equation:
For our system of coupled equations this changes to:
The conversion from equations C.10 to C.11 is not straightforward, but further details fall outside the scope of this document.
It can be seen from this formula
that stimulus values are needed at times . This is
why the input frequency of the cochlea model is double the internal sampling frequency , which equals the output sampling frequency.
(N.B. the term sampling frequency in relation to the cochlea model will refer to the internal sampling at , which is half the input sampling frequency.)
As output the values of the velocity are chosen. The rationale for this is that experimental evidence seems to indicate that the velocity of the BM is the driving force for the excitation of the sensory inner hair cells. (This does not remove the need for computation of the displacement, since this variable is required to compute ). In order to increase the spatial resolution in the model, the second order spatial derivative of the velocity is taken. Although there is no clear evidence of such a process taking place in the real cochlea, it could be linked to either (mechanical) bending of the BM, or some form of lateral inhibition or suppression. This lateral suppression could take place at the level of the BM mechanics, through the outer hair cells, or in the auditory nerve or higher brain centers. Although experimental data show no clear evidence of lateral suppression, there are strong indications that it may play a role in hearing.
The cochlea code consists of two parts:
In order to reduce computational load coefficients needed for Gauss elimination or Runge Kutta integration are computed before the actual (time) run starts, whenever possible. This process is quite complex, since it attempts to provide the user with easily understandable physical parameters and limit computation complexity within the main loop as far as possible. Therefore this part is described seperately (see document cochlea initialisation).
This is the actual main loop of the program, consisting of the four stages of the Runge Kutta method, with at each stage a Gauss elimination of the matrix equation.
Back to CPSP-theory
 Noteboom, S.G. and Cohen, A. (1976), "Spreken en verstaan"
 Moore, B.C.J. (ed) (1995), "Hearing"
 Andringa, T.C., PhD thesis (in preparation)
 Duifhuis, H., Hoogstraten, H.W., van Netten, S.M., Diependaal, R.J. and Bialek, W. (1985), "Modelling the cochlear partition with coupled Van der Pol oscillators" in: Peripheral Auditory Mechanisms, edited by J.B. Allen, J.L. Hall, A.E. Hubbard, S.T. Neely and A. Tubis (Springer, New York) pp. 290-297.
 van Hengel, P.W.J. (1996), "Emissions from cochlear modelling" PhD thesis
 Greenwood, D.D. (1991), "Critical bandwidth and consonance in relation to cochlear frequency-position coordinates" Hear. Res. 54, 164-208.
 van den Raadt, M.P.M.G. and Duifhuis, H. (1993), "Different boundary conditions in a one-dimensional time domain cochlea model" in Biophysics of Hair Cell Sensory Systems, edited by H. Duifhuis, J.W. Horst, P. van Dijk and S.M. van Netten (World Scientific, Singapore) pp. 103
 Duifhuis, H. (2001), "..."
 de Boer, E. (1980), "Auditory Physics. Physical principals in hering theory. I" Physics Reports 62, 87-174.
 van den Raadt, M.P.M.G. (1991), "Formulering 1-Dimensionaal COCHLEA MODEL in termen van een NETWERK (naar H. DUIFHUIS) [MACROMECHANISCH]" internal publication Department of Biophysics RuG
 Kreyszig, E. (1988), "Advanced Engineering Mathematics"
 Atkinson, K.E. (1978), "An Introduction to Numerical Analysis"
 Schuring, O.J. (2000), "Enhancement of a model of the human cochlea" internal publication Department of Biophyiscs RuG