Soundscape annotation tool

A tool for efficiently annotating soundscape recordings. It is currently under development and is available on this webpage.

More screenshots ...


The following packages are required for the Annotation Tool to run:

Depending on your distribution, you may also need to install Tkinter, which allows you to run Python applications that use Tcl/Tk.


Source, written for Linux: Run to install the Annotation Tool in your Linux distribution.

Compiled 32-bit Windows .exe, packaged as .zip:
There is currently no installation procedure; just extract the contents of the .zip file and run start_gui.exe to start the tool.


This software tool was designed to work with audio recordings stored in the dataset format used within our group. The tool allows researchers to read DARES, our dataset of real-world environmental sounds. A Matlab implementation of the ACGDataset class is available alongside the DARES dataset.

Soundscape annotations

The term soundscape stems from ecological acoustics (also called acoustic ecology or ecoacoustics), a field initiated by R. Murray Schafer in the late sixties. A soundscape is to the ear what a landscape is to the eye: just as the eye explores a landscape by inspecting small regions of it in a stream of saccades, it is believed (ref!) that the attentional system scans the available auditory information piece by piece while constructing a mental representation of the acoustical environment.

In the natural world this acoustical environment is always dynamic; acoustical events can be distinguished in the stream of acoustical information. Once these events are identified, they can be located in time, for example as two time points that delimit the period during which the event was audible. These time points, together with a description, can be regarded as annotations to a soundscape recording. It is this kind of annotation that we collect and analyze for real-world soundscape recordings.
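
As a minimal sketch of this idea, an annotation can be modelled as a start time, an end time and a description. The class name, field names and the choice of seconds as the unit are illustrative assumptions; they are not taken from the tool's own data model.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical annotation: a described time interval in a recording."""
    start_time: float  # assumed unit: seconds from the start of the recording
    end_time: float    # assumed unit: seconds from the start of the recording
    label: str         # description of the audible event

    def __post_init__(self):
        # An event cannot stop being audible before it starts.
        if self.end_time < self.start_time:
            raise ValueError("end_time must not precede start_time")

    @property
    def duration(self) -> float:
        """Length of the period during which the event was audible."""
        return self.end_time - self.start_time

    def overlaps(self, other: "Annotation") -> bool:
        """True if both events were audible at the same moment."""
        return self.start_time < other.end_time and other.start_time < self.end_time


# Made-up example values, for illustration only.
speech = Annotation(2.0, 5.5, "speech")
crickets = Annotation(4.0, 12.0, "crickets")
footsteps = Annotation(6.0, 7.0, "footsteps")
```

Events in a real-world recording frequently overlap, which is why the sketch includes an overlap check in addition to the duration.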

In our research we focus on recordings made in real-world environments, but basically any sound recording can be loaded into the tool.

(more on collecting soundscape annotations)

Dataset structure: XML

Recording metadata and annotations are stored in .xml files. Each .xml file may contain multiple recordings, each stored in a Dataset. A single list of the classes used can be added to an .xml file. The format supports various metadata fields that are used in our research.
An example of a dataset is presented below.
(to be added)
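
Until the real example is added, the following sketch illustrates the general idea of reading such a file through a DOM tree with Python's standard xml.dom.minidom module. The element and attribute names (datasets, classes, dataset, annotation, start, end, image) are purely hypothetical and do not reproduce the tool's actual schema.

```python
from xml.dom import minidom

# Hypothetical layout: all element and attribute names are illustrative
# assumptions, not the format actually used by the Annotation Tool.
EXAMPLE_XML = """<datasets>
  <classes>
    <class>speech</class>
    <class>crickets</class>
  </classes>
  <dataset name="example_recording" image="example_recording.png">
    <annotation class="speech" start="2.0" end="5.5"/>
    <annotation class="crickets" start="4.0" end="12.0"/>
  </dataset>
</datasets>
"""


def read_annotations(xml_text):
    """Return {dataset name: [(label, start, end), ...]} from the XML text."""
    dom = minidom.parseString(xml_text)
    result = {}
    for dataset in dom.getElementsByTagName("dataset"):
        events = []
        for node in dataset.getElementsByTagName("annotation"):
            events.append((
                node.getAttribute("class"),
                float(node.getAttribute("start")),
                float(node.getAttribute("end")),
            ))
        result[dataset.getAttribute("name")] = events
    return result


def read_classes(xml_text):
    """Return the list of class labels declared in the file."""
    dom = minidom.parseString(xml_text)
    return [node.firstChild.data for node in dom.getElementsByTagName("class")]


annotations = read_annotations(EXAMPLE_XML)
class_list = read_classes(EXAMPLE_XML)
```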

Visual audio representation: cochleogram

The tool uses a visual representation of the audio, derived from a model of the basilar membrane, a structure in the cochlea that transfers the physical motion to neural spikes. This technique is explained in this document.
The Python Annotation Tool is not capable of calculating this representation; instead, it is stored as a PNG image that the .xml document links to. In principle this image can contain any visual representation of the sound. The cochleogram was chosen here because it displays the distribution of energy in the frequency domain, allowing visual identification of a sound source to some extent.

Python classes to read and edit XML Dataset files

Several classes are available to read, write and edit XML Datasets.
An example: the AnnotationsReader class is demonstrated below; it interfaces with DOM_xml_handler(), which maintains the DOM tree containing annotations and metadata.
>>> from data import *
>>> ar = AnnotationsReader()
>>> ar.open_file('/data/scratch/robert/sound_svn/projects/pyAnnotationTool/dataset_examples/dares_onechannel_selection.xml')
selected /data/scratch/robert/sound_svn/projects/pyAnnotationTool/dataset_examples/dares_onechannel_selection.xml
>>> ar.get_datasets
<bound method AnnotationsReader.get_datasets of <data.data_annotationsreader.AnnotationsReader instance at 0xb74de6cc>>
>>> ar.get_datasets()
[<data.data_dataset.Dataset object at 0x951546c>, <data.data_dataset.Dataset object at 0x951978c>, <data.data_dataset.Dataset object at 0x951984c>]
>>> ar.get_datasets()[0]
<data.data_dataset.Dataset object at 0x951546c>
>>> ar.get_datasets()[0].eventslist
[<annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'recorder rattle'>, <annotation class 'bicycle on gravel'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'footsteps'>, <annotation class 'speech'>, <annotation class 'speech'>, <annotation class 'fumbling with microphone'>, <annotation class 'fumbling with microphone'>, <annotation class 'speech'>, <annotation class 'bicycle on gravel'>, <annotation class 'bicycle stand'>, <annotation class 'bicycle on gravel'>, <annotation class 'bicycle on gravel'>, <annotation class 'bicycle gears'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'crickets'>, <annotation class 'wind through dune grass'>, <annotation class 'saddle bags'>]
>>> ar.get_datasets()[0].eventslist[0]
<annotation class 'speech'>
>>> ar.get_datasets()[0].eventslist[0].startTime
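
Once an eventslist is available, its annotations can be summarised with ordinary Python. The sketch below counts how often each class occurs and finds the earliest start time per class. Because the data package is only importable from within the tool's source tree, it uses a stand-in Event class; the attribute name class_name is an assumption, since only eventslist and startTime appear in the session above.

```python
from collections import Counter


class Event:
    """Stand-in for the tool's annotation objects (hypothetical)."""

    def __init__(self, class_name, startTime):
        self.class_name = class_name  # assumed attribute name, not confirmed
        self.startTime = startTime    # attribute name as used in the session


def count_classes(eventslist):
    """Count how many annotations carry each class label."""
    return Counter(event.class_name for event in eventslist)


def first_onsets(eventslist):
    """Map each class label to the earliest startTime at which it occurs."""
    onsets = {}
    for event in eventslist:
        known = onsets.get(event.class_name)
        if known is None or event.startTime < known:
            onsets[event.class_name] = event.startTime
    return onsets


# Made-up events for illustration; real ones would come from
# ar.get_datasets()[0].eventslist.
events = [
    Event("speech", 12.5),
    Event("crickets", 3.0),
    Event("speech", 1.2),
    Event("footsteps", 40.0),
]
counts = count_classes(events)
onsets = first_onsets(events)
```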


1. Screenshot of the annotation tool displaying a dataset of the DARES collection.

2. Screenshot of the dialog where the user selects a class label for the chosen region.

3. Screenshot with labels for the most important parts of the graphical interface.

Creating datasets
The graphical user interface currently has no means to create a dataset from scratch - this has to be done by hand, by editing an existing dataset file. Future versions will contain methods to create new datasets.
Cochleogram images need to be created using separate software. Our group uses Matlab software to calculate these images.

Creating annotations
After a dataset has been opened and the cochleogram image is visible, annotations can be created by holding Shift and drawing a region on the cochleogram window. After the left mouse button is released, the dialog box shown in screenshot 2 pops up, requesting the user to select a class. When this dialog is completed, a colored bar indicating the annotated region is shown in the lower field of the interface.
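
Turning a drawn region into an annotation amounts to mapping horizontal pixel positions on the cochleogram image to times in the recording. A minimal sketch, assuming the image spans the whole recording linearly along its horizontal axis; the function name and parameters are hypothetical and not taken from the tool:

```python
def region_to_times(x_start, x_end, image_width, duration):
    """Convert a horizontal pixel range to a (start, end) time pair.

    x_start, x_end -- pixel columns where the drag began and ended
    image_width    -- width of the cochleogram image in pixels
    duration       -- length of the recording, in the desired time unit
    """
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    # The user may drag right-to-left, so order the two columns first.
    left, right = sorted((x_start, x_end))
    # Clip to the image so a drag past the edge stays inside the recording.
    left = max(0, left)
    right = min(image_width, right)
    return left * duration / image_width, right * duration / image_width


# A drag from column 300 back to column 100 on a 1000-pixel-wide image
# of a recording that lasts 60 time units.
start, end = region_to_times(300, 100, 1000, 60.0)
```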

XML Tree operations
Below the surface, three different representations of recording and annotation data exist:

Command line options
The following command line options are available:


  1. J.D. Krijnders, M. van Grootel, T.C. Andringa, Research database for everyday listening, Accepted for oral presentation at NAG/DAGA 2009, Rotterdam, in Proceedings of NAG/DAGA 2009, pp. 996-999, #346
  2. J.D. Krijnders, T.C. Andringa, Soundscape annotation and environmental source recognition experiments in Assen (NL), Accepted at Inter-noise 2009, Ottawa, Canada

Software Copyright

This software is copyrighted. You may freely download and use it, but you are not allowed to use it for commercial purposes without informing me.

© 2012 Robert van der Linden, Sensory Cognition Group, Rijksuniversiteit Groningen