Recognizer: technical details

The goal of the course is to build a recognizer. You will create a program called recognizer that classifies all word zones on a page. Its input is an image of a complete page, in PPM format, and an XML file containing the coordinates of each word zone.

You will be provided with a number of Python and C++ modules that do some of the work for you, such as reading and writing PPM/PGM images. These files can be found in /home/student/vakken/hwr/toolbox. The contents of the toolbox are described below.

Combining C++ and Python

To call C++ functions from, for example, Python code, you can use SWIG. This program generates the glue code needed to link the two languages; you only need to write a small “interface file”. The toolbox contains a simple example for the image library (pamImage.i).

The SWIG website contains a short tutorial with more information. For the toolbox, simply run make to build everything you need.

Word coordinate specification

The toolbox contains a small Python module, wordio.py, that parses .words files. These are XML files that accompany the image files and describe where the text lines and words are and what the transcription of each word is.

An example, describing two text lines:

<?xml version="1.0" encoding = "UTF-8"?>
<Image name="NL_HaNa_H2_7823_0055">
    <TextLine no="1" top="599" bottom="803" left="1235" right="2843" shear="45">
        <Word no="1" top="599" bottom="803" left="1235" right="1536" shear="45" text="Rappt"/>
        <Word no="2" top="623" bottom="803" left="1513" right="1610" shear="45" text="JD"/>
        <Word no="3" top="708" bottom="803" left="1526" right="1624" shear="45" text="10"/>
        <Word no="4" top="649" bottom="782" left="1684" right="1846" shear="45" text="Feb"/>
        <Word no="5" top="688" bottom="803" left="1808" right="1884" shear="45" text="no"/>
        <Word no="6" top="708" bottom="803" left="1865" right="2024" shear="45" text="175,"/>
        <Word no="7" top="708" bottom="803" left="2025" right="2212" shear="45" text="om"/>
        <Word no="8" top="708" bottom="803" left="2213" right="2843" shear="45" text="machtiging"/>
    </TextLine>
    <TextLine no="2" top="781" bottom="929" left="1106" right="2890" shear="45">
        <Word no="1" top="786" bottom="903" left="1106" right="1408" shear="45" text="wijzend"/>
        <Word no="2" top="781" bottom="871" left="1425" right="1518" shear="45" text="te"/>
        <Word no="3" top="792" bottom="868" left="1554" right="1949" shear="45" text="beschikken"/>
        <Word no="4" top="812" bottom="929" left="1930" right="2032" shear="45" text="op"/>
        <Word no="5" top="810" bottom="884" left="2036" right="2166" shear="45" text="een"/>
        <Word no="6" top="808" bottom="903" left="2185" right="2724" shear="45" text="verzoekschrift"/>
        <Word no="7" top="808" bottom="897" left="2750" right="2890" shear="45" text="van"/>
    </TextLine>
</Image>

The coordinates in the <TextLine> tags indicate text-line zones that were determined semi-automatically. The shear attribute gives the shear angle, or slant, of the text in degrees; it is always 45 degrees for this dataset.
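The toolbox's wordio.py is the intended way to read these files. As an independent, minimal sketch that does not use or mirror wordio.py's API, the standard library's xml.etree.ElementTree can read the same structure. The WordZone class and the parse_words and shear_offset functions are hypothetical names introduced here. Because tan 45° = 1, a 45° shear moves a slanted side of a zone horizontally by exactly the zone's height, whichever axis the angle is measured from.

```python
import math
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordZone:
    """One <Word> element plus the number of its enclosing <TextLine>."""
    line_no: int
    word_no: int
    top: int
    bottom: int
    left: int
    right: int
    shear: float          # shear angle in degrees (45 for this dataset)
    text: Optional[str]   # None when the text attribute is absent


def parse_words(root: ET.Element) -> List[WordZone]:
    """Collect every <Word> of every <TextLine> under an <Image> root element."""
    zones = []
    for line in root.iter("TextLine"):
        for word in line.iter("Word"):
            zones.append(WordZone(
                line_no=int(line.get("no")),
                word_no=int(word.get("no")),
                top=int(word.get("top")),
                bottom=int(word.get("bottom")),
                left=int(word.get("left")),
                right=int(word.get("right")),
                shear=float(word.get("shear")),
                text=word.get("text"),
            ))
    return zones


def shear_offset(zone: WordZone) -> float:
    """Horizontal displacement of a slanted side over the full zone height.

    Assumes the angle is measured from the vertical; at 45 degrees the
    result equals the height either way, since tan(45°) = 1.
    """
    return (zone.bottom - zone.top) * math.tan(math.radians(zone.shear))


# Usage on a fragment of the example above; for a real file use
# parse_words(ET.parse("page.words").getroot()).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Image name="NL_HaNa_H2_7823_0055">
    <TextLine no="1" top="599" bottom="803" left="1235" right="2843" shear="45">
        <Word no="1" top="599" bottom="803" left="1235" right="1536" shear="45" text="Rappt"/>
        <Word no="2" top="623" bottom="803" left="1513" right="1610" shear="45" text="JD"/>
    </TextLine>
</Image>"""

zones = parse_words(ET.fromstring(SAMPLE))
for z in zones:
    print(z.line_no, z.word_no, z.text, round(shear_offset(z)))
```

Which direction the slant leans, and whether left/right refer to the top or the bottom edge of the parallelogram, is not stated here, so the sketch deliberately stops at the offset magnitude.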

Because of this shear, the word zones are interpreted as parallelograms. The attributes of the <Word> tags have the following meaning:

Output of the recognizer

The recognizer program receives three arguments: the input image, a .words file without the text attributes, and the path of the output file. This means that you don’t have to find the coordinates and shear of the words and can focus on the classification.

The output of your program should be a new .words file, identical to the input file except that the text attribute of every word is filled in. Your program will be invoked as follows:

$ recognizer input.ppm input.words /path/to/output.words

where recognizer is your program and $ represents the command-line prompt. The file /path/to/output.words does not exist yet; your program must create it and write the completed XML to it.
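A minimal skeleton of such a program can be sketched in Python with the standard library's xml.etree.ElementTree rather than the toolbox's wordio.py. The classify_word function is a hypothetical placeholder for your actual classifier. Everything else parses the input .words file, fills in the text attribute of each <Word>, and writes the result to the output path.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List


def classify_word(image_path: str, word: ET.Element) -> str:
    """Hypothetical placeholder: replace with your actual classifier.

    It receives the page image path and the <Word> element, whose
    top/bottom/left/right/shear attributes locate the word zone.
    """
    return "unknown"


def fill_in_text(tree: ET.ElementTree, image_path: str) -> None:
    """Set the text attribute of every <Word>, leaving everything else as is."""
    for word in tree.getroot().iter("Word"):
        word.set("text", classify_word(image_path, word))


def main(argv: List[str]) -> int:
    """Expects: recognizer input.ppm input.words /path/to/output.words"""
    if len(argv) != 4:
        print(f"usage: {argv[0]} input.ppm input.words output.words",
              file=sys.stderr)
        return 2
    image_path, words_in, words_out = argv[1:]
    tree = ET.parse(words_in)
    fill_in_text(tree, image_path)
    # Creates the output file and writes the completed XML to it.
    tree.write(words_out, encoding="UTF-8", xml_declaration=True)
    return 0
```

To run it as recognizer, end the script with `if __name__ == "__main__": sys.exit(main(sys.argv))`. ElementTree keeps the attribute order (Python 3.8 and later) and the original indentation, but it writes the XML declaration in its own style. The output is therefore equivalent to the input apart from the text attributes, not byte-for-byte identical.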

Handing in your recognizer

Hand in your code as a .tar.gz file marked with your name and version. If your program needs to be compiled (for example, because you use C or C++), provide a Makefile that builds it with a single make command.

Document your code thoroughly; this makes grading easier and will help you write the final paper.

Getting started

Step 1: Get the toolbox files

Step 2: Get a sample image

Step 3: Compile and test the toolbox files

Get the .words files

The .words files can be found in /home/student/vakken/hwr/data/words.

Toolbox contents

File              Description
pamImage.cpp      Reads and writes .pbm / .pgm / .ppm files (you don’t need to look inside this file).
pamImage.h        Header for pamImage.cpp; look here to see what you can do with a PamImage object.
pamImage.i        Interface between C++ and Python for pamImage; SWIG uses this file to create a Python wrapper around the C++ code.
cocos_arnold/     C++ routines for fast connected-component labeling by Arnold Meijster (no need to look inside).
cocoslib.cpp      Procedures to compute connected components in document images (uses cocos_arnold/).
cocoslib.h        Header for cocoslib.cpp.
cocoslib.i        Interface between C++ and Python for cocoslib.
example_cocos.py  Quickstart for using connected components.
croplib.cpp       Crops .pbm / .pgm / .ppm images.
croplib.h         Header for croplib.cpp.
croplib.i         Interface between C++ and Python for croplib.
example-crop.py   Shows how to use croplib by cropping an image.
word.py           Class for word zones and their transcriptions.
wordio.py         Reads and writes .words files.
Makefile          Instructions for make to compile the code.
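cocoslib performs connected-component labeling in fast C++. To illustrate what that operation computes, here is a minimal pure-Python sketch using a breadth-first flood fill. It is not the toolbox's API and not the algorithm in cocos_arnold/. The function name label_components and the list-of-lists binary image are assumptions made for this example, and the sketch is far too slow for full pages, where cocoslib is the tool to use.

```python
from collections import deque
from typing import List, Tuple


def label_components(image: List[List[int]],
                     connectivity: int = 8) -> Tuple[List[List[int]], int]:
    """Label connected foreground components of a binary image.

    image: rows of 0 (background) and 1 (ink).
    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for components, numbered in raster-scan order.
    """
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] and not labels[y][x]:
                # Unlabeled ink pixel: start a new component and flood-fill it.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

With 8-connectivity, pixels that touch only at a corner belong to the same component. With 4-connectivity they do not, so [[1, 0], [0, 1]] yields one component in the first case and two in the second.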

Last modified: April 22, 2014, by Jean-Paul van Oosten
Part of the HWR course