The goal of the course is to build a recognizer. You will create a program
called recognizer that classifies all word zones on a page. The input
is an image (a complete page, in ppm format) and an XML file
containing the coordinates of each word zone.
You will be provided with a number of Python and C++ modules that will do some
work for you, such as reading and writing ppm/pgm images. These files can be
found in /home/student/vakken/hwr/toolbox. Details of the contents of the
toolbox can be found below.
To call C++ functions from, e.g., Python code, you can use Swig. This program generates the code that is needed to link both programming languages. You only need to specify a small “interface file”. In the toolbox you can find a simple example for the image library.
The Swig website contains a small tutorial that has a little more
information. For the toolbox, you can just call make to create everything
you need.
The toolbox contains a small Python module, wordio.py, that parses
.words files. These files are XML files that accompany image files and
describe where the text lines and words are, and what the transcription of
each word is.
An example, describing two text lines:
<?xml version="1.0" encoding = "UTF-8"?>
<Image name="NL_HaNa_H2_7823_0055">
<TextLine no="1" top="599" bottom="803" left="1235" right="2843" shear="45">
<Word no="1" top="599" bottom="803" left="1235" right="1536" shear="45" text="Rappt"/>
<Word no="2" top="623" bottom="803" left="1513" right="1610" shear="45" text="JD"/>
<Word no="3" top="708" bottom="803" left="1526" right="1624" shear="45" text="10"/>
<Word no="4" top="649" bottom="782" left="1684" right="1846" shear="45" text="Feb"/>
<Word no="5" top="688" bottom="803" left="1808" right="1884" shear="45" text="no"/>
<Word no="6" top="708" bottom="803" left="1865" right="2024" shear="45" text="175,"/>
<Word no="7" top="708" bottom="803" left="2025" right="2212" shear="45" text="om"/>
<Word no="8" top="708" bottom="803" left="2213" right="2843" shear="45" text="machtiging"/>
</TextLine>
<TextLine no="2" top="781" bottom="929" left="1106" right="2890" shear="45">
<Word no="1" top="786" bottom="903" left="1106" right="1408" shear="45" text="wijzend"/>
<Word no="2" top="781" bottom="871" left="1425" right="1518" shear="45" text="te"/>
<Word no="3" top="792" bottom="868" left="1554" right="1949" shear="45" text="beschikken"/>
<Word no="4" top="812" bottom="929" left="1930" right="2032" shear="45" text="op"/>
<Word no="5" top="810" bottom="884" left="2036" right="2166" shear="45" text="een"/>
<Word no="6" top="808" bottom="903" left="2185" right="2724" shear="45" text="verzoekschrift"/>
<Word no="7" top="808" bottom="897" left="2750" right="2890" shear="45" text="van"/>
</TextLine>
</Image>
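A file like the example above can also be read with Python's standard library. The sketch below is not the toolbox's wordio.py (its API is not shown on this page); it is a minimal illustration using xml.etree.ElementTree, and the function name read_words and the dictionary layout are assumptions made for this example. The sample is an abridged copy of the file above.

```python
import io
import xml.etree.ElementTree as ET

# Abridged copy of the example file (XML declaration omitted for brevity).
SAMPLE = b"""<Image name="NL_HaNa_H2_7823_0055">
<TextLine no="1" top="599" bottom="803" left="1235" right="2843" shear="45">
<Word no="1" top="599" bottom="803" left="1235" right="1536" shear="45" text="Rappt"/>
<Word no="2" top="623" bottom="803" left="1513" right="1610" shear="45" text="JD"/>
</TextLine>
</Image>
"""


def read_words(source):
    """Return one dict per <Word>; `source` is a path or a binary file object.

    Hypothetical helper for illustration, not part of the toolbox.
    """
    root = ET.parse(source).getroot()
    words = []
    for line in root.iter("TextLine"):
        for word in line.iter("Word"):
            entry = {key: int(word.get(key))
                     for key in ("no", "top", "bottom", "left", "right", "shear")}
            entry["line"] = int(line.get("no"))
            # The text attribute is absent in recognizer input files.
            entry["text"] = word.get("text")
            words.append(entry)
    return words


words = read_words(io.BytesIO(SAMPLE))
for w in words:
    print(w["line"], w["no"], w["left"], w["right"], w["text"])
```

For real files, pass the path of the .words file instead of the in-memory sample.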
The coordinates of the <TextLine> tags indicate text line zones that were
determined semi-automatically. The shear factor indicates the shear angle, or
slant, of the text. It is always 45 degrees for this dataset.
Because of this shear, the word zones are interpreted as parallelograms. The
attributes of the <Word> tags have the following meaning:
- top indicates the Y position of the top of the parallelogram
- bottom indicates the Y position of the bottom of the parallelogram
- left indicates the X position of the top-left vertex of the parallelogram
- right indicates the X position of the top-right vertex of the parallelogram
- text is the transcription of the handwritten text in the image
The recognizer program will receive two input files: the input image and a
.words file without the text attribute. This means that you don’t have to
find the coordinates and shear of the words, but can focus on the
classification.
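To make the parallelogram interpretation concrete, the sketch below computes the four vertices of a word zone from its attributes. The direction of the shear is an assumption: it supposes that a positive shear means a rightward slant, so the bottom edge lies to the left of the top edge. The page does not state the convention, so check it against word.py in the toolbox before relying on it. The function name word_corners is hypothetical.

```python
import math


def word_corners(top, bottom, left, right, shear_degrees):
    """Return the vertices as (x, y) pairs: top-left, top-right,
    bottom-right, bottom-left.

    ASSUMPTION: positive shear shifts the bottom edge to the left
    (rightward slant). Flip the sign of `offset` if the toolbox uses
    the opposite convention.
    """
    height = bottom - top
    offset = round(height * math.tan(math.radians(shear_degrees)))
    return [
        (left, top),
        (right, top),
        (right - offset, bottom),
        (left - offset, bottom),
    ]


# First word of the example file: top=599, bottom=803, left=1235, right=1536.
corners = word_corners(599, 803, 1235, 1536, 45)
print(corners)
```

With a shear of 45 degrees the horizontal offset equals the height of the zone, which is why the bottom vertices move by bottom - top pixels.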
The output of your program should be a new .words file. It should be
identical to the input file, only with the text attribute filled in. Your
program will be invoked as follows:
$ recognizer input.ppm input.words /path/to/output.words
where recognizer is your program and $ represents the command line prompt.
The file at /path/to/output.words does not exist yet; your program must
create it and write the filled-in XML file to it.
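The command line contract described above can be sketched as a small Python skeleton. This is an outline under assumptions, not a reference solution: classify_word is a hypothetical placeholder that always returns "?", and the XML handling uses the standard library rather than the toolbox's wordio.py.

```python
import sys
import xml.etree.ElementTree as ET


def classify_word(image_path, word):
    """Hypothetical placeholder: a real recognizer would crop the word
    zone from the image and classify it. Here it always returns "?"."""
    return "?"


def recognize(image_path, words_in, words_out):
    """Copy the input .words file to words_out with every text attribute set."""
    tree = ET.parse(words_in)
    for word in tree.getroot().iter("Word"):
        word.set("text", classify_word(image_path, word))
    tree.write(words_out, encoding="UTF-8", xml_declaration=True)


# Expected invocation: recognizer input.ppm input.words /path/to/output.words
# The length check keeps this sketch importable without arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    recognize(sys.argv[1], sys.argv[2], sys.argv[3])
```

Note that ElementTree writes equivalent XML, but whitespace and quoting may differ from the input file, so the output is not guaranteed to be byte-identical apart from the text attribute.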
Hand in your code in a .tar.gz file, marked with your name and version. If
there are compilation steps (if you use C++ or C in your program), provide a
Makefile that compiles your program with a single make command.
Document your code thoroughly; this will make grading easier and help you write the final paper.
1. Copy the toolbox to your home directory:
mkdir ~/hwr
cp -r /home/student/vakken/hwr/toolbox ~/hwr
2. Choose a page image: a .tif file named like NL_HaNa_H2_7823_xxxx.tif, not a file with ‘colorized’ in the name or a file with extension .kdkxml.
3. Put the image in /dev/shm (it is flushed at a reboot). These files are big; keep an eye on disk space usage. Type df -h to see how much space is available.
4. Convert the image, since the toolbox works with .pbm/.ppm/.pgm files (simple bitmaps; .pbm is black/white, .ppm is color, .pgm is grey scale):
convert -compress LZW /dev/shm/NL_HaNa_H2_7823_xxxx.tif /dev/shm/NL_HaNa_H2_7823_xxxx.ppm
5. This produces a color bitmap of the page (.ppm). You can remove the .tif version of the image.
6. Compile the toolbox:
cd ~/hwr/toolbox
make
7. Try the cropping example:
python example-crop.py /dev/shm/NL_HaNa_H2_7823_xxxx.ppm /dev/shm/cropped.ppm
8. There is also an example for connected components (example_cocos.py). It will not work immediately, because you don’t have a gray-scale image (.pgm file) yet, but it is provided to learn how to work with the connected-component part of the toolbox.
.words files
The .words files can be found in /home/student/vakken/hwr/data/words.
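The bitmap formats used in the steps above are simple enough to inspect without the toolbox. As a quick sanity check after converting an image, the sketch below reads only the header of a binary .pgm (magic number P5) or .ppm (magic number P6) file and reports its size. The function name read_pnm_header is hypothetical; for actual image processing, use the PamImage class from the toolbox.

```python
def read_pnm_header(path):
    """Return (magic, width, height, maxval) for a binary .pgm/.ppm file.

    Hypothetical helper for illustration; it reads the header only and
    does not load the pixel data.
    """
    tokens = []
    token = b""
    with open(path, "rb") as f:
        while len(tokens) < 4:
            ch = f.read(1)
            if not ch:
                raise ValueError("unexpected end of file in header")
            if ch == b"#":
                # A comment runs to the end of the line; treat it as whitespace.
                f.readline()
                ch = b"\n"
            if ch.isspace():
                if token:
                    tokens.append(token)
                    token = b""
                    if len(tokens) == 1 and tokens[0] not in (b"P5", b"P6"):
                        raise ValueError("not a binary .pgm or .ppm file")
            else:
                token += ch
    magic = tokens[0].decode("ascii")
    width, height, maxval = (int(t) for t in tokens[1:])
    return magic, width, height, maxval
```

For example, read_pnm_header("/dev/shm/NL_HaNa_H2_7823_xxxx.ppm") should report the magic number P6 if the conversion produced a binary color bitmap.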
File | Description |
---|---|
pamImage.cpp | Read and write .pbm / .pgm / .ppm files (you don’t need to look inside this file) |
pamImage.h | Header for pamImage.cpp; look here to see what you can do with a PamImage object. |
pamImage.i | Interface between C++ and Python for pamImage. Swig uses this file to create a Python wrapper around the C++ code |
cocos_arnold/ | C++ routines for fast connected components labeling by Arnold Meijster (no need to look inside). |
cocoslib.cpp | Procedures to compute connected components in document images (uses cocos_arnold/). |
cocoslib.h | Header for cocoslib.cpp. |
cocoslib.i | Interface between C++ and Python for cocoslib |
example_cocos.py | Provides a quickstart for using connected components. |
croplib.cpp | Crop .pbm / .pgm / .ppm images |
croplib.h | Header for croplib.cpp |
croplib.i | Interface between C++ and Python for croplib |
example-crop.py | Shows how to use croplib. Crops an image. |
word.py | Class for word zones and transcription |
wordio.py | Reads and writes .words files |
Makefile | Instructions for make to compile the code. |
Last modified: April 25, 2013, by Jean-Paul van Oosten
Part of the HWR course