Assignment 4: first recognizer
Note: the content of this page might change slightly; be sure to refresh (F5) right before you start.
Goal
The goal of this assignment is to make a first version of a recognizer.
To save you some hassle, the line zones are provided, but your recognizer has to find the words.
Assignment
Make a handwriting recognizer. The command-line call must be as follows:
python recognize.py input.ppm input.words output.words
where:
- input.ppm is a complete page, such as NL_HaNa_H2_7823_xxxx.ppm;
- input.words contains all line zones, but no (usable) word zones (<Word> tags);
- output.words will be written and contain word zones (<Word> tags) including transcription.
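The required command-line call can be handled with Python's standard argparse module, as in the minimal sketch below. The argument names and the placeholder steps in the comments (load_page, read_line_zones, recognize_words, write_word_zones) are hypothetical and are not part of the quickstart files.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments of recognize.py.

    With argv=None, argparse reads sys.argv[1:]; if an argument is
    missing it prints a usage message and raises SystemExit.
    """
    parser = argparse.ArgumentParser(
        prog="recognize.py",
        description="Find and transcribe the word zones on a handwritten page.",
    )
    parser.add_argument("image", help="complete page image (.ppm)")
    parser.add_argument("lines", help="input .words file with line zones only")
    parser.add_argument("output", help=".words file to write, with <Word> zones")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Hypothetical pipeline; replace these placeholders with your own code:
    #   page = load_page(args.image)
    #   lines = read_line_zones(args.lines)
    #   words = recognize_words(page, lines)
    #   write_word_zones(args.output, words)
    return args
```

In recognize.py you would call main() under the usual if __name__ == "__main__": guard, so that the script reads its arguments from the command line.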
Make sure that the C++ part of your code compiles with the command 'make'. This is required.
Submit your program by email to Jean-Paul.
Hints
- Copy the quickstart files:
cp -r /home/student/hwr/assignment4 ~/hwr/
Note: the files are described in the Appendix below.
- It is suggested that you base the recognizer on your submission for assignment 2 (the word zone hypothesis generator).
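Since the line zones are given, the recognizer's first job is to split each line into word zones. The sketch below shows one simple technique for this (gap detection in the vertical projection profile); it is an illustration, not necessarily what your assignment 2 generator does. It assumes the line has already been cut out of the page and binarized into a list of rows of 0/1 values, and the min_gap default is a hypothetical value to be tuned.

```python
from typing import List, Tuple


def word_zones_from_line(line: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a binarized line image into candidate word zones.

    line[y][x] is 1 for ink and 0 for background. Returns (left, right)
    column ranges with right exclusive. A run of at least min_gap empty
    columns is treated as a gap between two words.
    """
    width = len(line[0]) if line else 0
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in line) for x in range(width)]

    zones = []
    start = None  # left edge of the word we are currently inside, if any
    gap = 0       # consecutive empty columns seen since the last ink column
    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ends just before the first empty column of the gap.
                zones.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last word, ignoring any short trailing gap.
        zones.append((start, width - gap))
    return zones
```

Handwriting often has gaps inside words and touching words, so running this with several min_gap values and keeping all resulting zones as alternative hypotheses is one way to turn it into a hypothesis generator.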
Grading
- The grade depends on the recognition percentage. The recognition percentage is computed as follows:
For each "true" (manually labeled) word zone in a separate test set, your output is searched for a nearby zone (the coordinates are compared, and a small error is tolerated). If a matching zone is found, its transcription is checked for correctness.
- For your information: the exact method for converting recognition percentages to grades will be determined after the results are known; this is unavoidable, since the results cannot be predicted in advance. However, a recognizer that does something sensible and recognizes at least some words will probably earn at least a 6, and the best recognizer will probably earn a very high grade.
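To estimate your own recognition percentage on training pages, the matching rule described above can be approximated as in the sketch below. This is not the actual grading script: the Zone layout, the per-coordinate check, the tolerance of 10 pixels and the "first matching zone wins" rule are all assumptions, since the page only states that a small coordinate error is tolerated.

```python
from typing import List, NamedTuple


class Zone(NamedTuple):
    # Hypothetical representation of one <Word> zone.
    left: int
    top: int
    right: int
    bottom: int
    text: str


def matches(true: Zone, hyp: Zone, tol: int) -> bool:
    """True if every coordinate of hyp lies within tol pixels of true."""
    return (abs(true.left - hyp.left) <= tol and
            abs(true.top - hyp.top) <= tol and
            abs(true.right - hyp.right) <= tol and
            abs(true.bottom - hyp.bottom) <= tol)


def recognition_percentage(truth: List[Zone], output: List[Zone], tol: int = 10) -> float:
    """Percentage of true zones that have a nearby output zone with the same text."""
    correct = 0
    for t in truth:
        # Take the first output zone whose coordinates are close enough.
        hit = next((h for h in output if matches(t, h, tol)), None)
        # A zone only counts if its transcription is also correct.
        if hit is not None and hit.text == t.text:
            correct += 1
    return 100.0 * correct / len(truth) if truth else 0.0
```

Note that under this rule a correctly placed zone with a wrong transcription earns nothing, so both segmentation and transcription have to be right.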
Appendix: Description of files in assignment4/
File        | Description
wordstat.py | Makes a dictionary of all words in all .words files and shows frequency statistics.
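The provided wordstat.py is not reproduced here. The sketch below is a hypothetical illustration of the same kind of statistic, which is also useful as a vocabulary for the recognizer. It assumes the transcriptions have already been extracted from the <Word> tags, because the layout of the .words files is not described on this page.

```python
from collections import Counter
from typing import Iterable, List


def word_frequencies(transcriptions: Iterable[str]) -> Counter:
    """Count how often each (non-empty) transcription occurs."""
    return Counter(t for t in transcriptions if t)


def frequency_report(counts: Counter, top: int = 10) -> List[str]:
    """Format the top most frequent words with counts and percentages."""
    total = sum(counts.values())
    lines = [f"{len(counts)} distinct words, {total} tokens"]
    for word, n in counts.most_common(top):
        lines.append(f"{n:6d}  {100.0 * n / total:5.1f}%  {word}")
    return lines
```

Printing each line returned by frequency_report gives a short table of the most frequent words.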
Last modified: 19 May 2010 by Jean-Paul van Oosten.