Handwriting recognition course 2014

Period 2b (block 4), 2014
Progress code: KIM.SCHR03

Page from 'Nationaal Archief'

Beware: this page is not yet up to date!

Goal

In this course you learn how an automatic handwriting recognizer works. You will build a recognizer yourself and write a scientific report on it. The focus is on a character-based (or character-like) approach, which can be bootstrapped from word-level labels.

Procedure

The handwriting material for this course is historical handwriting from the “Queen’s Cabinet” (Kabinet der Koningin), stored at the Dutch National Archive (Nationaal Archief, Den Haag), as shown in the figure to the right.

You are expected to form groups of 3 or 4 people. Each group will work towards a handwriting recognition system that uses “smaller-than-words” chunks, such as characters. One of the first tasks for every team is to annotate character labels, or to mine them from annotations at the word level.
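As a rough illustration of what mining character labels from word-level annotations could look like, the sketch below splits a word's bounding box into equal-width boxes, one per character of its transcription. This is only a naive starting point and not necessarily the method used in the course. The annotation format (a transcription plus a pixel bounding box) and the name `split_word_box` are assumptions made for this example. Character widths in real handwriting vary considerably, so the resulting boxes are best treated as initial guesses to be refined, for instance by manual correction or by re-aligning with a trained recognizer.

```python
from typing import List, Tuple

# Assumed annotation format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def split_word_box(word: str, box: Box) -> List[Tuple[str, Box]]:
    """Divide a word's bounding box into equal-width boxes, one per character.

    Hypothetical helper: returns (character, box) pairs as initial guesses.
    """
    left, top, right, bottom = box
    if not word:
        return []
    if right <= left:
        raise ValueError("box must have positive width")

    width = right - left
    n = len(word)
    segments = []
    for i, char in enumerate(word):
        # Integer arithmetic keeps adjacent boxes contiguous (no gaps, no overlap).
        seg_left = left + (i * width) // n
        seg_right = left + ((i + 1) * width) // n
        segments.append((char, (seg_left, top, seg_right, bottom)))
    return segments


if __name__ == "__main__":
    # Example with a made-up word and bounding box.
    for char, char_box in split_word_box("den", (120, 40, 210, 90)):
        print(char, char_box)
```

The uniform split ignores ligatures, narrow letters such as "i", and wide letters such as "m"; it is meant only to produce a first set of labeled crops from which a better segmentation can be bootstrapped.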

Halfway through the course (see the schedule below for the details), each team is expected to hand in a draft of the literature review and Method section of the report.

Programming is done in Python, C, C++ or Java, or a combination of these. For instance, Python can be used for quickly creating the general framework and C++ for the low-level procedures. Details on how this works will be provided during the first practical session.

To facilitate cooperation within a group, it is advised to use version control software. Well-known packages are git and subversion. If you would like to have a subversion repository, please ask Jean-Paul to set one up for you. It is a good idea to arrange this before the first practical session, so that you can start committing changes from the first session onwards.

At the end of the course, you submit the final version of your recognizer and a written, scientific report.

Lectures

On Thursdays, Prof. dr. Schomaker will give a lecture, after which each group will present a progress update (more on that below).

On Wednesdays, practical sessions supervised by Jean-Paul van Oosten are scheduled; you can use these to work on your recognizer, collaborate with your group, ask questions, etc.

In the final lecture, each group will present its entire classifier, the approach and the empirical evaluation, including results on a separate, secret test set (Jean-Paul will perform the final tests of your classifier).

Progress updates

At each lecture, starting from the second, every group is expected to give a progress report. The report should have at least the following components:

  1. Overview of articles you have read (including the total number of articles read so far);
  2. Number of character-labels labeled or mined;
  3. Amount of text written for literature review & Method section of the paper;
  4. Programming progress (how many modules will the system have, how far is each module completed?);
  5. Empirical evaluation progress (both technical and theoretical);
  6. Is the overall progress on schedule?

Each component needs to be properly documented and supported by either references or tables and graphs. Show that you have read the articles (i.e., show what each article was about and what it concluded), and keep a list of them: you will need it for the References section of your paper.

Each group member should have the opportunity to show their presentation skills: divide all tasks, including the presentations, equally between the group members. Appoint one person to maintain the progress update PPT slides, one person to oversee the overall system architecture, one person to design the empirical evaluation (test scripts), etc.

Your recognizer

At the end of the course, you are expected to have written a handwriting recognition system. To test your recognizer, Jean-Paul will (compile and) run your code on a separate, secret test set. See the technical details on how to hand in your program, how to handle arguments, etc.

Grading

You will be graded on your participation in the group, on your presentation, empirical evaluation and programming, as well as on your report. The report is written individually, but parts of the report can be written as a draft by the group as a whole (note that this means that your final, personal report needs to be substantially different from the other reports of your group, especially in the Introduction and Discussion sections).

The final grade appears on Progress. There is no exam other than the final recognizer and written report.

Report

The report needs to be a scientific paper about the handwriting recognition system that you built during the course.

Your report will be graded on the following subjects: title + abstract, introduction, method, formalisms, analysis, results + interpretation, references, engineering, science.

Deadlines

  May 14: Character annotations
  May 30: Deadline report drafts (literature review and method section)
  June 4: First version of your recognizer (Hiscores)
  June 30, 17:00: Final version of your recognizer (Hiscores)
  July 16: Report

Direct your questions to Jean-Paul van Oosten.


Last modified: July 01, 2014, by Jean-Paul van Oosten
Part of the HWR course