NWO/EW project Morph

Learning to learn: An Adaptive Reading System using a High-Performance Morphed-Image Correlator

Project duration: March 2006 - March 2010

Current methods for handwriting recognition are unsuitable for massive collections of historical documents. All statistical techniques require large numbers of labeled word images together with their 'ASCII' ground truth. Because writing styles vary enormously, this manual labeling of image sections has to be repeated for each document type and historical period. Since optical character recognition of unconstrained-style handwritten documents is not possible, the digitization of large and important document collections is in a state of deadlock.

Current computing power, notably the availability of the Blue Gene supercomputer, allows for a new way of using machine-learning technology and non-statistical brute-force matching methods. Using high-performance computing, it will be possible to learn to identify similarities in text passages. Using a bootstrapping approach with limited human intervention, relevant keywords and phrases in the text can be learned. Subsequently, adapted information-retrieval (IR) techniques can be used to search a large handwritten document collection. Single-processor experiments yield promising results, but experimentation takes too much time on a single processor. The manual construction of optimal processing recipes for a given problem is also cumbersome. High-performance computing will help, provided that principled approaches for optimizing the processing pipeline exist.
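
As an illustration only, the brute-force matching and bootstrapping described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's morphed-image correlator: word images are assumed to be equally sized 2-D lists of pixel intensities, similarity is plain normalized correlation, and the function names and the 0.9 acceptance threshold are hypothetical.

```python
from math import sqrt


def flatten(image):
    """Turn a 2-D list of pixel intensities into a flat list of floats."""
    return [float(p) for row in image for p in row]


def correlation(a, b):
    """Normalized (Pearson) correlation between two equally sized images."""
    x, y = flatten(a), flatten(b)
    if len(x) != len(y):
        raise ValueError("images must have the same size")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0.0:
        return 0.0  # a constant image carries no shape information
    return sum(p * q for p, q in zip(dx, dy)) / norm


def best_match(query, labeled):
    """Brute force: compare the query against every labeled example."""
    scored = [(correlation(query, img), label) for label, img in labeled]
    return max(scored)  # (score, label) of the most similar example


def bootstrap(labeled, unlabeled, threshold=0.9):
    """One bootstrapping round (threshold is an assumed value).

    Confident matches are accepted automatically; the remaining
    images are returned so that a human can label them.
    """
    accepted, for_review = [], []
    for img in unlabeled:
        score, label = best_match(img, labeled)
        if score >= threshold:
            accepted.append((label, img))
        else:
            for_review.append(img)
    return labeled + accepted, for_review
```

Every query is compared against every labeled example, so the cost grows with the product of both set sizes. Each comparison is independent of the others, which is why this kind of matching lends itself to a massively parallel machine.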

Team leader
Prof. Lambert Schomaker
Kunstmatige Intelligentie
Rijksuniversiteit Groningen
Grote Kruisstraat 2/1
9712 TS Groningen
Tel: 050-3637908

[Figure: AI / RuG ... input to ... Blue Gene @ Groningen University]