The NICI stroke-based recognizer of on-line handwriting

Introduction and background

The NICI stroke-based recognizer of on-line handwriting was developed on the basis of knowledge of the handwriting production process. It started as a playful experiment to see how far we would get in handwriting recognition if we used pen-tip velocity as a basic source of information. From the very beginning of on-line handwriting recognition, the writer's movements have been treated as a kind of nuisance or strange noise. Another argument which is often heard is that the movements are more writer-dependent than the shape of the ink trace. As a consequence, many approaches assume that the movement information should be removed before the actual feature extraction or pattern classification takes place. Usually, this is done by some form of spatial resampling or by calculating Freeman codes for equal-length segments along the ink trace.

We took the opposite approach, without too high expectations, and tried to exploit the knowledge on the human handwriting process which has been collected at NICI since 1976. Assuming equidistant sampling in time, we analyze the trajectory of the pen tip, looking for regularities and lawfulness in the handwriting process. A number of examples are given in a few live demos. The 'atomic' component of the handwriting signal is, in our view, the velocity-based stroke (VBS).

We found that the approach is in fact quite fruitful. The majority of writers produce ballistic movements without too many hesitations or other accidents. Heuristics can be applied to handle the statistical outliers. The approach is not suited for children's handwriting or handwriting with tremor.
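As an illustration of what a velocity-based stroke can look like in code, the sketch below computes the absolute pen-tip velocity from equidistantly sampled coordinates and cuts the trajectory at local velocity minima. This is an assumption made for the example, not the recognizer's actual segmentation rule, which is not spelled out here. The function names tangential_velocity and segment_strokes and the sample values are hypothetical, and real pen data would probably need low-pass filtering first.

```python
import math


def tangential_velocity(xs, ys, dt):
    """Absolute pen-tip velocity per sample (central differences).

    Assumes equidistant sampling in time with interval dt (seconds)
    and at least two samples.
    """
    n = len(xs)
    velocities = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        span = (hi - lo) * dt
        velocities.append(math.hypot(xs[hi] - xs[lo], ys[hi] - ys[lo]) / span)
    return velocities


def segment_strokes(velocities):
    """Return (start, end) sample-index pairs, cut at local velocity minima.

    Assumption for this sketch: a stroke runs from one local minimum of the
    absolute velocity to the next. Neighbouring strokes share a boundary.
    """
    minima = [
        i
        for i in range(1, len(velocities) - 1)
        if velocities[i] < velocities[i - 1] and velocities[i] <= velocities[i + 1]
    ]
    bounds = [0] + minima + [len(velocities) - 1]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


# Hypothetical velocity profile with two bell-shaped (ballistic) strokes.
example_profile = [0.0, 1.0, 2.0, 1.0, 0.2, 1.0, 2.0, 1.0, 0.0]
example_strokes = segment_strokes(example_profile)
print(example_strokes)
```

A writer who hesitates in mid-stroke would produce extra minima in such a profile, which is one reason heuristics for outliers are needed.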

________________________________________________________________________________

History

________________________________________________________________________________

Processing steps in the NICI stroke-based recognizer

The basic design philosophy was introduced in the following paper:

Schomaker, L.R.B., & Teulings, H.-L. (1990). A Handwriting Recognition System based on the Properties and Architectures of the Human Motor System. Proceedings of the International Workshop on Frontiers in Handwriting Recognition (IWFHR). (pp. 195-211). Montreal: CENPARMI Concordia.

Processing steps:

The system is organized as a pipeline, hierarchically going up from individual sample points, to strokes, to letters, and finally to words.

Performance

The performance of a recognizer is difficult to measure, due to the large number of variables involved. It is like the fuel consumption of cars: the manufacturer will quote a figure, but in practice it is quite a different story. A number of factors influence the recognition rate and its reliability in recognizers of mixed-style handwriting:

Thus, recognition results should be interpreted with extreme caution. Rates are seldom underestimated in the literature. For what it is worth, the next figure gives a distribution of recognition rates in an 'unseen group of writers' for the stroke-based recognizer. It is the top-word recognition rate of the basic system: "How often was the system's best guess indeed correct?" If the top word is not correct, the correct word may be the system's second or later guess, but this is ignored here. No word-shape information or linguistic statistics were used. The system just searches for individual letters, all of which must be found. This means that all fused letters and spelling errors will lead to a missed word! Lexicon size was 250 words, and each writer wrote 45 words. Results of the '95 version of the recognizer:

Note that when this system meets unseen writers, a substantial part of them will have low recognition rates. For example, some writers will write small 'all-caps' letters, claiming that this is their lower-case handwriting. The average processing time per word on an HP-UX 9000/735 workstation was 215 ms.
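The top-word recognition rate described above can be computed per writer along the following lines. This is a minimal sketch: the function name top_word_rate, the data layout (pairs of the written word and the recognizer's ranked guesses) and the example words are assumptions made for illustration, not the evaluation code that was actually used.

```python
def top_word_rate(results):
    """Fraction of words for which the recognizer's best guess was correct.

    results: list of (true_word, ranked_hypotheses) pairs for one writer.
    A correct word at rank two or later does not count, and an empty
    hypothesis list (no word found in the lexicon) counts as a miss.
    """
    if not results:
        raise ValueError("no words to score")
    hits = sum(1 for truth, ranked in results if ranked and ranked[0] == truth)
    return hits / len(results)


# Hypothetical results for one writer (three of the 45 words shown).
writer_results = [
    ("the", ["the", "she"]),  # best guess correct: hit
    ("pen", ["pan", "pen"]),  # correct word only at rank two: miss
    ("ink", []),              # no complete letter sequence found: miss
]
writer_rate = top_word_rate(writer_results)
```

Collecting this rate for every writer in the unseen group gives the kind of distribution discussed above.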

This recognizer is only one of several methods we have tried over the last few years. It is our oldest method and still performs best in terms of speed and recognition rate, although much can (and will) be improved. Having started as a pure connected-cursive recognizer, the approach was gradually extended to handle mixed handwriting and isolated handprint as well. Other approaches developed at NICI include a character-based variant of this recognizer, and a number of post-processing methods other than graph-based LR search are being explored. Currently, the system is being retrained with UNIPEN data and a richer stroke feature vector.

________________________________________________________________________________

Working demo

The recognizer described here was improved considerably and developed into a larger system during 1999-2000, for live demos in a Dutch museum (Scryption) and at the 7th IWFHR conference. This system combines the stroke-based approach described above with several independent character classifiers in a multiple-agent setup. It is dubbed dScript.

dScript Demo description

________________________________________________________________________________

Other interesting material:

o Handwriting Recognition and Document Analysis Conferences

o Pen & Mobile Computing

o NICI Handwriting Recognition Group home page

o UNIPEN tools

________________________________________________________________________________

schomaker@ai.rug.nl