Allographic fraglet codebooks for writer identification

A personal writing style is characterized not only by slant and roundness but also by the typical shapes a writer uses for a given letter: allographs. If a complete list of allographs were available, it would help in writer identification. However, before such a list has been compiled from the writer population, one already wants to be able to count the occurrence of typical shapes in a writing style. By applying an oversegmentation heuristic to scanned handwriting, fragmented connected components ("fraglets") can be extracted from a sample. The shape of these fraglets can be characterized by their contour (Schomaker & Bulacu, 2004). First, a codebook of a few hundred prototypical fraglets is computed by clustering a very large reference data set. Then, for a given writer, the histogram of fraglet usage over this codebook is computed. The resulting vector is a useful feature for writer identification and verification. Combined with other features, such as the hinge transform and/or horizontal run-length histograms, it gives the overall system a more comprehensive view of the writing, and consequently a high writer-identification rate.
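
The histogram step can be illustrated with a minimal Python sketch. It is not the implementation from the papers listed below. It assumes that each fraglet contour has already been resampled to a fixed-length coordinate vector and that a codebook is given. The function names, the toy two-dimensional data, and the chi-square distance used to compare two writers are all illustrative assumptions.

```python
import math


def nearest_prototype(fraglet, codebook):
    """Index of the codebook entry closest to `fraglet` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, prototype in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(fraglet, prototype))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def fraglet_histogram(fraglets, codebook):
    """Normalized histogram of codebook usage for one writer's fraglets."""
    counts = [0] * len(codebook)
    for fraglet in fraglets:
        counts[nearest_prototype(fraglet, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def chi_square_distance(p, q):
    """Chi-square distance between two histograms; bins empty in both are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Toy data (hypothetical): 3 prototypes and 2-D "contours" instead of real
# resampled contour vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
writer_a = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.2), (1.1, -0.1)]
writer_b = [(0.0, 0.9), (0.1, 1.1), (0.0, 0.1), (0.2, 0.8)]

hist_a = fraglet_histogram(writer_a, codebook)  # [0.5, 0.5, 0.0]
hist_b = fraglet_histogram(writer_b, codebook)  # [0.25, 0.0, 0.75]
distance = chi_square_distance(hist_a, hist_b)
```

In a nearest-neighbour identification setup, the writer of a query sample would be the one whose reference histogram lies at the smallest distance from the query histogram. The distance measure is a design choice and can be swapped for another one.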

Relevant papers:

  • Schomaker, L.R.B. & Bulacu, M. (2004). Automatic writer identification using connected-component contours and edge-based features of upper-case Western script. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), pp. 787-798.
  • Bulacu, M. & Schomaker, L.R.B. (2007). Text-independent writer identification and verification using textural and allographic features. IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue - Biometrics: Progress and Directions, 29(4), pp. 701-717.
  • Bulacu, M. & Schomaker, L.R.B. (2003). Writer style from oriented edge fragments. In: N. Petkov & M.A. Westenberg (Eds.), LNCS 2756 - Computer Analysis of Images and Patterns, pp. 460-469.
  • Schomaker, L.R.B. (1993). Using stroke- or character-based self-organizing maps in the recognition of on-line, connected cursive script. Pattern Recognition, 26(3), pp. 443-450.

Lecture slides, ICDAR 2007, on history and theoretical background (.pdf conversion not perfect)

A codebook of fragmented connected-component contours
(30x30 Kohonen Self-organized Map)
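
As a rough illustration of how such a map can be trained, the following Python sketch implements a generic online Kohonen self-organizing map. It is not the training procedure used for the codebook shown here. The learning-rate and neighbourhood schedules, the function name `train_som`, and the toy data are assumptions, and a 3x3 map is used instead of 30x30 to keep the example small.

```python
import math
import random


def train_som(samples, rows, cols, epochs=20, seed=0):
    """Train a rows x cols self-organizing map on fixed-length sample vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialize every cell with a copy of a randomly chosen training sample.
    grid = [[list(rng.choice(samples)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)  # learning rate decays linearly (assumed schedule)
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighbourhood
            # Best-matching unit: the cell whose prototype is closest to x.
            bmu_r, bmu_c = min(
                ((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: sum((a - b) ** 2 for a, b in zip(x, grid[rc[0]][rc[1]])),
            )
            # Pull the best-matching unit and its grid neighbours towards x.
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                    influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                    cell = grid[r][c]
                    for i in range(dim):
                        cell[i] += rate * influence * (x[i] - cell[i])
            step += 1
    return grid


# Toy data (hypothetical): two small clusters of 2-D points standing in for
# resampled fraglet contours.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (1.0, 1.0), (0.9, 1.0), (1.0, 0.9), (0.9, 0.9)]
grid = train_som(samples, rows=3, cols=3)
```

Flattening `grid` row by row yields a list of prototypes that can serve as a codebook for nearest-prototype lookup.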

Color-coded fraglets for different writers

The inset (left) shows the 30x30 cells of the Kohonen map; the color of each cell corresponds to the color of the nearest fraglet in the handwritten sample.
Note the different color 'signature' per writer.

Copyright 2007 Lambert Schomaker