Personal writing style is characterized not only by slant and roundness but also by the typical shapes a writer uses for each letter: allographs. A complete list of allographs would help in writer identification. However, before such a list has been compiled for the writer population, one wants to be able to count how often typical shapes occur in a writing style. Applying an oversegmentation heuristic to scanned handwriting breaks connected components into fragments, which can then be extracted from a sample. The shape of these fraglets can be characterized by their contour (Schomaker & Bulacu, 2004). First, a codebook of a few hundred prototypical fraglets is computed by clustering a very large reference data set. Then, for a given writer, the histogram of fraglet usage is computed. The resulting vector is a very useful feature for writer identification and verification. Combined with other features, such as the hinge transform and/or horizontal run-length histograms, it gives the overall system a comprehensive view of the writing and, consequently, a high writer-identification rate.
Relevant papers:
A codebook of fragmented connected-component contours
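The codebook-and-histogram procedure described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the method of Schomaker & Bulacu (2004). Fraglets are assumed to be already extracted and given as closed contours, meaning lists of (x, y) points. Each contour is resampled to a fixed number of points and size-normalized by a hypothetical `resample_contour`. The text does not say which clustering method builds the codebook, so plain k-means with farthest-point seeding stands in for it. Writers are compared with a chi-square distance and a nearest-neighbour lookup, which is one common choice for histograms. All function names and parameters here are hypothetical.

```python
import math
import random


def resample_contour(points, n=16):
    """Hypothetical fraglet descriptor: resample a closed contour to n points
    evenly spaced by arc length, centre it, scale to unit RMS radius, and
    return a flat vector [x0, y0, x1, y1, ...]."""
    closed = list(points) + [points[0]]
    cum = [0.0]  # cumulative arc length along the closed polygon
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    pts, seg = [], 0
    for i in range(n):
        target = total * i / n
        while cum[seg + 1] < target:  # find the segment containing target
            seg += 1
        (x0, y0), (x1, y1) = closed[seg], closed[seg + 1]
        span = cum[seg + 1] - cum[seg]
        t = (target - cum[seg]) / span if span > 0 else 0.0
        pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    rms = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / n) or 1.0
    return [c for x, y in pts for c in ((x - cx) / rms, (y - cy) / rms)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(vec, codebook):
    """Index of the prototype closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(vec, codebook[j]))


def train_codebook(vectors, k, iters=20, seed=0):
    """Stand-in clustering: k-means (Lloyd's algorithm) with farthest-point
    seeding. Returns k prototype vectors."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(vectors))]
    while len(codebook) < k:
        # Next seed is the vector farthest from every prototype chosen so far.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, codebook)].append(v)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous prototype
                codebook[j] = [sum(col) / len(g) for col in zip(*g)]
    return codebook


def fraglet_histogram(vectors, codebook):
    """Normalized histogram of codebook usage for one writer's fraglets."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest(v, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def chi_square(p, q):
    """Chi-square distance between two histograms (0 for identical ones)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def identify(query_hist, reference_hists):
    """Return the writer whose reference histogram is closest to the query."""
    return min(reference_hists, key=lambda w: chi_square(query_hist, reference_hists[w]))
```

In practice the codebook would be trained once on a large reference set (a few hundred prototypes, as the text describes) and then reused. Only the per-writer histogram is recomputed for each new handwriting sample.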