The Monk Line Segmentation (MLS) Dataset

Olarik Surinta, Lambert Schomaker, and Marco Wiering

Institute of Artificial Intelligence and Cognitive Engineering (ALICE)
Autonomous Perceptive Systems (APS), University of Groningen

Overview

The MLS dataset available from this page consists of 31 handwritten page scans of medieval, historical and contemporary manuscripts, and is intended for testing line-segmentation algorithms. The collection covers a wide range of common problems in handwriting recognition: lines with overlapping ascenders and descenders, slightly rotated scans and curved baselines.

Try our Monk search engine.

Example pages: Captain's logs, 1777; Provincial archive, 1855; Moscow Archives, 1672; Cabinet of the King (KdK), 1893; early 15th century.

Download

The MLS dataset was collected from the Monk system by Lambert Schomaker in May 2013 (snapshot of Friday, May 17, 14:15:04 CEST 2013) at the Institute of Artificial Intelligence and Cognitive Engineering (ALICE), University of Groningen.

The tar.gz file contains the manuscript page images. For more details, please refer to the README file in the tar.gz file. The dataset may be downloaded for research use only. © 2013 Copyright.

Download tar.gz file (58.9 MB) from ai.rug.nl
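The archive can be inspected without unpacking it, using only Python's standard `tarfile` module. The sketch below lists the members that look like page images and returns the text of the README. The helper names `list_page_images` and `read_readme` and the set of image extensions are hypothetical: this page does not state the archive layout or the image file format, so check the README and adjust.

```python
import os
import tarfile

# Assumption: the file extensions of the page scans are not listed on this
# page, so this tuple is a guess; check the README and adjust if needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".pgm", ".ppm")


def list_page_images(archive_path):
    """Return the sorted names of archive members that look like image files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(IMAGE_EXTENSIONS)
        )


def read_readme(archive_path):
    """Return the text of the first member whose base name starts with 'readme', or None."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and os.path.basename(member.name).lower().startswith("readme"):
                data = tar.extractfile(member).read()
                return data.decode("utf-8", errors="replace")
    return None
```

For example, `print(read_readme("mls.tar.gz"))` shows the README of the downloaded archive, where `mls.tar.gz` stands for whatever name the downloaded file actually has.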

Citation

If you use this dataset, please cite the following work:
@INPROCEEDINGS{Surinta:2014:ICFHR,
	author = {O. Surinta and M. Holtkamp and M. F. Karaaba and J.-P. van Oosten and L. R. B. Schomaker and M. A. Wiering},
	title = {A* Path Planning for Line Segmentation of Handwritten Documents},
	booktitle = {Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on},
	year = {2014},
	month = {Sep},
	pages = {175--180},
	numpages = {6},
	isbn = {978-1-4799-4335-7},
	issn = {2167-6445},
	publisher = {IEEE},
	doi = {10.1109/ICFHR.2014.37},
}

Related Publication

  • O. Surinta, M. Holtkamp, M.F. Karaaba, J.-P. van Oosten, L.R.B. Schomaker and M.A. Wiering, "A* Path Planning for Line Segmentation of Handwritten Documents," in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, 2014, pp. 175-180.

Results of A* Path Planning for Line Segmentation of Handwritten Documents

  • lineSegmentResult/csg562-005
  • lineSegmentResult/csg562-006
  • lineSegmentResult/csg562-010
  • lineSegmentResult/csg562-012
  • lineSegmentResult/csg562-017
  • lineSegmentResult/csg562-018
  • lineSegmentResult/csg562-033
  • lineSegmentResult/csg562-036
  • lineSegmentResult/csg562-038
  • lineSegmentResult/csg562-048
  • lineSegmentResult/csg562-049
  • lineSegmentResult/monk_001
  • lineSegmentResult/monk_002
  • lineSegmentResult/monk_003
  • lineSegmentResult/monk_005
  • lineSegmentResult/monk_006
  • lineSegmentResult/monk_007
  • lineSegmentResult/monk_012
  • lineSegmentResult/monk_013
  • lineSegmentResult/monk_014
  • lineSegmentResult/monk_022
  • lineSegmentResult/monk_030
  • lineSegmentResult/monk_031
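The results above come from applying A* path planning to line segmentation. As a rough illustration of the idea only, the sketch below runs a generic A* search on a binary page image (1 = ink, 0 = background) to find a separating path from the left edge to the right edge of the page at a chosen row. The function name `astar_separator`, the move set and the cost terms `INK_PENALTY` and `DEVIATION_WEIGHT` are hypothetical choices for this example and are not the cost function used in the cited paper.

```python
import heapq
import math

# Hypothetical cost parameters, chosen for illustration only.
INK_PENALTY = 50.0      # extra cost for stepping onto an ink (foreground) pixel
DEVIATION_WEIGHT = 0.5  # extra cost per row of distance from the start row


def astar_separator(image, start_row):
    """Return a list of (row, col) cells from (start_row, 0) to (start_row, W-1),
    or None if no path exists. `image` is a list of rows of 0/1 values."""
    height, width = len(image), len(image[0])
    start, goal = (start_row, 0), (start_row, width - 1)
    # Moves: right, up-right, down-right, up, down (never leftward).
    moves = [(0, 1, 1.0), (-1, 1, math.sqrt(2)), (1, 1, math.sqrt(2)),
             (-1, 0, 1.0), (1, 0, 1.0)]

    def heuristic(node):
        # Chebyshev distance: every move costs at least 1 and changes the row
        # and column by at most 1, so this never overestimates the true cost.
        return max(abs(goal[0] - node[0]), goal[1] - node[1])

    best_cost = {start: 0.0}
    parent = {start: None}
    open_heap = [(heuristic(start), 0.0, start)]
    closed = set()
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale heap entry; a cheaper one was already expanded
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        row, col = node
        for d_row, d_col, step in moves:
            nxt = (row + d_row, col + d_col)
            if not (0 <= nxt[0] < height and 0 <= nxt[1] < width) or nxt in closed:
                continue
            # Step cost plus penalties for crossing ink and drifting from the start row.
            new_cost = (cost + step
                        + INK_PENALTY * image[nxt[0]][nxt[1]]
                        + DEVIATION_WEIGHT * abs(nxt[0] - start_row))
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

On a blank image the path is simply the straight row; when an ink stroke blocks that row, the ink penalty makes the search detour around the stroke instead of cutting through it. In practice the start rows would come from a separate estimate of the gaps between text lines, which this sketch does not attempt.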

Other Dataset

The ALICE Off-line Thai Handwritten Character (ALICE-THI) Dataset

Last updated on 02-July-2015, 08:03