The ALICE Off-line Thai Handwritten Character (ALICE-THI) Dataset

Olarik Surinta, Lambert Schomaker, and Marco Wiering

Institute of Artificial Intelligence and Cognitive Engineering (ALICE)
Autonomous Perceptive Systems (APS), University of Groningen

Overview

The number of Thai consonants is not uniquely defined, because some characters are obsolete. This dataset follows the standard Thai script, which consists of 78 characters. We collected a new Thai handwritten script dataset from 150 native writers, all university students aged 20 to 23. Using a 0.7 mm ink pen, they wrote Thai script characters (consonants, vowels, tones and symbols) on a prepared A4 form. Participants wrote only isolated Thai characters on the form, yielding at least 100 samples per character. The forms were scanned at a resolution of 200 dots per inch, and the resulting character images generally have no background noise.
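To get a feel for the scale implied by the scan settings above, the following minimal sketch converts physical lengths to pixels at 200 dots per inch. The helper name mm_to_pixels is hypothetical, and the A4 dimensions (210 mm x 297 mm) are the standard paper size rather than a value stated here. The pen figure is only a nominal tip width; real stroke widths vary with pressure and paper.

```python
MM_PER_INCH = 25.4  # exact definition of the inch in millimetres


def mm_to_pixels(length_mm, dpi=200):
    """Convert a physical length in millimetres to pixels at a given scan resolution."""
    return length_mm / MM_PER_INCH * dpi


# Standard A4 sheet (assumption: 210 mm x 297 mm), scanned at 200 dpi.
a4_width_px = round(mm_to_pixels(210))   # about 1654 pixels
a4_height_px = round(mm_to_pixels(297))  # about 2339 pixels

# Nominal width of a 0.7 mm pen tip at 200 dpi: roughly 5.5 pixels.
pen_tip_px = mm_to_pixels(0.7)
```

Under these assumptions, a single pen stroke is only a few pixels wide, which is worth keeping in mind when resizing the character images.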

Publication

  • O. Surinta, M.F. Karaaba, L.R.B. Schomaker and M.A. Wiering, "Recognition of handwritten characters using local gradient feature descriptors," in Engineering Applications of Artificial Intelligence, vol. 45, 2015, pp. 405-414.

Citation

If you use this dataset, please cite the following work:
		
@article{SURINTA2015405,
	title = "Recognition of handwritten characters using local gradient feature descriptors",
	journal = "Engineering Applications of Artificial Intelligence",
	volume = "45",
	number = "Supplement C",
	pages = "405 - 414",
	year = "2015",
	issn = "0952-1976",
	doi = "https://doi.org/10.1016/j.engappai.2015.07.017",
	url = "http://www.sciencedirect.com/science/article/pii/S0952197615001724",
	author = "Olarik Surinta and Mahir F. Karaaba and Lambert R.B. Schomaker and Marco A. Wiering",
	keywords = "Handwritten character recognition, Feature extraction, Local gradient feature descriptor, 
	Support vector machine, k-nearest neighbors"
}	
	

Download

The tar.gz file contains the image dataset of Thai handwritten script. For more details, please refer to the README file inside the tar.gz file. The dataset may be downloaded for research use only. © 2015 Copyright.

Download tar.gz file (7.3 MB) from ai.rug.nl
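Once the archive has been downloaded, it can be unpacked with standard tools. The sketch below uses Python's tarfile module and is only an illustration: the function name extract_dataset is hypothetical, and the internal layout of the archive is not described on this page, so the code simply lists whatever files it finds (including the README).

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the sorted member file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()

    with tarfile.open(archive_path, "r:gz") as archive:
        # Keep regular files only; parent directories are created on extraction.
        members = [m for m in archive.getmembers() if m.isfile()]

        # Refuse any member whose path would land outside the destination.
        for member in members:
            target = (dest / member.name).resolve()
            if root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")

        archive.extractall(dest, members=members)

    return sorted(m.name for m in members)
```

For example, extract_dataset("ALICE-THI.tar.gz", "alice_thi") would unpack the archive into a local folder and return the file list; both names are placeholders, so substitute the actual name of the downloaded file.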

Other dataset

The Monk Line Segmentation (MLS) Dataset

Last updated on 30-November-2017, 10:13