Multi-Script Handwritten Character Recognition
Using Feature Descriptors and Machine Learning
Dissertation by Olarik Surinta (โอฬาริก สุรินต๊ะ), University of Groningen, September 2016.
ISBN: 978-90-367-9146-5 (printed) / 978-90-367-9149-6 (electronic).
Defended on September 23, 2016, in Groningen, the Netherlands.
Promotor: Prof.dr. L.R.B. (Lambert) Schomaker
Supervisor: Dr. M.A. (Marco) Wiering

Cover designed by Pluis.
Abstract
In this PhD research, several methods are proposed to address the challenges that arise when recognizing handwritten characters from multiple scripts. The thesis contributes to all levels of processing isolated character images: from intensity normalization to segmentation, and from feature extraction to the final classification. Moreover, solutions are proposed for recognizing isolated handwritten character images when only a few handwritten examples are available for each character.
The main goal of the research presented in this dissertation is to study robust feature extraction and machine learning techniques for handwritten character recognition. The best-performing techniques combine the histogram of oriented gradients (HOG) with the bag-of-visual-words (BoVW) approach. Furthermore, a new method for line segmentation, a step in document layout analysis, is proposed. The novel techniques have been tested on many different scripts, and the results show that they effectively address the problems of line segmentation and character recognition.
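The abstract credits the best results to HOG features combined with bags of visual words. The Python sketch below illustrates the general idea on toy images only and is an assumption-laden simplification, not the thesis implementation: `cell_hog`, `local_descriptors`, `build_codebook`, and `bovw_histogram` are hypothetical names; each descriptor is L2-normalized per cell rather than over overlapping HOG blocks; orientations are hard-binned without interpolation; and the visual-word codebook is built with plain k-means.

```python
import math
import random


def cell_hog(img, y0, x0, size, n_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one
    size x size cell, weighted by gradient magnitude and L2-normalized.
    Simplification (assumption): standard HOG normalizes over overlapping
    blocks of cells and interpolates between bins; this sketch does neither."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += math.hypot(gx, gy)
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def local_descriptors(img, cell=4, n_bins=9):
    """One HOG-style descriptor per non-overlapping cell of the image."""
    h, w = len(img), len(img[0])
    return [cell_hog(img, y, x, cell, n_bins)
            for y in range(0, h - cell + 1, cell)
            for x in range(0, w - cell + 1, cell)]


def nearest(vec, centers):
    """Index of the closest center in squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centers[i])))


def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over training descriptors."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(descriptors, codebook):
    """Fixed-length image vector: relative frequency of each visual word."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = len(descriptors) or 1
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy 8x8 "character images": one vertical edge, one horizontal edge.
    vertical = [[0] * 4 + [1] * 4 for _ in range(8)]
    horizontal = [[0] * 8 for _ in range(4)] + [[1] * 8 for _ in range(4)]
    codebook = build_codebook(local_descriptors(vertical) + local_descriptors(horizontal), k=2)
    print(bovw_histogram(local_descriptors(vertical), codebook))
    print(bovw_histogram(local_descriptors(horizontal), codebook))
```

Each image ends up as a histogram whose length equals the codebook size, regardless of how many local descriptors it produced; in a full pipeline that vector would be passed to a classifier. The cell size, number of bins, and `k` used here are arbitrary illustration values.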
Multi-Script Handgeschreven Karakter Herkenning met behulp van Kenmerk Descriptoren en Machinaal Leren
(Dutch title; English: Multi-Script Handwritten Character Recognition Using Feature Descriptors and Machine Learning)
Abstract (Dutch version, translated into English)
In this PhD research, methods are proposed to solve problems that arise when recognizing handwritten characters from multiple scripts. The thesis contributes to all processing levels of images of individual characters: from intensity normalization to segmentation, and from feature extraction to the final classification. In addition, solutions are presented for recognizing individual characters when few handwritten examples are available for each character.
The main objective of the research in this thesis is to develop robust techniques for feature extraction and machine learning for the recognition of handwritten characters. The best techniques are combinations of a histogram of oriented gradients and bags of visual words. In addition, a new method for line segmentation is presented as part of document layout analysis. The new techniques have been tested on many different scripts, and the results show that they are effective in addressing the problems of line segmentation and character recognition.
Propositions
- The goal in multi-script handwritten character recognition is to achieve high recognition performance on isolated handwritten characters from different scripts. – Chapter 1, this PhD thesis –
- If appropriate cost functions are designed, the original A* path-planning algorithm can move through overlapping or connected text areas instead of moving around them. – Chapter 2, this PhD thesis –
- The outputs of different classifiers can be combined into a final classification by an unweighted majority vote, which results in high accuracies on isolated handwritten character datasets. – Chapter 3, this PhD thesis –
- On our challenging isolated handwritten character datasets, the best feature descriptors achieve high recognition performance with a simple classifier. – Chapter 4, this PhD thesis –
- Some feature extraction methods capture the information in character images that is needed for recognition, which makes them an important component of a recognition algorithm. – Chapter 4, this PhD thesis –
- The combination of local feature descriptors and the bag-of-visual-words approach gives the highest recognition performance. – Chapter 5, this PhD thesis –
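The second proposition states that, with suitable cost functions, A* path planning can pass through overlapping or connected text instead of going around it. A minimal Python sketch of that idea on a binary grid follows, under stated assumptions: the cost terms `ink_cost` and `drift_cost`, their default values, the 4-connected moves, and the function name `astar_separator` are hypothetical illustrations, not the cost functions defined in Chapter 2.

```python
import heapq


def astar_separator(grid, start_row, ink_cost=10.0, drift_cost=0.5):
    """Cheapest left-to-right path along start_row through a binary image
    (1 = ink). Ink is penalized, not forbidden, so the path can cross a
    touching or overlapping stroke when going around would cost more."""
    h, w = len(grid), len(grid[0])
    start, goal = (start_row, 0), (start_row, w - 1)

    def step_cost(r, c):
        # Every step costs at least 1, plus hypothetical penalties for
        # entering ink and for drifting away from the starting row.
        return 1.0 + ink_cost * grid[r][c] + drift_cost * abs(r - start_row)

    def heuristic(r, c):
        # Manhattan distance never overestimates, since each step costs >= 1.
        return abs(goal[0] - r) + abs(goal[1] - c)

    best_g = {start: 0.0}
    parent = {}
    frontier = [(heuristic(*start), 0.0, start)]
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        if g > best_g[node]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = node
        for dr, dc in ((0, 1), (-1, 0), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                ng = g + step_cost(nr, nc)
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    parent[(nr, nc)] = node
                    heapq.heappush(frontier, (ng + heuristic(nr, nc), ng, (nr, nc)))
    return []


if __name__ == "__main__":
    # Toy 5x7 image: a vertical stroke in column 3 blocks every detour,
    # so the cheapest path runs straight along row 2 and crosses it once.
    stroke = [[0, 0, 0, 1, 0, 0, 0] for _ in range(5)]
    print(astar_separator(stroke, start_row=2))
```

Because ink only raises the step cost, the search detours around a small isolated blob when that is cheaper, but crosses a stroke that leaves no detour, which is the behavior the proposition describes.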
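The third proposition combines classifier outputs by an unweighted majority vote. A minimal Python sketch, assuming each classifier has already produced one label per test sample; the function name `majority_vote` and the tie-breaking rule are hypothetical choices, not taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Unweighted majority vote over several classifiers.

    predictions[i][j] is classifier i's label for sample j. Every classifier
    gets one equal vote per sample; ties go to the tied label voted for by
    the earliest classifier in the list (an arbitrary, hypothetical rule)."""
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


if __name__ == "__main__":
    # Three classifiers, three samples.
    print(majority_vote([["a", "b", "c"], ["a", "c", "c"], ["b", "b", "a"]]))
    # prints ['a', 'b', 'c']
```

"Unweighted" means no classifier's vote counts more than another's, regardless of its individual accuracy.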
Citation
BibTeX:
@phdthesis{726cd3bc1c654bf4a38afac572abfe0e,
  title = "Multi-script handwritten character recognition: Using feature descriptors and machine learning",
  author = "Olarik Surinta",
  year = "2016",
  isbn = "9789036791465",
  publisher = "University of Groningen",
}
September 2016