A knowledge-based recognition system for historical Mongolian documents
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
DOI:
10.1007/s10032-016-0267-1
Publication Date:
2016-04-25T05:39:22Z
AUTHORS (4)
ABSTRACT
This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units or do not require segmentation are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. We exploit the knowledge of the glyph characteristics to segment words into glyph-units in the segmentation-based scheme. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on the analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition. These strategies involve incorporating baseline information, glyph-unit grouping, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86 % word accuracy on the Mongolian Kanjur test samples.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (31)
CITATIONS (14)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....