Correcting Broken Characters in the Recognition of Historical Printed Documents
Last modified:
2007-03-13
Droettboom, M., 2003
In: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries. Edited by IEEE.
Abstract:
This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
URL:
http://dkc.jhu.edu/gamera/papers/droettboom_broken_characters.pdf