Gamera FAQ
Some frequently asked questions and (hopefully) some answers.
General
- What is Gamera?
Gamera is a toolkit for building document image recognition systems. It consists of a programming library and a set of GUI tools for experimentation and training. Gamera aims to reduce the development time of document recognition applications by including a number of commonly used components to prevent "reinvention of wheels" whenever possible. Please see the Gamera overview for more information.
The term "document" is used loosely, and can include many kinds of information presented in two-dimensional form. Gamera has been used to build recognizers for common music notation, medieval manuscripts and other document types.
- What is Gamera not?
Gamera is not a packaged document recognition system, such as OmniPage or MIDISCAN. It is a tool with which one can develop document recognition applications, but it is not one itself. Developing a recognizer with Gamera is designed to be as easy as possible, but still requires a considerable time commitment.
Gamera's focus is somewhat biased towards document types that are not well supported by existing, off-the-shelf software. Certain document types, such as medieval manuscripts, are unlikely to provide the financial incentive to support the development of a commercial application.
- Why the name "Gamera"?
Gamera is the acronym for "Generalized Algorithms and Methods for Enhancement and Restoration of Archives". The software, which grew out of our research on a system called AOMR (Adaptive Optical Music Recognition), was christened as Gamera on 1 April 2001.
Gamera is also the name of an overgrown turtle in a series of Japanese monster movies. There is some hope that the software, like the turtle in the Turtle and the Hare story, will eventually be triumphant.
- What sorts of scripts will Gamera work with?
This is "script" in the sense of "writing system", not "scripting language".
- For scripts with small character sets and (mostly) well-segmented characters (e.g. Latin, Greek, Hebrew, Cyrillic), Gamera performs very well.
- For cursive machine-printed scripts (e.g. Arabic) we have an active research project to implement segmentation-free recognition.
- For large character sets (e.g. Kanji) some sort of syntactical or structural analysis of the character is necessary. This sort of thing is not implemented in Gamera at present, but there is nothing stopping an interested researcher from adding these features.
- I'm sure there are other categories of which I'm completely ignorant.
And don't forget that Gamera has been used to develop systems for other non-text structured documents, such as common music notation and lute tablature.
- Why can't I put my image in and get text out?
See the question "What is Gamera not?". There is a rudimentary framework for text extraction in roman_text.py. However, expect that a lot of customization will be necessary for each document domain.
- How should I get started?
It is helpful to have a background in programming. A basic knowledge of Python is required, but people with experience in another mainstream language generally find Python easy to learn. The recommended reading for starters is:
- Gentle overview of Gamera -- to get an idea of what Gamera is and what it isn't.
- Writing simple Gamera scripts -- to start creating very basic Gamera scripts.
- The source code of roman_text.py is a slightly more complex example for basic text recognition.
- How can I get help?
The gamera-devel mailing list on Yahoo! Groups is the best way to contact the authors and other members of the community. If you are running into a bug, please be sure to include the following information:
- The versions of Gamera, Python and wxPython you are using
- Your platform
- Any output or backtraces that are being produced
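The version and platform details above can be gathered with a short script. This is only a sketch: the module names `gamera` and `wx`, and the assumption that each exposes a `__version__` attribute, are guesses that may not hold for your installation, so the script falls back to a note asking you to look the version up by hand.

```python
import importlib
import platform
import sys


def collect_environment_report(module_names=("gamera", "wx")):
    """Build the lines of a bug report header.

    The module names and the ``__version__`` lookup are assumptions;
    check each package's own documentation if the attribute is missing.
    """
    lines = [
        "Python: %s" % sys.version.split()[0],
        "Platform: %s" % platform.platform(),
    ]
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as error:  # not installed, or failed while loading
            lines.append("%s: could not be imported (%s)" % (name, error))
            continue
        version = getattr(module, "__version__", "unknown, please look it up manually")
        lines.append("%s: %s" % (name, version))
    return lines


report = collect_environment_report()
print("\n".join(report))
```

Paste the printed lines into your message together with any backtrace.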
- How should I cite Gamera (in an academic paper etc.)?
The canonical URL for the Gamera website is http://gamera.sourceforge.net/. That URL will always contain the most up-to-date information on Gamera, with links to the official documentation and published papers.
If you are required to cite a published paper rather than a website, the most extensive and current information is in:
Droettboom, M, MacMillan, K, and Fujinaga, I (2003). The Gamera framework for building custom recognition systems. Symposium on Document Image Understanding Technologies: 275-86.
These proceedings are difficult to obtain, but the paper is available in PDF on the Gamera website.
Installation
- I can't get Gamera to run.
First check the following:
- Make sure you have the correct version of Python installed. (This is 2.2.2 or greater on Linux, 2.3.1 or greater on Windows and 2.3.0 or greater on OS-X). Verify that it is installed correctly by running any of the demonstration scripts.
- Make sure you have the correct version of wxPython installed. Recent versions in the 2.5.x series are unstable development releases and are not supported by Gamera. You will need to visit the complete list of wxPython releases to download a 2.4.x version.
- If you are running Gamera on the command line, try running the gamera_gui script from a directory other than the Gamera source directory.
If these things fail, please send a message to the mailing list. Include in your message the Python backtrace, the versions of Gamera, Python and wxPython, and the platform you are using.
- I just upgraded to Gamera 3.x and now I get all these deprecation warnings that I never used to see before.
There are some functions in Gamera 3.x that have been deprecated. They will continue to work until a future release, but you will receive warnings. See the migration guide for more information.
Writing code
- How do I write a Gamera script?
Gamera scripts are just Python scripts that import Gamera's modules. It is definitely a good idea to familiarise yourself with the basics of Python before diving in.
There are a number of really basic scripts to help get you started in the documentation.
- After classification, how do I get the results?
The classifier stores its classifications in the id_name member variable of images. This id_name member is actually a list of possible classifications. See the id_name documentation for more information.
When you pass a list into classifier.classify_list_automatic or classifier.group_list_automatic, the list itself is not modified. Instead, any glyphs that should be added or removed are returned in a tuple of lists (added, removed). Therefore, to get any glyphs that were newly created by either splitting or grouping, you have to do the following:
    added, removed = classifier.group_list_automatic(glyphs)
    glyphs += added
There is also a convenience function `classifier.classify_and_update_list_automatic` which handles this for you.
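Because id_name holds a list of candidates rather than a single answer, it can help to see how one might pick the most likely one. The sketch below does not import Gamera. It assumes each entry is a (confidence, name) tuple, which you should confirm against the id_name documentation, and `best_classification` and the sample data are hypothetical names made up for illustration.

```python
def best_classification(id_name):
    """Return the (confidence, name) entry with the highest confidence.

    Assumes ``id_name`` is a list of (confidence, name) tuples.
    Returns None when the list is empty (nothing classified yet).
    """
    if not id_name:
        return None
    return max(id_name, key=lambda entry: entry[0])


# Hypothetical data standing in for the id_name of a classified glyph.
example_id_name = [(0.15, "lower.l"), (0.80, "lower.i"), (0.05, "digit.1")]

best = best_classification(example_id_name)
print(best)
```

With real glyphs you would pass the id_name member of each image in your list instead of the sample data.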
- When should I use C++ and when should I use Python?
There's no straight answer here. This should be considered as a tradeoff between runtimes (always let benchmarking on real-world data determine which is better) and development time. That said, you usually won't want to go through the trouble of implementing something twice, so here is a useful rule of thumb:
- Algorithms that need access to individual pixels should be implemented in C++
- Algorithms that drive other long-running, low-level processes should be implemented in Python
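To let benchmarking make the decision, it may be enough to time the pure-Python version first. The following sketch uses the standard timeit module on a per-pixel threshold. The function `threshold_pixels`, the helper `best_time` and the synthetic 64x64 image are illustrative stand-ins and not part of Gamera; real measurements should use your own images.

```python
import timeit


def threshold_pixels(pixels, level):
    """Per-pixel loop in pure Python: 1 where the value exceeds level, else 0."""
    return [[1 if value > level else 0 for value in row] for row in pixels]


def best_time(func, *args):
    """Return the fastest of five single runs of func(*args), in seconds."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=5))


# Synthetic greyscale data standing in for a real scanned page.
image = [[(x * y) % 256 for x in range(64)] for y in range(64)]

elapsed = best_time(threshold_pixels, image, 127)
print("best of 5 runs: %.6f s" % elapsed)
```

If the measured time is acceptable on real-world data, the extra development time of a C++ version may not be worth it.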
Training
- What's the deal with production and current databases?
Obsolete question: As of October, 2004, the terminology of production and current databases has changed.
- production database is now classifier glyphs
- current database is now page glyphs
This, and some additions to the classifier GUI, should hopefully alleviate much of this confusion.
The page glyphs are simply the set of connected components on the page you are currently training. The classifier glyphs are the connected components that the classifier uses to make its classifications (i.e. the training data). They are documented here.
The classifier GUI provides some flexibility as to how these two databases are saved, loaded and merged.
- How do I train the classifier to group connected components together (such as for lower case i's)?
The classifier can be used to both repair broken characters and recognize "legitimately broken" characters. To train broken characters, select all parts of a single character and give the symbol name the prefix _group. For example, to train lower case i's, select both the stem and dot of a single lower case i and train it as _group.lower.i.
- What's with id names?
Training is basically the act of assigning symbol names to characters so that the classifier can learn what things are. Symbol names in Gamera may contain Unicode characters, and can be delimited into categories using periods. There is deliberately no standard naming convention in Gamera: that will depend entirely on the type of document being trained. However, if your document type fits neatly into the textual types of documents supported by Unicode, you may want to use standard Unicode character names, if only to avoid reinventing the wheel.
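Since symbol names are plain strings delimited by periods, they are easy to take apart in a script. The sketch below uses only standard string methods. The helper names `split_symbol_name`, `is_group` and `strip_group_prefix` are hypothetical and not part of Gamera, and the _group prefix is the one described in the grouping question above.

```python
GROUP_PREFIX = "_group"


def split_symbol_name(name):
    """Split a period-delimited symbol name into its category parts."""
    return name.split(".")


def is_group(name):
    """True if the first category of the symbol name is the _group prefix."""
    return split_symbol_name(name)[0] == GROUP_PREFIX


def strip_group_prefix(name):
    """Return the symbol name without a leading _group category, if present."""
    parts = split_symbol_name(name)
    if parts[0] == GROUP_PREFIX:
        parts = parts[1:]
    return ".".join(parts)


parts = split_symbol_name("_group.lower.i")
print(parts)
print(is_group("_group.lower.i"), strip_group_prefix("_group.lower.i"))
```

Keeping the categories consistent (for example always putting case before letter) makes this kind of post-processing simpler, whatever naming convention you choose.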
- How can I make classification faster?
The first thing to look at is the set of features you're using. Gamera provides a large number of feature generation routines, some of which are rather computationally intensive. Try limiting the set of features to ones you think you'll really need.
You can decrease the time spent loading the training data into the classifier dramatically by using classifier.serialize() to save it in a high-speed but non-portable binary format.