OCoRrect is an international project funded by the Volkswagenstiftung. The project partners are the Bulgarian Academy of Sciences and LMU Munich. The principal aim of the project is to improve methods for the post-correction of OCR results. On the application side, special emphasis is placed on the conversion of documents written in Cyrillic script or mixing Cyrillic- and Latin-script languages. From a methodological point of view, a central concern is the use of large-scale monolingual and multilingual electronic dictionary systems for OCR correction. In this context, a new method for string correction with large electronic dictionaries, recently developed by the project partners, will be extended and optimized. During the project, a corpus of Bulgarian and German OCR documents will be collected and edited for research and evaluation purposes.
Keywords: OCR, document analysis and recognition, postprocessing, postcorrection, multilingual documents, lexicon automata.
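The abstract does not describe how the partners' string-correction method works. As a rough illustration of what correction against an electronic dictionary involves, the following Python sketch (an assumption, not the project's method) retrieves every dictionary entry within a given Levenshtein distance of an OCR token. It walks a character trie and computes one row of the edit-distance table per trie edge, pruning branches that can no longer match. The trie is a simple stand-in for the lexicon automata named in the keywords; the names `build_trie` and `correction_candidates` and the sample words are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a character trie built over the dictionary (illustrative)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set when a dictionary entry ends here


def build_trie(words: List[str]) -> TrieNode:
    """Insert every dictionary word into a character trie."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def correction_candidates(root: TrieNode, token: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs for all words within max_dist edits of token,
    sorted by distance and then alphabetically."""
    results: List[Tuple[str, int]] = []
    # Row for the empty trie prefix: distance from "" to each prefix of token.
    first_row = list(range(len(token) + 1))
    for ch, child in root.children.items():
        _search(child, ch, token, first_row, max_dist, results)
    return sorted(results, key=lambda r: (r[1], r[0]))


def _search(node: TrieNode, ch: str, token: str, prev_row: List[int],
            max_dist: int, results: List[Tuple[str, int]]) -> None:
    # Next DP row: distance between the trie prefix ending in ch and each prefix of token.
    row = [prev_row[0] + 1]
    for j in range(1, len(token) + 1):
        extra_in_token = row[j - 1] + 1          # token has a character the word lacks
        missing_in_token = prev_row[j] + 1       # word has a character the token lacks
        substitute = prev_row[j - 1] + (token[j - 1] != ch)  # match (0) or substitution (1)
        row.append(min(extra_in_token, missing_in_token, substitute))
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Prune: if every cell already exceeds max_dist, no extension of this prefix can match.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _search(child, next_ch, token, row, max_dist, results)


if __name__ == "__main__":
    trie = build_trie(["Haus", "Maus", "Hans", "Baum"])
    print(correction_candidates(trie, "Hau5", 1))
    print(correction_candidates(trie, "Hau5", 2))
```

With `max_dist=1`, the token `Hau5` (an OCR confusion of `s` and `5`) yields only `Haus`; raising the bound to 2 also admits `Baum`, `Hans` and `Maus`, which is why the candidates are returned sorted by distance.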