CIS OCR Group
The CIS OCR group researches mathematical methods for fast string search and alignment, for later use in OCR postprocessing. It also develops practical solutions and complete workflows for the effective and highly accurate transformation of images of printed documents into electronic text. This work covers the whole history of modern printing, starting with Gutenberg around 1450.
Recent breakthrough innovations include the following. First, the group has applied recurrent neural networks with an LSTM architecture to the OCR of early printings. This has made it possible, for the first time, to extract readable electronic text from incunabula (printings from 1450 to 1500). Second, the group has developed PoCoTo, an interactive post-correction tool for the fast correction of OCR results on historical printings. PoCoTo distinguishes historical spellings from true OCR errors. The underlying language-profiling technology and the PoCoTo Java desktop client developed at CIS are now publicly available for German, Latin and Ancient Greek. This is a result of the CLARIN-D project "Ausbau und Erweiterung eines Open-Source-Tools zur Nachkorrektur historischer OCR-erfasster Texte" ("Development and Extension of an Open-Source Tool for the Post-Correction of Historical OCR-Captured Texts").
Klaus U. Schulz