Research activities of Prof. Klaus U. Schulz
The work of Prof. Schulz concentrates on
- Semantic Search, Construction of Ontologies and Taxonomies
- Digital Libraries, Language Technology for Optical Character
Recognition and Document Analysis
- Finite-State Technology
Projects and Spin-offs:
In the i2010 vision of a European Digital Library, the EU launched an ambitious
plan for large-scale digitisation projects transforming Europe’s printed
heritage into digitally available resources. The aim of fully integrating
intellectual content into the modern information and communication technologies
environment can only be achieved by full-text digitisation: transforming
digital images of scanned books into electronic text.
Over the last 2-3 years mass-digitisation has become one of the most prominent
issues in the library world. Today, a number of advanced libraries in Europe
are scanning millions of pages each year and large-scale digitisation is a
matter of fact, not a vision any more. However, these efforts can tackle only a
fraction of the total heritage available in cultural memory organisations. The
digitised material is becoming available too slowly and in too small quantities
from too few sources, for three reasons.
- There is a lack of institutional knowledge and expertise, which causes inefficiency and ‘re-inventing the wheel’. This is a problem for the vast majority of libraries, museums and archives in Europe.
- The costs of producing full-featured electronic text from historical documents are much too high. Cultural heritage institutions will not be able to satisfy their users’ need for electronic texts rather than mere digital images. Manual keying costs around 1 EUR per page, so a typical book comes to 400, 500 or even 1000 EUR.
- Automated text recognition, carried out by Optical Character Recognition (OCR) engines, in many cases does not produce satisfactory results for historical documents. Recognition rates are poor, and the output is sometimes unusable. No commercial or other OCR engine is able to cope satisfactorily with the wide range of printed materials published between the start of the Gutenberg age in the 15th century and the start of the industrial production of books in the middle of the 19th century.
IMPACT as a network of centres of competence brings together 22 national and regional libraries, research institutions and commercial suppliers.
Zoom from information to knowledge.
Intelligent access to text published in vast Internet media, in
archives, and in document collections in general is increasingly one of the
decisive factors for success. People, both as private members of
communities and as professionals, need to gather knowledge
in an efficient way.
On the other hand, companies, public institutions
and organizations of any kind have to present their materials in an
approachable form. The semantic technologies developed by the team of TopicZoom
provide a solution for these demands of the information age that
goes far beyond the technologies of classic full-text search.
TopicZoom is an LMU spin-off founded by members of the Center for
Information and Language Processing (CIS).