Understanding my research (for non-researchers)

Broadly, my research falls within the field of Computational Linguistics, which is also called Natural Language Processing. Computational Linguistics focuses on building computer systems that can process human language. Automatic speech recognition and Internet search engines are two examples of the successful application of theories from Computational Linguistics to important problems.

Computational Linguistics is an interdisciplinary area that lies at the intersection of two fields, Computer Science and Linguistics. From Computer Science, it draws on the study of highly complex computational models. For instance, in the problem of Information Retrieval, a document can be thought of as a vector in a high-dimensional space, where each dimension corresponds either to an individual word or to a loose collection of words representing an idea. From Linguistics, it draws knowledge of how human languages vary and ideas about how humans process language, both of which are critical to implementing computer systems that can operate on, or even attempt to understand, human language.
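For readers who would like to see the "document as a vector" idea in concrete form, here is a minimal sketch in Python. It assumes the simplest case, where each dimension is an individual word and the value is how often that word occurs, and it compares texts by the cosine of the angle between their vectors. The function names (to_vector, cosine_similarity, rank) are made up for this illustration and do not describe any particular search engine.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a text as a sparse vector: one dimension per word, valued by its count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "the court ruled on the patent case",
        "the weather is sunny today",
    ]
    for score, doc in rank("patent court", docs):
        print(round(score, 3), doc)
```

Real systems usually weight words by how informative they are rather than using raw counts, but the geometric picture is the same: documents whose vectors point in a similar direction to the query are ranked first.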

My specific research interests are the two related problems of Machine Translation and Cross-Language Information Retrieval. Machine Translation is the problem of using an automatic computer system to translate text from one human language to another, for instance from English to German. It is a very difficult problem that has been the subject of a great deal of active research since the 1950s.

Cross-Language Information Retrieval is the creation of document search systems (such as Internet search engines) in which the query is given in one language and the documents being searched are in another. For instance, the query might be given in English, while the documents searched are in German. Technology of this type might allow an American lawyer to search German court cases without needing to know German. Combined with a cost-effective and accurate Machine Translation system, it would let a person make use of a collection of documents written in a language that he or she does not speak. There is tremendous demand for both of these technologies, driven in large part by the availability of documents in many different languages on the Internet.
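One simple way to picture cross-language search is to translate the query word by word with a bilingual dictionary and then search the foreign-language documents as usual. The Python sketch below illustrates only that idea; it is not a description of my own methods. The tiny English-to-German dictionary, the sample documents, and the function names are all hypothetical, and the scoring (counting matching words) is deliberately crude.

```python
# Toy English-to-German dictionary: hypothetical entries, for illustration only.
EN_DE = {
    "court": ["gericht"],
    "ruling": ["urteil", "entscheidung"],
    "patent": ["patent"],
}


def translate_query(query, dictionary):
    """Replace each query word with all of its dictionary translations.

    Words missing from the dictionary are kept unchanged (useful for names).
    """
    translated = []
    for word in query.lower().split():
        translated.extend(dictionary.get(word, [word]))
    return translated


def search(query, documents, dictionary):
    """Return the documents that match the translated query, best match first."""
    terms = set(translate_query(query, dictionary))
    scored = []
    for doc in documents:
        # Score = number of document words that are translated query terms.
        score = sum(1 for word in doc.lower().split() if word in terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


if __name__ == "__main__":
    german_docs = [
        "das gericht hat das urteil verkündet",
        "das wetter ist heute sonnig",
    ]
    print(search("court ruling", german_docs, EN_DE))
```

Even this toy version hints at why the problem is hard: a single English word can have several German translations, and choosing among them well is one place where Machine Translation and Cross-Language Information Retrieval meet.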