Towards Perfect Supervised and Unsupervised Machine Translation

Talk on 2020-07-03 at the Faculty for Mathematics, Informatics and Statistics, LMU Munich

Date: Friday, July 3rd, 15:00 (s.t.)

Prof. Dr. Alexander Fraser
Professor of Information and Language Processing
Centrum für Informations- und Sprachverarbeitung
Ludwig-Maximilians-Universität München

Title: Towards Perfect Supervised and Unsupervised Machine Translation

Location: On Zoom

45-minute talk plus questions: fraser.mp4
PowerPoint slides: fraser.ppt


Data-driven Machine Translation is an interesting application of machine-learning-based natural language processing techniques to multilingual data. Particularly with the recent advent of powerful neural network models, it has become possible to incorporate many types of information directly into the model and to robustly model long-distance dependencies in the sequence of words being generated.

I will discuss four areas of work addressing important weaknesses of data-driven machine translation approaches. First, I will present an alternative to phrase-based statistical machine translation: a model that jointly models translation operations and reordering operations and that was widely adopted by researchers and end users. Second, I will discuss the important problem of data sparsity caused by rich morphology, and the extensive work we have carried out to overcome it. Third, I will discuss progress towards breaking the strong domain dependency between the data used to train supervised neural machine translation systems and the data that will be translated. Finally, I will briefly present a new research program that will allow us to build strong unsupervised machine translation systems, enabling high-quality translation between language pairs for which no known source of parallel training data exists.


Alexander Fraser is an associate professor at the Center for Information and Language Processing, LMU Munich, where he leads the statistical machine translation group. Before that, he led the statistical machine translation group at the Institute for Natural Language Processing at the University of Stuttgart.

He holds an ERC Starting Grant and was PI of the Horizon 2020 "Health in my Language" project. He was PI of a German Research Foundation project on modeling morphosyntactic phenomena in machine translation, and deputy PI of the FP7 project "TTC - Terminology Extraction, Translation Tools and Comparable Corpora". His main research interests are machine-learning-based and hybrid approaches to machine translation, syntactic parsing, and information retrieval.

Alex obtained his PhD in 2007 from the Department of Computer Science at the University of Southern California. His PhD research was conducted at the Information Sciences Institute in the Intelligent Systems Division. In addition to his academic work, Alex has worked at Language Weaver, where he developed the first commercially available statistical machine translation system, and at BBN Technologies, where he worked on Arabic monolingual and cross-language information retrieval. Before starting his PhD, Alex held a number of positions, including technical director at SatelLife/HealthNet, working on digital infrastructure for health information in developing countries.