Extension Module (Erweiterungsmodul): Machine Translation

Summary

Machine Translation

In the first part, we will discuss the general problem of machine translation (the automatic translation of text from one language to another) and the history of research in the field. We will then briefly consider older approaches to machine translation (those predating the current focus on machine learning). Next, we will present some particular challenges for natural language processing that must be solved on the way to general approaches to machine translation. Finally, we will discuss the important topic of evaluating machine translation systems.

In the second part, we will look at statistical machine translation (SMT), which was the dominant paradigm in translation from about 2000 to 2015 and is still the core of many industrial systems. We will introduce the related concepts of translational equivalence (established through word alignment), simple statistical models, and search algorithms. Finally, we will present the linguistically focused modelling extensions required for difficult language pairs (e.g., English to German translation).
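As a rough illustration of what a "simple statistical model" for word alignment can look like, the following is a minimal sketch of EM training for IBM Model 1 lexical translation probabilities t(f|e). The function name `train_ibm_model1`, the three-sentence toy corpus, and the omission of the NULL word are assumptions made for this illustration; it is not taken from the course material.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(f|e) with EM (sketch of IBM Model 1, without the NULL word).

    bitext: list of (f_tokens, e_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability of f given e.
    """
    f_vocab = {f for f_sent, _ in bitext for f in f_sent}
    # Uniform initialisation over the f-side vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked at all

        # E-step: distribute each f word over the e words of its sentence.
        for f_sent, e_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # normaliser for this f
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: re-estimate t(f|e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return t


# Hypothetical toy corpus (German on the f side, English on the e side).
toy_bitext = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

t = train_ibm_model1(toy_bitext, iterations=10)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

On this toy corpus the probability mass for "house" shifts from "das" towards "Haus" over the iterations, because "das" is better explained by "the", which it co-occurs with twice.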

In the third and last part of the lecture, we will consider the deep learning approaches used in so-called neural machine translation (NMT), a very new but extremely popular technique. We will briefly introduce the concepts of word embeddings and deep learning before moving on to provide a high-level overview of recurrent neural networks (RNNs) and state-of-the-art Long Short-Term Memory (LSTM) approaches to translation.

Goals

Theoretical understanding of the challenges of machine translation and the models used to solve them.

Machine Translation Exercises (Übung)

The first half of the exercises will consider practical problems of machine translation with a special focus on English to German translation.

The second half of the exercises will be practical projects carried out by the students (required for obtaining a grade).

Goals

Practical experience in solving sub-problems of machine translation, as well as familiarity with the data used for training statistical models.

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich

Schedule

Lecture (Vorlesung) Wednesdays, Room 131, 14 to 16 (c.t.)

Exercise (Übung) Tuesdays, 16 to 18 (c.t.), NEW: IN KALAHARI (was previously in C003)

Date Topic Reading (DO BEFORE THE MEETING!) Slides

April 26th Orientation and Introduction to Machine Translation pdf

May 3rd Introduction to Statistical Machine Translation ppt pdf

May 9th TÜ Google Translate and Manual Word Alignment exercise1.txt

May 10th Bitext alignment (extracting lexical knowledge from parallel corpora) ppt pdf

May 16th TÜ Translation Memory and IBM Model 1 exercise2.html

May 17th Many-to-many alignments and Phrase-based model ppt pdf

May 23rd Log-linear model and Minimum Error Rate Training ppt pdf projects_fraser.pdf projects_braune.pdf projects_huck.pdf

May 24th TÜ in room 131!

May 30th TÜ: Phrase extraction, etc.

May 31st Decoding (Matthias Huck) pdf

June 7th Linear Models and Discriminative Phrase Lexicons in Moses pptx pdf tamchyna_acl_2016_slides.pdf tamchyna_acl_2016_slides.pptx

Tuesday, June 13th at 12:15 in L155 Special Talk: An Analysis of Neural Machine Translation and Combination with Statistical Machine Translation (Jan Niehues, Karlsruhe)

June 13th TÜ

June 14th Neural Networks (and Word Embeddings), Fabienne Braune pdf (UPDATED!)

June 20th TÜ: Projects notes assignments (UPDATED!)

June 21st SMT: Advanced Word Alignment, Morphology, Syntax ppt pdf

June 27th TÜ: Projects

June 28th Bilingual Word Embeddings and Recurrent Neural Networks, Fabienne Braune pdf

July 4th TÜ: Projects

July 5th Neural Machine Translation, Matthias Huck pdf

July 11th TÜ: Exercise Deep Learning

July 12th Office Hours in 131

July 18th in ***C003*** Presentations in C003 Presentation Dates (UPDATED WITH DETAILS!)

July 19th Presentations in 131

July 25th in ***C003*** Presentations in C003

July 26th Presentations in 131

Literature:

Philipp Koehn's book Statistical Machine Translation

Kevin Knight's tutorial on SMT (particularly look at IBM Model 1)
