Extension Module (Erweiterungsmodul): Machine Translation

Summary

Machine Translation

In the first part, the general problem of machine translation (automatic translation of text from one language to another) will be discussed, as well as the history of research into machine translation. We will then briefly consider older approaches to machine translation (before the current focus on machine learning). Then, some particular challenges for natural language processing that must be solved on the way to general approaches for machine translation will be presented. Finally, we will discuss the important topic of evaluation of machine translation systems.

In the second part, we will look at statistical machine translation (SMT), which was the dominant paradigm in machine translation from about 2000 to 2015 and is still the core of many industrial systems. The related concepts of translational equivalence (established through word alignment), simple statistical models and search algorithms will be introduced.
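To give a rough idea of what learning translational equivalence from parallel text involves, here is a minimal sketch of expectation-maximization training for lexical translation probabilities in the style of IBM Model 1 (see the Literature section). The function name train_ibm_model1, the three-sentence toy corpus, and the omission of the NULL source word are simplifying assumptions made for illustration. This is not the code used in the exercises.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=20):
    """Estimate t[(f, e)], the probability of target word f given source word e.

    bitext is a list of (source_tokens, target_tokens) pairs.
    Simplification: no NULL source word is added.
    """
    target_vocab = {f for _, tgt in bitext for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e

        # E-step: spread each target word's probability mass over the
        # source words of the same sentence pair.
        for src, tgt in bitext:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta

        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus: German source, English target.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext)
for e in ["das", "Haus", "Buch", "ein"]:
    best = max(["the", "house", "book", "a"], key=lambda f: t[(f, e)])
    print(e, "->", best)
```

Even though no sentence pair says which word translates which, co-occurrence across sentence pairs is enough for the model to prefer "the" for "das" and "book" for "Buch". Word alignments can then be read off by linking each target word to its most probable source word.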

In the third and last part of the lecture, we will consider the deep learning approaches used in so-called neural machine translation (NMT). We will briefly introduce the concepts of word embeddings and deep learning before moving on to provide a high-level overview of recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) approaches to translation, and then follow up with the state-of-the-art Transformer approach, and talk about transfer learning (with applications beyond NMT).

Goals

Theoretical understanding of the challenges of machine translation and the models used to solve them.

Machine Translation Exercises (Übung)

Goals

Practical experience in solving sub-problems of machine translation, as well as familiarity with the data used for training statistical models.

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich


Schedule



Tuesdays, 16 to 18 (c.t.). ONLINE WITH ZOOM. Link will be sent to students listed in LSF.

Wednesdays, 14 to 16 (c.t.). ONLINE WITH ZOOM. Link will be sent to students listed in LSF.

For the videos: if a mov file is listed, it is the "trimmed" version (dead space at the beginning and end removed). Because of encoding issues, it is larger than the original, so the original mp4 is also provided in case of problems or if you want a smaller file.

Date | Topic | Reading (DO AFTER THE MEETING!) | Slides | Video
April 21st | Orientation and Introduction to Machine Translation | | pdf | mov, mp4
April 22nd | Introduction to Statistical Machine Translation | | ppt, pdf | mp4
April 28th | Bitext alignment (extracting lexical knowledge from parallel corpora) | | ppt, pdf | mp4
April 29th at 15:15 (not 14:15) | Bitext alignment continued | Optional: read about Model 1 in Koehn and/or Knight (see below) | | mp4
May 5th | Many-to-many alignments and Phrase-based model | | ppt, pdf | mp4
May 6th | Log-linear model and Minimum Error Rate Training | | ppt, pdf | mp4
May 12th | Decoding | | pdf | mp4
May 13th | Linear Models | | pptx, pdf | mp4
May 19th | Neural Networks (and Word Embeddings) | | pdf | mp4
May 20th | Bilingual Word Embeddings and Unsupervised SMT (Viktor Hangya) | | pdf | mp4
May 26th | Training and RNN/LSTMs (Denis Peskov) | | pdf | mp4
May 27th | Encoder-Decoder and Attention (Jindřich Libovický) | | pdf | mp4
June 2nd | Pfingstdienstag (holiday) | | |
June 9th | Transformer, Document and Unsupervised NMT (Dario Stojanovski) | | pdf | mp4
June 10th | Transfer Learning for Unsupervised NMT (Alexandra Chronopoulou) | | pdf | mp4
June 16th | Linguistic Information in Machine Translation (Marion Di Marco) | | pdf | mp4
June 17th | Operation Sequence Model and OOV Translation | | 14_part1_OSM.pdf, 14_part2_OOV.pdf | mp4
June 23rd | Exercise 1 Released. Due Tuesday Jun 30th at 15:00. | | exercise1.txt | mp4
June 24th | Office hours | | |
June 30th | Exercise 2 Released. Due Tuesday July 14th at 15:00. | | exercise2.html | mp4
July 1st | Office hours | | |
July 14th | Exercise 3 Released. Due Tuesday July 21st at 15:00. | | exercise3.pdf | mp4
July 15th | Office hours (starting at 15:15, not 14:15) | | |
July 21st | Dry run for Zoom Exam. Exercise 4 Released. Due *MONDAY* July 27th at 15:00. | | trial_exam.doc, exercise4.pdf (CORRECTED!) DUE *MONDAY* | mp4
July 22nd | Office hours (please send an email!) | | |
July 27th, MONDAY at 16:15 | Comments on Exercise 4. Office hours. Please use the Tuesday URL! | | | mp4
July 28th | Exam (on Zoom!). Please use the usual Tuesday URL! | | exam.doc |



Literature:

Philipp Koehn's book Statistical Machine Translation

Kevin Knight's tutorial on SMT (look in particular at IBM Model 1)