Statistical Machine Translation

Introduction

The success of statistical machine translation systems such as Moses, Language Weaver, Google Translate, and many others has shown that it is possible to build high-performance machine translation systems with a small amount of effort using statistical learning techniques.

After several introductory lectures, members of the group will take turns giving informal presentations of research papers. Our initial goal is to reach the point where we can read about and discuss new ideas in statistical machine translation research that integrate linguistic representations, ranging from deep to shallow.

The language of the course is English.

People

Alexander Fraser and Andreas Maletti

Schedule

We meet at 15:45 on Fridays, in room 12.21.

The first lecture was on October 28th at 15:45 in room 12.21. The course began with a review of basic concepts of probability. Here are the slides.

A good reference on probability is:
Grinstead, Snell. Introduction to Probability. AMS 1997. A PDF (distributed under the GNU Free Documentation License) is available at: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html
Note that this is intended as a reference, not required reading.

A good reference on statistical machine translation (not required) is: Philipp Koehn, Statistical Machine Translation.

The lecture on November 4th was an introduction to statistical machine translation. Here are the slides: PowerPoint, PDF

The lecture on November 11th was on word alignment (IBM Model 1). Here are somewhat *modified* slides: PowerPoint, PDF. You might be interested in Kevin Knight's excellent tutorial on word alignment (not required).
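
For readers who want a concrete picture of IBM Model 1, the following is a minimal sketch of its EM training loop. It is not taken from the lecture slides: the function name `train_ibm_model1` and the three-sentence toy corpus are illustrative assumptions, and the NULL word is omitted to keep the example short.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> t(f | e).
    Simplification (assumption of this sketch): no NULL word on the English side.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything

        # E-step: distribute each foreign word over the English words
        # of its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    return t


# Hypothetical toy corpus, used only for illustration.
toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

t = train_ibm_model1(toy_corpus)
for f in ("das", "Haus", "Buch"):
    print(f, round(t[(f, "the")], 3))
```

Even on this tiny corpus, the probability mass for "the" shifts toward "das" over the iterations, because "das" is the only foreign word that co-occurs with "the" in both of its sentences.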

The lecture on November 18th was on phrase-based modeling. Here are the slides: PowerPoint, PDF

The lecture on November 25th will cover decoding (see the previous slide set). We may also get to log-linear models and minimum error rate training. Here are the slides for that: PowerPoint, PDF

The lecture on December 9th will continue log-linear models and minimum error rate training. I will also discuss some of these slides on linguistic processing for statistical machine translation (primarily the ones on morphology and parser-based reordering): PowerPoint, PDF

We have completed non-hierarchical translation. Subsequent lectures will cover hierarchical (syntactic) SMT and will be presented by Andreas Maletti.

The lecture on December 16th was an introduction to the basics of syntactic MT. Here are the slides.

The lecture on January 13th was on rule extraction and training for tree transducers. Here are the slides.

The lecture on January 20th was on decoding. Here are the slides.

On January 27th we had a guest lecture from Daniel Quernheim. It was on semantic-based statistical machine translation. Here are the slides.

On February 3rd, we had a review session for the test.

ANNOUNCEMENT: Both Andreas Maletti's project "Tree Transducers in Machine Translation" and Alex Fraser's project "Models of Morphosyntax for Statistical Machine Translation" are hiring HiWis (student research assistants).

There is no class on February 10th.

The test will be on Tuesday, February 21st, at 10:30 in room 12.21. Please email us if you have any questions about registering for the test.