WPCom 1: Lexicon, Syntax, Semantics: WSD and MT

Summary

Word Sense Disambiguation (WSD) and Machine Translation (MT) are two key problems in natural language processing in which the lexicon plays a critical role. Although many different inventories of word senses exist for any given language, a minimal set of senses can be defined by looking at how a word is translated into other languages: uses of a word that receive different, non-synonymous translations belong to different senses.
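To make the translation-based view of word senses concrete, here is a minimal Python sketch under simplifying assumptions: the word alignments are already given, and a hand-written synonym table merges target-language words that should count as the same sense. The English sentences, the German translations, and the names `aligned_occurrences`, `synonym_class` and `induce_senses` are hypothetical illustrations, not material from the seminar.

```python
from collections import defaultdict

# Hypothetical word-aligned data: (English sentence, German word aligned to "bank").
aligned_occurrences = [
    ("I deposited money at the bank", "Bank"),
    ("The bank raised its interest rates", "Bank"),
    ("She works for a bank in Munich", "Geldinstitut"),
    ("We sat on the bank of the river", "Ufer"),
    ("The boat drifted toward the bank", "Ufer"),
]

# Hypothetical synonym classes for the target language: translations that map
# to the same class are NOT evidence for distinct senses of the source word.
synonym_class = {"Bank": "Bank", "Geldinstitut": "Bank", "Ufer": "Ufer"}


def induce_senses(occurrences, synonyms):
    """Group source-word occurrences into senses, one per non-synonymous translation class."""
    senses = defaultdict(list)
    for sentence, translation in occurrences:
        # A translation missing from the synonym table forms its own class.
        label = synonyms.get(translation, translation)
        senses[label].append(sentence)
    return dict(senses)


senses = induce_senses(aligned_occurrences, synonym_class)
for label, examples in senses.items():
    print(f"sense '{label}': {len(examples)} occurrence(s)")
# prints:
# sense 'Bank': 3 occurrence(s)
# sense 'Ufer': 2 occurrence(s)
```

Because "Bank" and "Geldinstitut" are treated as synonyms, they do not split the financial sense; only the non-synonymous translation "Ufer" yields a second sense. In practice, the alignments would come from a bitext aligner and the synonym information from a lexical resource rather than a hand-written table.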

Content:

The seminar will begin with the basics of Statistical Machine Translation and Word Sense Disambiguation, and then examine attempts to apply approaches from the WSD literature to MT.

Goals:

The goal of the seminar is to understand the basics of MT and WSD, and in particular the important role the lexicon plays in both of these problems.

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich

DFG Project: Models of Morphosyntax for Statistical Machine Translation


Schedule


Room L155, Tuesdays, 16:00 to 18:00 (c.t., i.e. starting at 16:15)


| Date | Topic | Reading (DO BEFORE THE MEETING!) | Slides and materials |
|---|---|---|---|
| October 7th | Organizational Meeting, Personal Information, Orientation Test | | |
| October 14th | Introduction to Statistical Machine Translation | | powerpoint, pdf |
| October 21st | Bitext alignment (extracting lexical knowledge from parallel corpora) | | powerpoint, pdf |
| October 28th | Many-to-many alignments and Phrase-based Translation Modeling (also: student presentations!) | | powerpoint, pdf |
| November 4th | Decoding | | powerpoint, pdf |
| November 4th and 11th | Log-linear model and Minimum Error Rate Training | | powerpoint, pdf |
| November 11th | SMT: Lexical Choice and Morphological Sparsity | | powerpoint, pdf |
| November 18th | Introduction to Word Sense Disambiguation | | powerpoint, pdf |
| November 25th | Student presentation (Referat) + Introduction to Linear Models | Navigli, Sections 1 and 2 | powerpoint, pdf |
| December 2nd | Student presentation (Referat) | Navigli, Sections 3 and 5 | |
| December 9th | Student presentation (Referat) + More Linear Models (see November 25th) | | |
| December 16th | Machine Learning LAB! We will be in the *Gobi* computer lab | | assignment, CMU Seminars dataset, tar file with scripts (UPDATED), unigram_bigram_pattern.txt (NOW WITH COMMENTS), wapiti |



Student presentation topics (Referatsthemen; name: topic)


| Date | Presenter: topic | Materials | Term paper (Hausarbeit) received |
|---|---|---|---|
| November 25th | Wurst: Supervised WSD | | yes |
| December 2nd | Chebib: Dictionary-based Disambiguation | | yes |
| December 9th | Eder: Unsupervised WSD | | yes |
| January 13th | CANCELLED due to presenter health reasons | | |
| January 20th | Schätz: Project Cross-Lingual Lexical Substitution | | yes |
| January 20th | Kalasouskaya: Project Supervised WSD | | yes |
| January 27th | Deyringer: Project Wikification | | yes |
| January 27th | Wunderlich: Project WSD of Old English | | yes |


Literature:

Philipp Koehn's book Statistical Machine Translation

Kevin Knight's tutorial on SMT (in particular, look at IBM Model 1)

Roberto Navigli's tutorial on WSD (here is a local copy)