University of Stuttgart - Statistical Machine Translation Reading Group

Invitation

The success of statistical machine translation systems such as Moses, Language Weaver and Google Translate has shown that it is possible to build high-performance machine translation systems with relatively little effort using statistical learning techniques.

We are organizing a reading group on statistical machine translation (including work on statistical parsing). The intended audience is wide, including students and researchers in the areas of computational linguistics, linguistics, natural language processing, artificial intelligence and machine learning; everyone is invited.

The language of the reading group is English.

After several introductory lectures, members of the group will take turns giving informal presentations of research papers. Our initial goal is to reach the point where we are able to read about and discuss new ideas in statistical machine translation research involving the integration of linguistic representations ranging from deep to shallow.

Organizers

The reading group was organized by Alex Fraser and Helmut Schmid from 2008 to 2010; it is currently organized by Alex Fraser.

Email Address: SubstituteLastName@ims.uni-stuttgart.de

University of Stuttgart

Institute for Natural Language Processing (IMS/IfNLP)

SFB 732 - Incremental Specification in Context

Schedule

LOCATION: We will have a single meeting on November 2nd in the IMS phonetics lab, 3.11 (top floor, last room on the right, Institut fuer Maschinelle Sprachverarbeitung, Azenbergstrasse 12, Stuttgart).

Try building your own Moses system!

Future and Present

2011

November 2nd, 16:00, 3.11 Hassan Sajjad: IJCNLP practice talk: Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus
TBD Fabienne Braune: Markos Mylonakis and Khalil Sima'an. Learning Hierarchical Translation Structure with Linguistic Annotations. ACL-HLT 2011. paper
TBD TBD: Libin Shen, Jinxi Xu and Ralph Weischedel. A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. ACL 2008, outstanding paper award. paper, see also slides


Past

2011

August 3rd, 10:30, room 3.11 Thomas Schoenemann: Regularizing Word Alignment. For a full abstract, click here.
July 13th, 14:30, 12.21 Nadir Durrani: M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer. Scalable Inference and Training of Context-Rich Syntactic Models. ACL-COLING 2006. paper
Nadir's slides: pdf ppt
June 15th, 14:30, 12.21 ACL practice talks: Nadir Durrani: A Joint Sequence Translation Model with Integrated Reordering; Andreas Maletti: How to train your multi bottom-up tree transducer. Durrani paper, Maletti paper
June 8th, 14:30, 12.21 Hassan Sajjad: David Chen and William Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. ACL 2011. paper
June 1st, 14:30, 12.21 Anita Gojun: Dmitriy Genzel. Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation. COLING 2010. paper
May 18th, 14:30, 12.21 Marion Weller: Beatrice Daille, Emmanuel Morin. Effective Compositional Model for Lexical Alignment. IJCNLP 2008. paper
May 11th, 14:30, 12.21 Daniel Quernheim: Michel Galley, Mark Hopkins, Kevin Knight, Daniel Marcu. What's in a translation rule? NAACL 2004. paper
Daniel Q's slides
April 20th, 10:30, 12.21 Alex Fraser: Andreas Zollmann, Ashish Venugopal, Franz Och and Jay Ponte. A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT. COLING 2008. paper
April 13th, 10:30, 12.21 Anita Gojun: Rule-based and lattice-based approaches for determining the placement of German verbs in English-to-German SMT
April 6th, 10:30, 12.21 Daniel Quernheim: Hyper-minimisation of weighted finite automata
March 30th, 10:30, 12.21 Nadir Durrani: A Joint Sequence Translation Model with Integrated Reordering
March 23rd, 10:30, 12.21 Fabienne Braune: Spence Green, Michel Galley, Christopher D. Manning. Improved Models of Distortion Cost for Statistical Machine Translation. NAACL 2010. paper, see also slides/data/code
March 16th, 10:30, 12.21 Alex Fraser: Morphological Generation of German for Statistical Machine Translation. (Joint work with Marion Weller, Aoife Cahill, Fabienne Cap, in the DFG project Models of Morphosyntax for SMT)
March 2nd, 10:30, 12.21 Andreas Maletti: Tree Transducers in Machine Translation. For a full abstract, click here.

2010

November 9th, 14:00, 3.11 Alex Fraser: Introduction to statistical machine translation - Part 4. Log-linear models for SMT (this is a repeat with some improvements; you should be familiar with the phrase-based model described at the beginning of lecture 3, see the lecture 3 slides below) slides
August 11th, 14:00, 3.11 Fabienne Fritzinger: Kristina Toutanova, Hisami Suzuki, and Achim Ruopp. Applying Morphology Generation Models to Machine Translation. ACL 2008. paper poster
August 4th, 14:00, 3.11 Fabienne Braune: Steve DeNeefe and Kevin Knight. Synchronous Tree Adjoining Machine Translation. EMNLP 2009. paper
July 28th, 14:00, 3.11 Fabienne Braune: Anders Søgaard and Jonas Kuhn. Empirical lower bounds on alignment error rates in syntax-based machine translation. SSST 2009. paper
July 21st, 14:00, 3.11 Helmut Schmid: Michel Galley and Christopher D. Manning. Accurate Non-Hierarchical Phrase-Based Translation. NAACL 2010. paper
July 7th, 14:00, 3.11 Nadir Durrani, Fabienne Fritzinger: ACL practice talks
June 30th, 14:00, 3.11 Alex Fraser: Introduction to statistical machine translation - Part 5. Advanced topics in SMT: discriminative bitext alignment, morphological processing, syntax slides
June 23rd, 14:00, 3.11 Nadir Durrani: Michel Galley, Christopher Manning. A Simple and Effective Hierarchical Phrase Reordering Model. EMNLP 2008. paper
June 16th, 14:00, 3.11 Fabienne Fritzinger: Philipp Koehn, Franz Josef Och, Daniel Marcu. Statistical Phrase-Based Translation. HLT-NAACL 2003. paper
June 9th, 14:00, 3.11 Patrick Leucht: Robert C. Moore. Fast and Accurate Sentence Alignment of Bilingual Corpora. AMTA 2002. paper
June 2nd, 14:00, 3.11 Fabienne Braune: Michael Collins, Philipp Koehn and Ivona Kucerova. Clause Restructuring for Statistical Machine Translation. ACL 2005. paper
May 19th, 14:00, 3.11 Alex Fraser: Introduction to statistical machine translation - Part 4. Log-linear models for SMT slides
May 12th, 14:00, 3.11 Alex Fraser: Introduction to statistical machine translation - Part 3. Decoding (automatically translating a text given an already learned model) slides
May 5th, 14:00, 3.11 Alex Fraser: Introduction to statistical machine translation - Part 2. Bitext alignment (extracting lexical knowledge from parallel corpora) slides
Reading: Kevin Knight's SMT Tutorial (concentrate on Model 1)
If you have time: Implement Model 1!
April 28th, 14:00, 3.11 (IMS phonetics lab) Alex Fraser: Introduction to statistical machine translation - Part 1. Introduction, basics of statistical machine translation (SMT), evaluation of MT slides

2009

Wed July 22nd, 15:45, 3.11 (IMS phonetics lab) Alex Fraser: David Chiang. A hierarchical phrase-based model for statistical machine translation. ACL 2005 (best paper) paper
Wed July 8th, 15:45, 3.11 (IMS phonetics lab) Alex Fraser: Christoph Tillmann. A Unigram Orientation Model for Statistical Machine Translation. HLT-NAACL 2004 short paper. paper
June 4th, 10:30, Hans Kamp's office Fabienne Braune: Dekai Wu. A polynomial-time algorithm for statistical machine translation. ACL 1996. paper (ps) (pdf)
May 28th, 9:45-11:15, 12.21 As part of the advanced seminar (Hauptseminar) Machine Translation I (Heid), PD Dr. Kurt Eberle (Heidelberg/Stuttgart) gives a talk: "Current architectural issues in machine translation: semantic transfer and the integration of statistical information in 'translate'"
May 14th, 10:30 Alex Balabanov: Kenji Yamada and Kevin Knight. A syntax-based statistical translation model. ACL 2001. paper
May 7th, 10:30 Hassan Sajjad: Yaser Al-Onaizan and Kevin Knight. Translating Named Entities Using Monolingual and Bilingual Resources. ACL 2002. paper
April 30th, 10:30 Hassan Sajjad, Alex Fraser: EACL 2009 report (interesting papers), organizational meeting
March 26th, 10:30 Aoife Cahill, Alex Fraser, Hassan Sajjad: Practice talks for EACL papers: Cahill, Fraser1, Fraser2, Sajjad
March 19th, 10:30 Helmut Schmid: Liang Huang. Forest Reranking: Discriminative Parsing with Non-Local Features. ACL 2008 (1 of 2 outstanding paper awards). paper
March 5th, 10:30 Alex Fraser: Chris Quirk, Arul Menezes, Colin Cherry. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005.
Part II: decoding, experiments, discussion.
Feb 26th, 10:30 Our first paper on a non-preprocessing approach to syntactic SMT!
Alex Fraser: Chris Quirk, Arul Menezes, Colin Cherry. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005.
Part I: model and training.
paper
Feb 19th, 10:30 Two papers on preprocessing approaches for coping with compounds and rich inflection:
Fabienne Fritzinger: Empirical Methods for Compound Splitting. Philipp Koehn and Kevin Knight. EACL 2003.
Alex Fraser: Improving Statistical MT Through Morphological Analysis. Sharon Goldwater and David McClosky. EMNLP 2005.
compound paper, inflection paper
Feb 12th, 10:30 Hassan Sajjad: Michael Collins, Philipp Koehn and Ivona Kucerova. Clause Restructuring for Statistical Machine Translation. ACL 2005. paper
Feb 5th, 10:30 Alex Fraser: Franz Josef Och. Minimum Error Rate Training for Statistical Machine Translation. ACL 2003. paper
Jan 22nd, 10:30 Amit Dubey: Franz Josef Och, Hermann Ney. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. ACL 2002 (best paper). paper

2008

Dec 18th, 10:30 Amit Dubey: Hoifung Poon and Pedro Domingos. EMNLP 2008. Joint Unsupervised Coreference Resolution with Markov Logic. paper
Dec 11th, 10:30 Amit Dubey: Richardson and Domingos. Machine Learning, 62, 107-136, 2006. Markov Logic Networks. paper
Dec 4th, 10:30, IMS staff room Amit Dubey: Agirre, Baldwin and Martinez. ACL 2008. Improving Parsing and PP Attachment Performance with Sense Information. paper
Nov 27th, 10:30, IMS staff room Aoife Cahill: Michael Collins. Discriminative Reranking for Natural Language Parsing. ICML 2000. paper
Nov 20th, 10:30, IMS staff room Alex Balabanov: Michael Collins. Three Generative, Lexicalised Models for Statistical Parsing. ACL/EACL 1997. You might also be interested in the slides for this paper or the longer Computational Linguistics journal paper (see Michael Collins' homepage) paper
Nov 13th, 10:30, IMS staff room Nadir Durrani: Statistical Phrase-Based Translation (HLT-NAACL 2003). Philipp Koehn, Franz Josef Och, Daniel Marcu. Statistical Phrase-Based Translation
Nov 6th, 10:30, IMS staff room Alex Fraser: BLEU: a Method for Automatic Evaluation of Machine Translation (ACL 2002). Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU paper
October 23rd, 10:30, 3.11 (IMS phonetics lab) Christian Scheible: Introduction to Language Modeling. For slides and reference list, please click here. Chen and Goodman LM tutorial (focus on interpolation and Kneser-Ney smoothing)
October 9th, 10:30, 12.21 Martin Forst: Grammatical Machine Translation II. Martin Forst will discuss recent work on hybrid MT using LFG. For a full abstract, click here.
October 2nd, 10:30, 12.21 Helmut Schmid: PCFG parsing algorithms (continued). No required reading.
September 25th, 10:00, 12.21 Helmut Schmid: PCFG parsing algorithms No required reading.
August 14th, 10:00, 12.21 (IMS lecture hall) Helmut Schmid: Introduction to CFG parsing algorithms No required reading.
August 7th, 10:00, 12.21 (IMS lecture hall) Helmut Schmid: Introduction to HMM tagging. No required reading. Manning and Schuetze HMM chapter recommended.
July 31st, 10:00, 12.21 (IMS lecture hall) Alex Fraser: Introduction to statistical machine translation - Part 3, phrase-based modeling and decoding. No required reading.
July 21st to July 25th EMA Summer School, website is here. The first two SMT lectures will be repeated on Tuesday, along with a practice assignment (implementing IBM Model 1). The lecture from the next reading group meeting (phrase-based modeling and decoding) will be on Wednesday at 14:00, followed by a practice assignment on decoding. Thursday morning's lecture will consist of a discussion of the assignments and a brief overview of some more advanced topics.
July 17th, 10:00, 3.11 (IMS phonetics lab) Alex Fraser: Introduction to statistical machine translation - Part 2. Bitext alignment (extracting lexical knowledge from parallel corpora) Kevin Knight's SMT Tutorial
July 10th, 10:00, 3.11 (IMS phonetics lab) Alex Fraser: Introduction to statistical machine translation - Part 1. I will define the MT problem and talk about evaluation. I will also discuss parallel corpora and sentence alignment and give a brief overview of statistical machine translation (SMT). Kevin Knight's tutorial is recommended, but not necessary until next week. Kevin Knight's SMT Tutorial
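Several sessions above suggest implementing IBM Model 1 as a practice exercise. As one possible starting point, here is a minimal Python sketch of EM training for Model 1 word-translation probabilities t(f | e). It assumes whitespace-tokenized sentence pairs and adds a special NULL token on the English side; the function name `train_model1`, the `NULL` symbol and the three-sentence toy corpus are illustrative choices, not taken from the lecture or assignment materials.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical empty-word token, prepended to every English sentence

# Illustrative toy corpus of (foreign, English) sentence pairs.
TOY_BITEXT = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]


def train_model1(bitext, iterations=10):
    """Estimate IBM Model 1 probabilities t[(f, e)] = t(f | e) with EM."""
    # Initialise t(f | e) uniformly over the foreign vocabulary.
    f_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign token among all English tokens (plus NULL)
        # in proportion to the current translation probabilities.
        for fs, es in bitext:
            es = [NULL] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts so that sum_f t(f | e) = 1.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


if __name__ == "__main__":
    t = train_model1(TOY_BITEXT)
    for (f, e), p in sorted(t.items(), key=lambda kv: -kv[1]):
        print(f"t({f} | {e}) = {p:.3f}")
```

On this toy corpus, "Haus" only ever co-occurs with "the" and "house", while "das" also co-occurs with "the" in a second sentence, so over the iterations "the" takes over "das" and "house" is pushed toward "Haus". This is the "pigeonhole" effect that Model 1 tutorials typically illustrate.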