A Framework for Discriminative Rule Selection in Hierarchical Moses
Fabienne Braune, Alexander Fraser, Hal Daume III, Ales Tamchyna
WMT 2016

Training discriminative rule selection models is usually expensive
because of the very large size of the hierarchical grammar. Previous
approaches reduced the training costs either by (i) using models that
are local to the source side of the rules or (ii) by heavily pruning
out negative samples. Moreover, all previous evaluations were
performed on small scale translation tasks, containing at most 250,000
sentence pairs. We propose two contributions to discriminative rule
selection. First, we test previous approaches on two French-English
translation tasks in domains for which only limited resources are
available and show that they fail to improve translation quality. To
improve on such tasks, we propose a rule selection model that is (i)
global with rich label-dependent features (ii) trained with all
available negative samples. Our global model yields significant
improvements, up to 1 BLEU point, over previously proposed rule
selection models. Second, we successfully scale rule selection models
to large translation tasks but have so far failed to produce
significant improvements in BLEU on these tasks.