A Framework for Discriminative Rule Selection in Hierarchical Moses Fabienne Braune, Alexander Fraser, Hal Daume III, Ales Tamchyna WMT 2016 Training discriminative rule selection models is usually expensive because of the very large size of the hierarchical grammar. Previous approaches reduced the training costs either by (i) using models that are local to the source side of the rules or (ii) by heavily pruning out negative samples. Moreover, all previous evaluations were performed on small scale translation tasks, containing at most 250,000 sentence pairs. We propose two contributions to discriminative rule selection. First, we test previous approaches on two French-English translation tasks in domains for which only limited resources are available and show that they fail to improve translation quality. To improve on such tasks, we propose a rule selection model that is (i) global with rich label-dependent features (ii) trained with all available negative samples. Our global model yields significant improvements, up to 1 BLEU point, over previously proposed rule selection models. Second, we successfully scale rule selection models to large translation tasks but have so far failed to produce significant improvements in BLEU on these tasks.