In this assignment, we will build a state-of-the-art phrase-based SMT system, tune it with minimum error rate training, and translate a test set to measure the BLEU score.
We will replicate the shared task from the ACL 2008 Third Workshop on Statistical Machine Translation (WMT08). However, we will use a smaller data set so that the experiment runs quickly.
We will build the baseline system from WMT08 by following the directions here.
The instructions are written for French to English, but you may also build German to English. If you wish to build German to English, you need to substitute "de" for "fr" throughout the web page.
If you would like to build the system on your own Linux laptop or a remote Linux computer at your home institution, I recommend that you transfer the "wmt08_small" directory referred to below to your machine and use it for your experiments (renaming it to "wmt08"). Follow the directions from the top.
If you would like to build the system on IMS servers, where we have already performed the steps to install software, please follow these instructions:
Log in to your account. Check which shell you are running by typing "echo $SHELL" at the command prompt. The output should end in "tcsh" (if it doesn't and you don't know how to set an environment variable in your login script, please talk to me). Edit the file ".cshrc" in your home directory. Before the comment about "if this is not an interactive shell", put:
setenv SCRIPTS_ROOTDIR /mount/studenten2/statmt/bin/moses-scripts/scripts-20080723-1127
Now log out and log back in. Check that SCRIPTS_ROOTDIR is properly set using "echo $SCRIPTS_ROOTDIR".
Next, type "cd /mount/studenten2/statmt/students".
Create a subdirectory named after your username, and cd to it.
Run these commands:
ln -s ../../wmt08_small wmt08
ln -s ../../bin .
ln -s ../../scripts .
ln -s ../../moses .
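Before moving on, it may be worth checking that the four links actually resolve, since a link created from the wrong directory will point nowhere. The following is a minimal sketch, not part of the Moses distribution: the function name check_links and its output format are made up for illustration. It is written for sh rather than tcsh, so save it to a file (for example check_links.sh, a hypothetical name) and run it with "sh check_links.sh" from inside your subdirectory.

```shell
#!/bin/sh
# check_links DIR: report whether each expected symlink in DIR resolves.
# Hypothetical helper for illustration; the link names come from the
# ln -s commands above.
check_links() {
    dir=${1:-.}
    status=0
    for name in wmt08 bin scripts moses; do
        if [ -L "$dir/$name" ] && [ -e "$dir/$name" ]; then
            # The link exists and its target exists.
            echo "ok: $name"
        elif [ -L "$dir/$name" ]; then
            # The link exists but points to a missing target.
            echo "broken: $name"
            status=1
        else
            # No symlink with this name was found.
            echo "missing: $name"
            status=1
        fi
    done
    return $status
}

# Check the directory given as the first argument, or the current one.
check_links "${1:-.}" || echo "Some links need fixing."
```

If any line says "broken" or "missing", remove that link with rm, make sure you are in your own subdirectory, and run the corresponding ln -s command again.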
Now follow the steps here, starting from Prepare Data (i.e., skipping the install steps).
Other comments (please refresh your browser to see latest comments):