Alex Fraser and Costanza Conforti

In this exercise we will first look at Google Translate's capabilities for German to English. We will then look at the difficulty of performing manual word alignment. Follow all of these steps:

1) Create 10 *short* sentences in German that Google Translate can translate to English. The first 5 sentences should be sentences that Google Translate translates correctly. The second 5 sentences should be similar to the first 5, but Google Translate should translate them incorrectly. Note that I want you to pick sentences where Google Translate *does* know all the words, but where the English output is wrong. Save the 10 source language sentences and the 10 target language sentences in a file, using a program you trust such as OpenOffice or Microsoft Word.

IMPORTANT: try to analyze what went wrong in each of the 5 sentences that were translated incorrectly. Be able to say whether the problem is picking the wrong sense of a polysemous word, a word ordering problem, or something else. Be as explicit as possible, and make clear which source language word you are talking about (with an English translation).

2) *In the Google Translate interface* (THIS IS IMPORTANT!), fix the English output for the 5 bad sentences. When you click on different parts of the English output, you will see that it offers suggestions for how to fix things; alternatively, you can just type over the bad translation. You can also shift-click to drag words. Write down any problems you have with this correction interface (you may not have any). Save the 5 corrected sentences.

3) Create two text files. The first file should contain the 10 source language sentences you have been working with (so it should be 10 lines long, with one sentence per line). The second file should contain the 10 correct English sentences (no wrong output!): the 5 translations that were already correct, followed by your 5 corrected translations, in the same order as the source file.

IMPORTANT: You should separate punctuation from words (tokenization) in these text files!
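The tokenization asked for in step 3 can be done by hand for 10 sentences, but a small script makes the convention explicit. The following is only a minimal sketch in Python, not the tokenizer used for the gold standard data: the function name `tokenize` and the rule "every non-word, non-space character becomes its own token" are assumptions made for illustration.

```python
import re

# Assumption: a token is either a run of word characters (letters and digits,
# Unicode-aware, so German umlauts and ß stay inside words) or a single
# punctuation character standing on its own.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str) -> str:
    """Return the sentence with every token separated by a single space."""
    return " ".join(TOKEN_PATTERN.findall(sentence))
```

For example, `tokenize("Ich habe Hunger, aber kein Geld.")` returns `Ich habe Hunger , aber kein Geld .`. Note that this simple rule also splits apostrophes and hyphens (`don't` becomes `don ' t`), so check the output by hand and decide how you want to treat such cases.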
4) Download the word alignment editor from here:

http://www.cis.uni-muenchen.de/~fraser/nepal/align_browser_and_german_short.zip

NOW READ THE README FILE!

5) Type this on a command line (without the quotes): "java TestAlign8". You should see an English sentence, a German sentence, and an alignment. Look at the gold standard sentences that are provided to get an idea of how a gold standard alignment works; there are 20 sentences that you can look at. Pay particular attention to complicated word alignments (i.e., those where you do not have a simple one-to-one link between two words).

6) Exit the program. As indicated in the README file, do the following steps:

% rm *.out
  (this removes the output file. DO NOT DELETE THIS IF YOU HAVE ANNOTATED ALIGNMENTS!!! See the README file, this is explained there)

% cp /dev/null align
  (this just makes the align file empty. If you are not working under Linux, just delete all the lines in the "align" file in an editor and save it)

% cp your_GERMAN_file f
  (i.e., save the GERMAN file as the file "f"; make sure it is NOT "f.txt")

% cp your_ENGLISH_file e

% java TestAlign8

Now you should see the first parallel sentence. Align the words by using left mouse clicks. When you are done, click "next sentence". Annotate your 10 parallel sentences. After you exit, do:

% cp align.out align
  (this saves your annotated alignments! DO NOT FORGET THIS STEP. However, do not do this step if you did not annotate. The README discusses this in detail)

% rm *.out
  (this will allow you to run the tool again)

IMPORTANT: please keep track of decisions that were difficult (for instance, English function words without a clear counterpart in the German sentence). Also be ready to discuss any interesting alignment decisions you made (including German words that were difficult to align).
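Before starting the editor in step 6, it may help to check that the files "f" and "e" really are parallel: the same number of lines, 10 sentences each, and no empty lines. The following is a rough sketch in Python under stated assumptions: the helper name `check_parallel` is made up for illustration, and the files are assumed to be UTF-8 encoded, which the README may or may not require.

```python
from pathlib import Path


def check_parallel(src_path, tgt_path, expected=10):
    """Return a list of problems found in a pair of sentence files."""
    # Assumption: both files are UTF-8 text with one sentence per line.
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt = Path(tgt_path).read_text(encoding="utf-8").splitlines()

    problems = []
    if len(src) != len(tgt):
        problems.append(f"line counts differ: {len(src)} vs {len(tgt)}")
    if len(src) != expected:
        problems.append(f"source has {len(src)} lines, expected {expected}")
    if len(tgt) != expected:
        problems.append(f"target has {len(tgt)} lines, expected {expected}")

    # Report empty sentences, which would shift later sentence pairs.
    for number, (s, t) in enumerate(zip(src, tgt), start=1):
        if not s.strip() or not t.strip():
            problems.append(f"line {number}: empty sentence")
    return problems
```

Calling `check_parallel("f", "e")` in the directory of the alignment editor returns an empty list when nothing suspicious was found. It only checks the file layout, not whether the sentences are correct translations of each other.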