Alex Fraser
NSSNLP, Kathmandu University

In this assignment we will first look at Google Translate's capabilities for your language (or another SAR language you can speak). We will then look at the difficulty of performing manual word alignment.

Follow all of these steps:

1) Create 10 *short* sentences in your native language (or a SAR language you know) that Google Translate can translate to English. The first 5 sentences should be sentences that Google Translate translates correctly. The second 5 sentences should be similar to the first 5, but Google Translate should translate them incorrectly. Note that I want you to pick sentences where Google Translate *does* know all the words, but where the English output is wrong. Save the 10 source language sentences and the 10 target language sentences in a file, using a program you trust such as OpenOffice or Microsoft Word.

IMPORTANT: in what you turn in, I want you to analyze what went wrong in each of the 5 sentences that were translated incorrectly. Please say whether the problem is picking the wrong word sense of a polysemous word, a word ordering problem, or some other problem. Be as explicit as possible, and make clear which source language word you are talking about (with an English translation).

2) *In the Google Translate interface* (THIS IS IMPORTANT!), fix the English output for the 5 bad sentences. When you click on different parts of the English output, you will see that it offers suggestions for how to fix things; alternatively, you can just type over the bad translation. You can also shift-click to drag words. Write down any problems you have with this correction interface (you may not have any). Save the 5 corrected sentences.

3) Create two text files. The first file should contain the 10 source language sentences you have been working with (so it should be 10 lines long, with one sentence per line). The second file should contain the 10 correct English sentences (no wrong output!), in the same order, one per line.
*NEW* AND IMPORTANT: You should separate punctuation from words (tokenization) in these text files!

4) From the web server, week03, day02, get the align browser and save it. You can also get it from here:
http://www.ims.uni-stuttgart.de/~fraser/t/align.zip
Unzip the file on a computer that has Java. NOW READ THE README FILE!

5) Type this on a command line (without the quotes): "java TestAlign8". You should see an English sentence, a German sentence, and an alignment.

6) Exit the program. As indicated in the README file, do the following steps:

% rm *.out
(this is the output file, DO NOT DELETE THIS IF YOU HAVE ANNOTATED ALIGNMENTS!!! This is explained in the README file)
% cp /dev/null align
(this just makes the align file empty. If you are not working under Linux, just delete all the lines in the "align" file in an editor and save it)
% cp your_SAR_file f
(i.e., save the SAR file as the file "f"; make sure it is NOT "f.txt")
% cp your_ENGLISH_file e
% java TestAlign8

Now you should see the first parallel sentence. Align the words by using left mouse clicks. When you are done, click "next sentence". Annotate your 10 parallel sentences. After you exit, do:

% cp align.out align
(this saves your annotated alignments! DO NOT FORGET THIS STEP. However, do not do this step if you did not annotate. The README discusses this in detail)
% rm *.out
(this will allow you to run the tool again)

IMPORTANT: for what you will turn in, keep track of decisions that were difficult (for instance, English function words without a clear counterpart in the SAR sentence). Also discuss any interesting alignment decisions you made (including SAR words that were difficult to align).

7) Send me the files you create for all steps in a .zip file (or a .tar.gz file if you know how to create that). Your file should include all of these things:

a) A document containing the good, bad and corrected sentences (please save this as a *Microsoft Word* document).
This document should also contain an analysis of the mistakes made for the bad sentences (see above).

b) The e, f, and align.out files for the annotation.

c) A discussion of the difficult or interesting alignment decisions you made in doing the annotation (see above). Be sure that it is clear which sentence and which word you are talking about.
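
The tokenization asked for in step 3 can be done by hand in an editor, but a small script may help. The following is only a rough sketch in Python, not part of the align tool: the function names `tokenize` and `check_parallel` are made up for illustration, and the rule used (put spaces around every character that Unicode classifies as punctuation) is an assumption about what "separate punctuation from words" should mean. It treats the Devanagari danda like an English full stop, but it also splits apostrophes and hyphens (so "don't" becomes "don ' t"), which you may want to correct manually.

```python
import unicodedata


def tokenize(line: str) -> str:
    """Put spaces around every Unicode punctuation character in one sentence."""
    pieces = []
    for ch in line:
        # Unicode general categories starting with "P" are punctuation
        # (Po, Pd, Ps, Pe, ...). Vowel signs and the virama are marks
        # (Mn/Mc), so words in Indic scripts are not broken apart.
        if unicodedata.category(ch).startswith("P"):
            pieces.append(f" {ch} ")
        else:
            pieces.append(ch)
    # Collapse repeated spaces and trim both ends.
    return " ".join("".join(pieces).split())


def check_parallel(source_lines, target_lines, expected=10):
    """Return a list of problems; an empty list means the files look parallel."""
    problems = []
    if len(source_lines) != expected:
        problems.append(f"source has {len(source_lines)} lines, expected {expected}")
    if len(target_lines) != expected:
        problems.append(f"target has {len(target_lines)} lines, expected {expected}")
    for name, lines in (("source", source_lines), ("target", target_lines)):
        for number, line in enumerate(lines, start=1):
            if not line.strip():
                problems.append(f"{name} line {number} is empty")
    return problems


if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    print(check_parallel(["a sentence ."] * 10, ["a sentence ."] * 10))
```

One way to use this would be to run `tokenize` over each line of your two sentence files before saving them as "f" and "e", and then call `check_parallel` on the two lists of lines to confirm that each file has 10 non-empty lines.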