Alex Fraser
NSSNLP, Kathmandu University

In this short assignment we will again look at Google Translate for the language you used for assignment 1. We will then take a quick look at the Indian Parallel Corpora for that same language.

1) Take the 5 sentences for which you got bad output from Google Translate. Translate them again, preferably from a different IP address (for instance, if you translated them in the lab, try translating them from KUIC).

1a) Do you get the same output as before? Were your corrections partially or fully adopted?

1b) Go back to the bad output you obtained previously. Now that you know about phrase-based SMT learned from word-aligned parallel corpora, can you provide a different analysis of the bad output?

2) Take the first 5 sentences of the *training* data for the language you are working with from the Indian Parallel Corpora. Try these sentences in Google Translate (in the Hindi/Tamil/Urdu to English direction, as you did in assignment 1). Compare the outputs with the English parallel sentences (recall that these sentences were created using Amazon Mechanical Turk). What observations can you make about these two different English translations? Is the Google Translate output correct? Can you analyze any issues with it which are obviously caused by phrase-based SMT?
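
For the comparison in question 2, it can help to line up the two English translations word by word before writing your observations. The following is only a minimal sketch, not a required part of the assignment: the function names `word_diff` and `unigram_precision` are hypothetical helpers, and the two example sentences are invented placeholders, not corpus data or real Google Translate output. It uses only the Python standard library.

```python
import difflib
from collections import Counter


def word_diff(reference, hypothesis):
    """List the word spans where the two translations differ.

    Each item is (tag, reference_words, hypothesis_words), where tag is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def unigram_precision(reference, hypothesis):
    """Fraction of hypothesis words that also occur in the reference.

    Counts are clipped, so a word repeated in the hypothesis is only
    credited as often as it appears in the reference.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, ref_counts[w]) for w, n in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    # Placeholder sentences (assumptions): paste in a Mechanical Turk
    # reference and the corresponding Google Translate output instead.
    turk_reference = "the cat sat on the mat"
    google_output = "the cat is on the mat"

    for tag, ref_words, hyp_words in word_diff(turk_reference, google_output):
        print(tag, ref_words, hyp_words)
    print(round(unigram_precision(turk_reference, google_output), 2))
```

A low overlap does not by itself mean the Google Translate output is wrong, since two correct translations can use different words. The differing spans are just a starting point for deciding which differences are legitimate paraphrases and which look like phrase-based SMT errors (for example, reordering problems or locally fluent but globally inconsistent phrases).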