Machine Learning Exercise WSD and MT, WS 2014-2015
Alex Fraser

Both part 1 and part 2 use the Wapiti classifier. We will only look at part 1 in class; part 2 is FYI.

Note that you should put the scripts from the scripts tar file into the sa-tagged subdirectory from CMU Seminars. To compile Wapiti, cd into its directory and simply type "make". Then copy the resulting wapiti binary (which is simply called "wapiti") into the sa-tagged subdirectory as well.

PART 1: binary classification
=============================

Part 1 does simple binary classification using a Maximum Entropy classifier, which is an example of a discriminatively trained linear model, as discussed in class.

A typical hand-in for part 1 should be structured like this:

1) Run the train/test script. (A sketch of the underlying Wapiti commands appears at the end of this handout.)

2) Do an error analysis of the errors the classifier makes on the dev set. To do this, run:

   diff -b --context=2 dev.txt tmp_dev_check | more

   This shows you interesting examples of places where your Wapiti model fails to get the right answer. The rows marked with "!" are the differences.

3) Modify the extractor script to try a different feature (and explain in the write-up how this feature is motivated by an example you saw in the diff of the dev data). Make any necessary modifications to the pattern file; usually you will have to tell it to look at the next column in the training data file, otherwise it will ignore your new feature (see the pattern-file sketch at the end of this handout).

4) Run the train/test script again. Compare your Precision/Recall/F1 scores with what you had before, and state whether your feature helped or hurt performance. Optional: run label_dev again and say whether or not the example you looked at before now gets the correct label.

Part 2: sequence classification
===============================

Part 2 follows the same basic idea, but uses the sequence classification capability of Wapiti. This implements a linear-chain Conditional Random Field (CRF), which is a generalization of a Maximum Entropy classifier to sequences. The bottommost feature in the Wapiti pattern file we use (which is called b_offset_pattern.txt) tells Wapiti to use the previously predicted label as a feature.

For part 2, repeat the same steps as in part 1, but use the seq train/test script. For the error analysis you just need to run:

   diff -b --context=2 seq_dev.txt tmp_check_dev | more
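
The provided train/test scripts wrap Wapiti's command-line interface. If you want to run Wapiti by hand (for example to experiment with training options), the core invocations look roughly like the ones below. The file names here are placeholders chosen for illustration, not necessarily the ones the scripts use, so check the scripts themselves rather than relying on this sketch.

   # train a model from a pattern file and the training data
   ./wapiti train -p pattern.txt train.txt model

   # label the dev data with the trained model
   ./wapiti label -m model dev.txt tmp_dev_check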
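
As a rough illustration of the pattern-file change in step 3 of part 1: Wapiti reads CRF++-style pattern templates, where %x[row,col] picks out a column of the data file relative to the current token (row 0 is the current line, and column numbering starts at 0). The lines below are a sketch only, not the contents of b_offset_pattern.txt, and the column index 2 used for the new feature is an assumption that depends on where your extractor script writes it.

   # unigram feature on the token itself (column 0 of the current line)
   U00:%x[0,0]

   # unigram feature on a newly added column (assumed here to be column 2);
   # without a line like this, Wapiti ignores the new column
   U10:%x[0,2]

The label-bigram feature described in part 2 (the bottommost line of b_offset_pattern.txt, which lets the model condition on the previously predicted label) is already in the provided pattern file, so you normally do not need to add it yourself.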