
Be conservative (small c)

Never throw anything away. Ensure that everybody has access to the original data which you collected, and don't impose interpretations on future users of the corpus. The London-Lund Corpus of Spoken English has beautiful British School annotations of the intonation which the speakers used, but because the tapes are not publicly available these markers are less useful than they might be. You can't readily extend the analysis, reliably compare the markers with another intonational theory, or be confident of applying the same scheme to a new corpus.

All experience with the Map Task indicates that the more you listen, the more you hear. Not only does your interpretation of the intonation shift with time, you also change your mind about what the words seem to be. The problem is most dramatic for speech, but it arises in much the same form for text. For example, the Conan-Doyle text has spelling errors which look like the fault of OCR. It would be convenient, but wrong, to edit them away: the edit discards the original reading and imposes your interpretation on later users.
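
One way to stay conservative with text is to record suspected errors as separate, stand-off corrections and leave the source untouched. The following is only a minimal sketch of that idea. The Correction record, the apply_corrections function and the sample sentence are all invented for illustration; the sentence is not taken from the Conan-Doyle text or from any particular corpus tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """A hypothetical stand-off record; the original text is never edited."""
    start: int         # offset of the suspect span in the original text
    end: int           # offset one past the end of the suspect span
    replacement: str   # the annotator's proposed reading
    note: str          # why the annotator thinks the original is wrong


def apply_corrections(original, corrections):
    """Build a corrected view of the text, leaving the original as it was."""
    pieces = []
    cursor = 0
    for c in sorted(corrections, key=lambda c: c.start):
        if c.start < cursor:
            raise ValueError("overlapping corrections")
        pieces.append(original[cursor:c.start])  # untouched text before the span
        pieces.append(c.replacement)             # proposed reading for the span
        cursor = c.end
    pieces.append(original[cursor:])             # untouched text after the last span
    return "".join(pieces)


# Invented sample line with a typical OCR confusion ("rn" read for "m").
original = "The hound was seen on the rnoor."
start = original.index("rnoor")
corrections = [
    Correction(start, start + len("rnoor"), "moor",
               "probable OCR error: 'rn' for 'm'"),
]

corrected_view = apply_corrections(original, corrections)
print(original)        # the data as collected, still available to everyone
print(corrected_view)  # one annotator's interpretation, kept separate
```

Because the corrections live apart from the data, a later user who disagrees with them can discard or replace them without losing anything.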



Chris Brew
8/7/1998