
External format of annotations

Here the availability of good computational tools makes a big difference. Until recently one had to trade the requirements of the reader against those of the corpus designer. The corpus designer was under pressure to load more and more annotation into the document, whereas the reader needed a clean, uncluttered text in which such annotation as there was did not get in the way. With good visualization tools like xkwic, and the ability to construct special-purpose viewing and editing aids rapidly, there is no longer any reason to avoid dense markup.
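As an illustration of how little work a special-purpose viewing aid can take, here is a minimal sketch in Python that hides all inline markup and shows the reader only the running text. It is not one of the Edinburgh tools and has nothing to do with xkwic. The element and attribute names (`s`, `w`, `pos`, `id`) and the sample sentence are invented for the example, and Python's `html.parser` is only a lenient tag scanner, not a validating SGML parser: it knows nothing about DTDs or tag minimization.

```python
from html.parser import HTMLParser


class CleanTextViewer(HTMLParser):
    """Collect character data only, discarding every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are ignored.
        self.pieces.append(data)

    def text(self):
        return "".join(self.pieces)


def strip_markup(annotated):
    """Return the reader's view of an annotated string: text without tags."""
    viewer = CleanTextViewer()
    viewer.feed(annotated)
    viewer.close()
    return viewer.text()


# Hypothetical densely annotated sentence (tag and attribute names are invented).
SAMPLE = (
    '<s id="s1"><w pos="DT">The</w> <w pos="NN">corpus</w> '
    '<w pos="VBZ">grows</w>.</s>'
)

if __name__ == "__main__":
    print(strip_markup(SAMPLE))
```

The corpus designer keeps the dense version on disk, and the reader sees only what the viewer chooses to show. A more selective aid could override `handle_starttag` to display some annotations and suppress others.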

The Edinburgh preference is very strongly for SGML, the internationally standardised markup language. In part this is because we have already devoted effort to producing tools which read and write SGML, and in part because so many corpora now either originate in SGML or are being converted to it.

There are three caveats about SGML. The first is that the standard is distressingly complex; the second, that the American computational linguistics and information retrieval communities are ambivalent about it; and the third, that you really do need the computational tools to make it comfortably human-readable.


Chris Brew
8/7/1998