Next: Choices in corpus design/collection Up: Corpus design Previous: Corpus design

Introduction

The purpose of this chapter is to introduce the range of corpora currently available, and to describe the criteria which are important in designing a corpus. Although few users will really need to design their own corpus, we will adopt the perspective of the corpus designer. This perspective is particularly effective in dramatising the choices and their consequences, and most of the same issues arise in the much more common task of selecting a corpus from the range available.

Before anything else is done it is essential to be as precise as possible about the purpose or purposes which the corpus is intended to serve. Frequently there will be a central concern which drives the design of the whole corpus, but it is not possible to predict all the uses to which a corpus will be put.

That said, it is worth trying to design the corpus to maximize its later applicability, particularly if the extra effort involved in future-proofing is not too great. If you already have video clips or audio recordings corresponding to a corpus of transcribed speech, then it is almost certainly worth including references to the primary data as part of the annotation of the transcription.
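
As a rough illustration of what such references might look like, the sketch below attaches a pointer to the recording to each transcribed utterance. Everything in it is an assumption made for the example: the class and field names, the file name dialogue01.wav, the time offsets and the utterances themselves are invented, and a real corpus would follow whatever annotation scheme it has adopted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MediaRef:
    """Pointer from a transcribed utterance back to the primary recording."""
    media_file: str   # hypothetical file name of the audio or video recording
    start_s: float    # offset of the utterance start, in seconds
    end_s: float      # offset of the utterance end, in seconds


@dataclass
class Utterance:
    speaker: str
    text: str
    media: Optional[MediaRef] = None  # None when no recording is available


def duration(utt: Utterance) -> Optional[float]:
    """Length of the aligned stretch of recording, or None if unaligned."""
    if utt.media is None:
        return None
    return utt.media.end_s - utt.media.start_s


# Invented sample data, for illustration only.
utterances = [
    Utterance("A", "where are you going", MediaRef("dialogue01.wav", 0.0, 1.5)),
    Utterance("B", "just down to the shops", MediaRef("dialogue01.wav", 1.5, 3.25)),
    Utterance("A", "right"),  # transcribed, but with no recording reference
]
```

Keeping the reference optional means that material without a recording can sit in the same corpus, while later users who want to study, say, timing or prosody can still find their way back to the primary data where it exists.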

On the other hand, don't generalize for the sake of it. If your primary interest is the variation in patterns of lexis between Australian and New Zealand English, it probably isn't worth collecting analogous data from three other dialects on the off chance that someone will one day find them interesting. But if there are broadly comparable pre-existing corpora for the three other dialects then it is only sensible to see whether you can adapt their design to your purpose.


Chris Brew
8/7/1998