Copyright and legal matters

This is a minefield. You need to be careful. Many different competing interests are at work. Not everybody is a commercially disinterested corpus linguist. In the prevailing economic climate you aren't either.

For a contribution to science you want a corpus which can be distributed freely, at least for research purposes, because you want your results to be replicable. If the copyright is not yours you must get the agreement of the copyright holders to free distribution. This was done for the BNC spontaneous speech corpus: the people carrying personal cassette recorders also carried around sheaves of copyright release forms for their interlocutors to sign. You may need to sign a license, and/or to require users of your corpus to do the same.

There can be legal problems if the texts provide confidential or personal information about companies, people or anything else. In psycholinguistics it's sometimes necessary to blank out identifiable proper names with white noise.

Lexicographers and other people with substantial commercial interests in using corpora for goals other than scientific publication will typically want corpora to be unavailable to their competitors. This too can be done with a license, and has been in the case of the Alvey tools, which are built on LDOCE. The license lets Longmans make sure that their competitors can't use the data.


Chris Brew
8/7/1998