
Concordances

In chapter 3 we described some tools that produce lists with word frequencies or lists of bigrams for a given text. Inspecting such lists gives you a good idea as to the overall profile of the text; it may point you to certain words or bigrams that look unusual and whose behaviour you want to study further.

To do that, you probably want to see those words in their original context. A list of the occurrences of a word, each shown with its surrounding context, is called a concordance. Concordance programs let you specify a word, a set of words, a part-of-speech tag or possibly some other kind of keyword, and then return a concordance showing every occurrence of that keyword.

Suppose that a preliminary study of words in sherlock has suggested that the usage of the word "remember" is worth looking into further, e.g. to check whether it is used transitively as well as intransitively. A good starting point for such a study is a concordance for "remember". The typical way of displaying such a concordance is as follows:

 and then afterwards they remembered us, and sent them to moth
d about it. He laughed, I remember, and shrugged his shoulders
       important. Can you remember any other little things abo
arate us, I was always to remember that I was pledged to him,
eave the papers here, and remember the advice which I have giv
l not believe me. You may remember the Old Persian saying, the

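In other words, the keyword (in this case "remember", or an inflected form such as "remembered") appears roughly in the middle of each line, and each line has some predetermined number of characters. Because the keyword always starts in the same column, the contexts to its left and right are easy to scan and compare. This way of displaying a concordance is called a Keyword in Context index, or KWIC index.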
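The layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code developed in the following sections: the function name kwic, its width parameter and the sample text are all hypothetical, and the keyword is matched as a word prefix so that forms like "remembered" are found too.

```python
import re

def kwic(text, keyword, width=30):
    """Return one fixed-width line per occurrence of keyword in text.

    Hypothetical helper for illustration: `width` is the number of
    characters shown on each side of the point where the keyword starts.
    """
    # Collapse newlines and runs of spaces so each context fits on one line.
    flat = " ".join(text.split())
    lines = []
    # Match at a word boundary; this also finds inflected forms that
    # begin with the keyword (e.g. "remembered").
    pattern = re.compile(r"\b" + re.escape(keyword), re.IGNORECASE)
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.start():m.start() + width]
        # Right-justify the left context so that the keyword always
        # starts in the same column.
        lines.append(left.rjust(width) + right)
    return lines

# Made-up sample text, loosely based on the lines shown above.
sample = ("He laughed, I remember, and shrugged his shoulders. "
          "Can you remember any other little things about him?")

for line in kwic(sample, "remember", width=26):
    print(line)
```

Padding the left context with spaces, rather than dropping occurrences near the start of the text, keeps every hit in the index while preserving the column alignment.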

A KWIC index is not the only way of displaying a concordance. You could display each sentence or paragraph the keyword occurs in, rather than some words to the left and right of the keyword. But in what follows we will concentrate on KWIC indexes. We will first describe code that allows you to build KWIC indexes. In chapter 4.4, we describe a graphical interface which makes it easy to build certain types of concordances.
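For comparison, the sentence-based display mentioned above might look like the following Python sketch. It is a rough illustration only: the function name sentence_concordance is hypothetical, and the sentence splitter is deliberately naive (it breaks after ".", "!" or "?" followed by whitespace), so abbreviations such as "e.g." would be split incorrectly.

```python
import re

def sentence_concordance(text, keyword):
    """Return every sentence of text that contains the keyword.

    Hypothetical helper for illustration; sentence boundaries are
    guessed with a very simple rule.
    """
    # Put the whole text on one line before splitting it into sentences.
    flat = " ".join(text.split())
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    # Match the keyword at the start of a word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword), re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Made-up sample text for demonstration.
sample = ("He laughed, I remember, and shrugged. It was late. "
          "Can you remember any other little things?")

for sentence in sentence_concordance(sample, "remember"):
    print(sentence)
```

Unlike a KWIC index, this display gives complete syntactic units, but the keyword no longer lines up in a fixed column, which makes the results harder to scan.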


Chris Brew
8/7/1998