In chapter 3 we described some tools that produce lists with word frequencies or lists of bigrams for a given text. Inspecting such lists gives you a good idea as to the overall profile of the text; it may point you to certain words or bigrams that look unusual and whose behaviour you want to study further.
To do that, you probably want to see those words in their original context. A word with its context is a concordance. Concordance programs let you specify a word, a set of words, a part-of-speech tag or possibly some other kind of keyword and then return a concordance for each occurrence of that keyword.
Suppose that a preliminary study of words in sherlock has suggested that the usage of the word ``remember'' is worth looking into further, for example to check whether it is used transitively as well as intransitively. A good starting point for such a study is a concordance for ``remember''. The typical way of displaying such a concordance is as follows:
 and then afterwards they remembered us, and sent them to moth
d about it. He laughed, I remember, and shrugged his shoulders
       important. Can you remember any other little things abo
arate us, I was always to remember that I was pledged to him,
eave the papers here, and remember the advice which I have giv
l not believe me. You may remember the Old Persian saying, the
In other words, the keyword (in this case ``remember'') appears roughly in the middle of each line, and each line has some predetermined number of characters. This way of displaying a concordance is called a Keyword in Context index, or KWIC index.
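To make the layout concrete, here is a minimal sketch in Python of how such fixed-width lines could be produced. It is an illustration only, not the code described later in this chapter. The function name kwic and the widths of 26 and 36 characters are assumptions, chosen to resemble the sample display.

```python
import re

def kwic(text, pattern, left=26, right=36):
    """Return one fixed-width line per match of the regular expression `pattern`.

    `left` is the number of context characters shown before the keyword;
    `right` is the number of characters shown from the start of the keyword
    onwards. Both defaults are illustrative assumptions.
    """
    # Collapse line breaks and runs of spaces so that the context is
    # counted in characters of running text.
    flat = " ".join(text.split())
    lines = []
    for m in re.finditer(pattern, flat):
        before = flat[max(0, m.start() - left):m.start()]
        after = flat[m.start():m.start() + right]
        # Right-justify the left context so the keyword always starts
        # in the same column, even near the beginning of the text.
        lines.append(before.rjust(left) + after)
    return lines

if __name__ == "__main__":
    sample = ("He laughed, I remember, and shrugged his shoulders. "
              "Can you remember any other little things about him?")
    for line in kwic(sample, r"\bremember\w*"):
        print(line)
```

Because the context is cut by character count rather than at word boundaries, lines may begin or end in the middle of a word, just as in the display above.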
A KWIC index is not the only way of displaying a concordance. You could display each sentence or paragraph the keyword occurs in, rather than some words to the left and right of the keyword. But in what follows we will concentrate on KWIC indexes. We will first describe code that allows you to build KWIC indexes. In chapter 4.4, we describe a graphical interface which makes it easy to build certain types of concordances.
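For comparison, the sentence-based display mentioned above could be sketched as follows in Python. This is a rough illustration under stated assumptions: the function name sentence_concordance is hypothetical, and the sentence splitter is deliberately naive (it breaks after a full stop, question mark or exclamation mark followed by white space, so abbreviations such as ``Mr.'' will be split wrongly).

```python
import re

def sentence_concordance(text, keyword):
    """Return every sentence that contains a word starting with `keyword`."""
    # Collapse line breaks so that sentences spanning lines stay together.
    flat = " ".join(text.split())
    # Naive splitter (an assumption): a sentence ends at ., ! or ?
    # followed by white space.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    # Match the keyword at a word boundary, allowing endings such as -ed.
    word = re.compile(r"\b" + re.escape(keyword) + r"\w*", re.IGNORECASE)
    return [s for s in sentences if word.search(s)]

if __name__ == "__main__":
    sample = ("He laughed, I remember, and shrugged. It was late. "
              "They remembered us!")
    for sentence in sentence_concordance(sample, "remember"):
        print(sentence)
```

Unlike a KWIC index, the lines produced this way vary in length and the keyword is not aligned in a fixed column, which makes the output harder to scan.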