XKWIC supports quite complicated queries through its Corpus Query Processor (CQP). We list here some examples of possible queries. A separate manual on CQP gives more detail about the possible queries and about how they are processed.
"research"
[word = "research"]
Both queries search for all occurrences of the word ``research''.
[word = "research.*"]
Search for all words starting with ``research''.
[lemma = "research"]
Search for all word forms whose lemma is ``research''.
[pos = "JJ"]
Search for all occurrences of words tagged as adjectives (i.e. with
the part of speech tag ``JJ'').
[word="research" & pos="JJ|NN"]
Search for all occurrences of the word ``research'' tagged as an
adjective or a noun.
[lemma = "research" & pos != "V.*"]
Search for occurrences of the lemma
``research'' whose part of speech does not start with ``V''
(i.e. which are not tagged as VB--verb, base form;
VBD--verb, past tense; VBG--verb, gerund; etc.).
[lemma = "research"] "a|the"
Search for the lemma ``research'' followed by the word ``a'' or ``the''.
[pos = "JJ" & word !="such"] [lemma="research"]
Find all adjectives other than ``such''
that precede the lemma ``research''.
[lemma="research"] []* "funding"
The lemma ``research'' followed sometime later by the word
``funding''.
There is no restriction on the nature or amount of material
intervening between ``research'' and ``funding''.
[lemma="research"] []* "funding" within s
As before, but the word ``funding'' must occur in the same sentence
as the occurrence of the lemma ``research''.
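The example queries above can be modelled in a few lines of Python. This is only a rough sketch of what the queries mean, not of how CQP actually processes them: the toy tokens, the ``sent'' sentence number and the helper names matches and find_later are all made up for illustration. It assumes that a regular expression has to match the whole attribute value, and that a sequence hit runs from a ``research'' token to the nearest following ``funding''.

```python
import re

# Hypothetical toy corpus: (word, lemma, pos) triples grouped by sentence.
RAW = [
    [("research", "research", "NN"), ("takes", "take", "VBZ"), ("time", "time", "NN")],
    [("funding", "funding", "NN"), ("is", "be", "VBZ"), ("scarce", "scarce", "JJ")],
    [("they", "they", "PRP"), ("researched", "research", "VBD"),
     ("new", "new", "JJ"), ("funding", "funding", "NN")],
]
TOKENS = [
    {"word": w, "lemma": l, "pos": p, "sent": s}
    for s, sentence in enumerate(RAW)
    for (w, l, p) in sentence
]

def matches(token, attr, pattern):
    """True if the whole attribute value matches the regular expression."""
    return re.fullmatch(pattern, token[attr]) is not None

# [word = "research.*"]
prefix_hits = [t["word"] for t in TOKENS if matches(t, "word", "research.*")]

# [lemma = "research" & pos != "V.*"]
non_verbs = [t["word"] for t in TOKENS
             if matches(t, "lemma", "research") and not matches(t, "pos", "V.*")]

def find_later(tokens, within_sentence=False):
    """Model [lemma="research"] []* "funding", optionally 'within s'.

    Returns (start, end) token positions, pairing each 'research' token
    with the nearest following 'funding' (a simplification).
    """
    hits = []
    for i, first in enumerate(tokens):
        if not matches(first, "lemma", "research"):
            continue
        for j in range(i + 1, len(tokens)):
            if within_sentence and tokens[j]["sent"] != first["sent"]:
                break  # left the sentence without finding "funding"
            if matches(tokens[j], "word", "funding"):
                hits.append((i, j))
                break
    return hits

anywhere = find_later(TOKENS)
same_sentence = find_later(TOKENS, within_sentence=True)
```

On this toy data the unrestricted search pairs the ``research'' of the first sentence with the ``funding'' of the second, while the ``within s'' version keeps only the hit inside the third sentence.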
Search the BNC for uses of the word ``zap''. Does ``zap'' ever occur as
an adjective? Does it occur as a noun? Do you agree that all
the occurrences found are indeed nouns?
Select the BNC (using the Question Mark button) and launch the
query [word="zap" & pos="JJ"]. This reveals that the word ``zap''
never occurs as an adjective.
When you launch the query [word="zap" & pos="N.*"]
on the BNC, it returns examples like ``it will become illegal to
zap food with radiation''. This is clearly a verb rather than a noun,
suggesting that some of the part-of-speech tagging may have been
wrong.
Can you see what the difference will be between the following
searches:
(i) [word = "research.*"],
(ii) [lemma = "research"], and
(iii) [lemma = "research.*"]?
Which ones return the same result, and is this by accident or by
necessity?
The first query will find all words that start with ``research'',
including ``research-led'' or ``research-intensive''. Search (ii)
will return words morphologically or inflectionally derived from
``research'', which will exclude compounds like
``research-intensive''.
Search (iii) will return words morphologically derived from any word
that starts with ``research''. Since all these morphological derivations
also happen to start with ``research'',
the results of (i) and (iii) will be the same.
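The comparison can be tried out on a handful of made-up tokens. The sketch below is an illustration only, with (iii) taken as the lemma-based search the answer describes. The token list, the helper name search and in particular the lemma given to the compound ``research-led'' are assumptions; a real lemmatiser may treat such forms differently.

```python
import re

# Hypothetical tokens as (word, lemma) pairs; the lemma given to the
# compound is an assumption about how a lemmatiser might treat it.
TOKENS = [
    ("research", "research"),
    ("researched", "research"),
    ("researching", "research"),
    ("research-led", "research-led"),
    ("researchers", "researcher"),
]

def search(attr, pattern):
    """Return the word forms whose chosen attribute fully matches the pattern."""
    index = {"word": 0, "lemma": 1}[attr]
    return [tok[0] for tok in TOKENS if re.fullmatch(pattern, tok[index])]

result_i = search("word", "research.*")     # (i)   [word = "research.*"]
result_ii = search("lemma", "research")     # (ii)  [lemma = "research"]
result_iii = search("lemma", "research.*")  # (iii) [lemma = "research.*"]

print(result_ii)
print(result_i == result_iii)
```

With these tokens, (ii) leaves out ``research-led'' and ``researchers'', whereas (i) and (iii) return the same five forms.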