3.7.2. wf - WiTTFind¶
3.7.2.2. Build¶
$ cd wf
$ mkdir build
$ cd build
3.7.2.2.1. On Matrix:¶
$ cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CXX_COMPILER=g++-4.6 .. (obsolete)
3.7.2.2.2. clang¶
$ cmake -DCMAKE_CXX_COMPILER=clang++ [-DCMAKE_CXX_FLAGS=-I/usr/include/c++/4.9.2] ..
3.7.2.2.3. Otherwise:¶
$ cmake -DCMAKE_BUILD_TYPE=[debug|release] -DCMAKE_INSTALL_PREFIX=/install/path ..
$ make
3.7.2.3. Test¶
$ make && make test
3.7.2.4. Install¶
$ make && make test && make install
3.7.2.5. Makefile¶
There is a plain Makefile that automates the CMake build process:
$ cd wf
$ make
$ make test
$ make install
3.7.2.6. Usage¶
There are four simple tools for WiTTFind:
The wf tool searches for queries and outputs a list of matches.
The wf_display tool takes the output of wf as its input and displays the matches in the original file.
The wf_server tool starts a server listening on a Unix domain socket.
The wf_client tool queries a server that waits on a Unix domain socket.
All tools support the -h [--help] option.
All tools are built in the build/bin folder (if you use the custom Makefile, they are copied directly into your current directory).
$ ./wf --help
$ ./wf -d dictionary -f input -q "query" -m max -o outfile
$ ./wf -d dictionary -f input -Q query-file -m max -o outfile
$ ./wf -L dicdictionary -f input -q "query"
max specifies the maximum number of hits shown. The default is 25; if max is 0, all matches are shown.
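The effect of max can be pictured with a small Python sketch. The helper name `limit_hits` is hypothetical and not part of wf; it only restates the rule above (a default of 25, with 0 meaning no limit).

```python
def limit_hits(hits, max_hits=25):
    """Return the hits that would be shown for a given max value.

    Hypothetical helper for illustration only; not part of wf.
    """
    if max_hits == 0:
        # 0 disables the limit: every match is shown.
        return list(hits)
    return list(hits)[:max_hits]


all_hits = [f"hit-{i}" for i in range(40)]
print(len(limit_hits(all_hits)))      # default limit applies
print(len(limit_hits(all_hits, 10)))  # explicit limit
print(len(limit_hits(all_hits, 0)))   # 0 means "show everything"
```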
$ ./wf_display --help
$ ./wf_display [hits]
$ ./wf -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display
$ ./wf -v -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display -v
creating wf_display xml output for the web front end
(-t B marks hits with hit, -r hits puts the results in
Starting the server:
$ ./wf_server --dictionary data/witt_WAB_dela_X.txt --files files.txt
Using the client:
$ wf_client --query '[VVFIN] & denken' -f data/witt_Ts-214_input_IX_tagged.xml \
    -f data/witt_Ts-213_input_IX_tagged.xml --max 10 --threads 2
3.7.2.7. Query syntax¶
path/to/graph.json(arg1, arg2, …, argn) loads a subgraph from path/to/graph.json and replaces the arguments $1$, $2$, … $n$ with arg1, arg2, …, argn. It is not possible to concatenate subgraph expressions with other expressions in a node.
token matches any token that is either ‘token’ or has ‘token’ as its lemma.
“token” or ‘token’ matches any token that is equal to token.
<GC> matches any token with the grammatical code ‘GC’.
[TAG] matches any token with an annotation equal to ‘TAG’.
/regex/ matches any token that matches the regular expression ‘regex’. Note: if you want the regex to match the whole token you have to use ‘/^regex$/’.
/regex/i matches any token that matches the regular expression ‘regex’ ignoring case.
[/regex/] matches any token whose tag matches the regular expression ‘regex’.
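The basic matchers above can be read as predicates over a single token. The following Python sketch illustrates that reading. The `Token` class and all function names are hypothetical, Python's `re` dialect may differ from the one wf uses, and case handling for bare tokens is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical token model; wf's internal representation may differ."""
    surface: str                            # the word as it appears in the text
    lemma: str = ""                         # dictionary base form
    gc: str = ""                            # grammatical code
    tags: set = field(default_factory=set)  # annotations


def bare(word):                             # token
    return lambda t: t.surface == word or t.lemma == word

def quoted(word):                           # "token" or 'token'
    return lambda t: t.surface == word

def gram_code(code):                        # <GC>
    return lambda t: t.gc == code

def tag(name):                              # [TAG]
    return lambda t: name in t.tags

def regex(pattern, ignore_case=False):      # /regex/ and /regex/i
    compiled = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    # search() succeeds on a partial match, hence the need for ^...$
    return lambda t: compiled.search(t.surface) is not None

def tag_regex(pattern):                     # [/regex/]
    compiled = re.compile(pattern)
    return lambda t: any(compiled.search(name) for name in t.tags)


dachte = Token("dachte", lemma="denken", gc="V", tags={"VVFIN"})
print(bare("denken")(dachte))     # matches through the lemma
print(quoted("denken")(dachte))   # surface form differs
print(regex("acht")(dachte))      # partial match inside the token
print(regex("^acht$")(dachte))    # anchored: the whole token must match
```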
You can prefix any query expression with ‘!’ to suppress the highlighting of that particular match. For example, “/\w+/ !<PUNCT> /\w+/” matches a word followed by punctuation and another word, but the punctuation is not highlighted as a match (the two words are, though): [[[foo]]] , [[[bar]]]
You can use the boolean operators ‘(’, ‘)’, ‘&’, ‘|’ and ‘~’ in a node to form complex expressions:
‘/en$/ & [N]’ matches tokens that end in ‘en’ and have the tag N.
‘/en$/ | [N]’ matches tokens that end in ‘en’ or have the tag N.
‘~ /en$/’ matches tokens that don’t end in ‘en’.
Use parentheses to form more complex expressions.
The parser for complex expressions is not finished yet. You need to use explicit whitespace to separate operators and expressions: ‘(/a/|/b/)&~/c/’ is invalid. Use ‘( /a/ | /b/ ) & ~ /c/’ instead.
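As a rough illustration, the boolean operators can be modelled as combinators over token predicates. This is a sketch in Python, not wf's implementation: the tuple-based token representation and all names are hypothetical. The last two lines assume that the parser splits a node on whitespace, which would explain the spacing requirement but is not confirmed here.

```python
import re

# Hypothetical token representation: (surface form, set of tags).
def ends_in_en(token):           # /en$/
    return re.search(r"en$", token[0]) is not None

def has_tag_n(token):            # [N]
    return "N" in token[1]

def and_(p, q):                  # p & q
    return lambda t: p(t) and q(t)

def or_(p, q):                   # p | q
    return lambda t: p(t) or q(t)

def not_(p):                     # ~ p
    return lambda t: not p(t)


garten = ("Garten", {"N"})
denken = ("denken", {"V"})
haus = ("Haus", {"N"})

print(and_(ends_in_en, has_tag_n)(garten))  # ends in 'en' and is tagged N
print(and_(ends_in_en, has_tag_n)(denken))  # ends in 'en' but is not tagged N
print(or_(ends_in_en, has_tag_n)(haus))     # tagged N, so the 'or' holds
print(not_(ends_in_en)(haus))               # does not end in 'en'

# Assumed whitespace splitting: the spaced form yields separate
# operators and operands, the compact form stays a single chunk.
print("( /a/ | /b/ ) & ~ /c/".split())
print("(/a/|/b/)&~/c/".split())
```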
There are some Unitex special expressions: <MAJ>, <MIN>, <MOT>, <PRE> and <NB>, which all map to special regex patterns.
It is possible to use ‘?’ to make the previous expression optional. The query ‘a b? c’ matches the token sequences ‘a b c’ and ‘a c’.
It is possible to specify a range expression ‘{f, t}’ to match the previous expression f to t times. The query ‘x {3, 5}’ matches at least 3 and at most 5 occurrences of x. The expression ‘{x}’ is shorthand for ‘{x, x}’.
You can append ‘+’ to an expression to make it match one or more times. The query ‘a + b’ matches any expression ‘a’ followed by one or more words and a ‘b’. You can append ‘*’ to an expression to make it match zero or more times.
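All of these quantifiers can be seen as repetition bounds (minimum, maximum) on the preceding expression. The Python sketch below illustrates this with a small backtracking matcher over lists of plain words. It is a simplification under stated assumptions: the function and constant names are hypothetical, the expressions are literal words only, and the whole token list must match, whereas wf searches for matches inside a text.

```python
def matches(pattern, tokens):
    """Return True if the whole token list matches the pattern.

    pattern is a list of (word, min_count, max_count) triples;
    max_count=None means "no upper bound".
    """
    if not pattern:
        return not tokens
    (word, lo, hi), rest = pattern[0], pattern[1:]
    limit = len(tokens) if hi is None else min(hi, len(tokens))
    # Count how many leading tokens this expression could consume.
    run = 0
    while run < limit and tokens[run] == word:
        run += 1
    # Try the longest repetition first, then back off down to the minimum.
    for count in range(run, lo - 1, -1):
        if matches(rest, tokens[count:]):
            return True
    return False


ONCE = (1, 1)        # plain expression
OPTIONAL = (0, 1)    # ?
PLUS = (1, None)     # +
STAR = (0, None)     # *

a_b_opt_c = [("a", *ONCE), ("b", *OPTIONAL), ("c", *ONCE)]   # a b? c
x_3_5 = [("x", 3, 5)]                                        # x {3, 5}

print(matches(a_b_opt_c, ["a", "b", "c"]))   # optional 'b' present
print(matches(a_b_opt_c, ["a", "c"]))        # optional 'b' absent
print(matches(x_3_5, ["x"] * 2))             # below the lower bound
print(matches(x_3_5, ["x"] * 4))             # within the range
```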