3.7.2. wf - WiTTFind

3.7.2.1. About WiTTFind

3.7.2.1.1. Requirements

  • cmake >= 2.6

  • gcc >= 4.8.1

  • Boost >= 1.53.0

3.7.2.2. Build

$ cd wf
$ mkdir build
$ cd build

3.7.2.2.1. On Matrix (obsolete):

$ cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CXX_COMPILER=g++-4.6 ..

3.7.2.2.2. clang

$ cmake -DCMAKE_CXX_COMPILER=clang++ [-DCMAKE_CXX_FLAGS=-I/usr/include/c++/4.9.2] ..

3.7.2.2.3. Otherwise:

$ cmake -DCMAKE_BUILD_TYPE=[debug|release] -DCMAKE_INSTALL_PREFIX=/install/path ..
$ make

3.7.2.3. Test

$ make && make test

3.7.2.4. Install

$ make && make test && make install

3.7.2.5. Makefile

A plain Makefile automates the CMake build process:

$ cd wf
$ make
$ make test
$ make install

3.7.2.6. Usage

There are four simple tools for WiTTFind:

  • the wf tool searches for queries and outputs a list of matches.

  • the wf_display tool reads the output of wf and displays the matches in the original file.

  • the wf_server tool starts a server listening on a Unix domain socket.

  • the wf_client tool sends queries to a server listening on a Unix domain socket.

  • all tools accept the -h [ --help ] option.

All tools are built into the build/bin directory (if you use the plain Makefile, they are copied directly into your current directory).

$ ./wf --help
$ ./wf -d dictionary -f input -q "query" -m max -o outfile
$ ./wf -d dictionary -f input -Q query-file -m max -o outfile
$ ./wf -L dictionary -f input -q "query"

max specifies the maximum number of hits shown. The default is 25; if max is 0, all matches are shown.
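The -m rule above is small but easy to get backwards, so here is a minimal C++ sketch of it. The function visible_hits, the constant kDefaultMaxHits and the representation of hits as plain strings are hypothetical illustrations, not part of the wf sources; they only encode the documented behavior (default 25, 0 means no limit).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical constant and helper (not from the wf sources) that encode the
// documented -m semantics: default limit 25, and 0 means "show all matches".
constexpr std::size_t kDefaultMaxHits = 25;

std::vector<std::string> visible_hits(const std::vector<std::string>& hits,
                                      std::size_t max = kDefaultMaxHits) {
    // 0 disables the limit; lists that are already short enough pass through.
    if (max == 0 || hits.size() <= max) {
        return hits;
    }
    // Otherwise keep only the first `max` hits, in their original order.
    return std::vector<std::string>(
        hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(max));
}
```

For example, with 30 hits, visible_hits(hits) returns the first 25, while visible_hits(hits, 0) returns all 30.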

$ ./wf_display --help
$ ./wf_display [hits]
$ ./wf -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display
$ ./wf -v -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display -v

Creating wf_display XML output for the web front end (-t B marks each hit as <B>hit</B>; -r hits wraps the results in <hits> tags):

$ ./wf_display -t B -r hits path/to/server-file.xml

Starting the server:

$ ./wf_server --dictionary data/witt_WAB_dela_X.txt --files files.txt

Using the client:

$ ./wf_client --query '[VVFIN] & denken' -f data/witt_Ts-214_input_IX_tagged.xml \
    -f data/witt_Ts-213_input_IX_tagged.xml --max 10 --threads 2

3.7.2.7. Query syntax

  • path/to/graph.json(arg1, arg2, …, argn) loads a subgraph from path/to/graph.json and replaces the arguments $1$, $2$, … $n$ with arg1, arg2, …, argn. It is not possible to concatenate subgraph expressions with other expressions in a node.

  • token matches any token that is either ‘token’ or has ‘token’ as its lemma.

  • "token" or 'token' matches any token that is exactly equal to token (its lemma is not considered).

  • <GC> matches any token with the grammatical code ‘GC’.

  • [TAG] matches any token with an annotation equal to ‘TAG’.

  • /regex/ matches any token that contains a match for the regular expression ‘regex’. Note: to match the whole token, anchor the pattern: ‘/^regex$/’.

  • /regex/i matches any token that matches the regular expression ‘regex’ ignoring case.

  • [/regex/] matches all tokens whose tag matches the regular expression ‘regex’.

  • You can prefix any query expression with ‘!’ to prevent that particular match from being highlighted. For example, “/\w+/ !<PUNCT> /\w+/” matches a word followed by punctuation and another word, but only the two words are highlighted, not the punctuation: [[[foo]]] , [[[bar]]]

  • You can use the boolean operators ‘(’, ‘)’, ‘&’, ‘|’ and ‘~’ in a node to form complex expressions:

    • ‘/en$/ & [N]’ matches tokens that end in ‘en’ and have the tag N.

    • ‘/en$/ | [N]’ matches tokens that end in ‘en’ or have the tag N.

    • ‘~ /en$/’ matches tokens that don’t end in ‘en’.

    • Use parentheses to form more complex expressions.

    • The parser for complex expressions is not finished yet. You need explicit whitespace to separate operators and expressions: ‘(/a/|/b/)&~/c/’ is invalid. Use ‘( /a/ | /b/ ) & ~ /c/’ instead.

  • There are some Unitex special expressions: <MAJ>, <MIN>, <MOT>, <PRE> and <NB>, which all map to special regex patterns.

  • Use ‘?’ to make the previous expression optional. The query ‘a b? c’ matches the token sequences ‘a b c’ and ‘a c’.

  • A range expression ‘{f, t}’ matches the previous expression f to t times. The query ‘x {3, 5}’ matches between 3 and 5 consecutive xs. The expression ‘{n}’ is shorthand for ‘{n, n}’.

  • You can append ‘+’ to an expression to make it match one or more times. The query ‘a + b’ matches one or more ‘a’ tokens followed by a ‘b’.

  • You can append ‘*’ to an expression to make it match zero or more times.
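The argument substitution for subgraph expressions can be sketched as a single pass over the subgraph text that replaces each placeholder $k$ with the k-th argument. substitute_args is a hypothetical helper, and treating the graph file as plain text is an assumption; how WiTTFind actually loads and expands graph.json is not described here. Because it is a single pass, an argument that itself contains a placeholder is not expanded again, and placeholders without a matching argument are left unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: replaces $1$, $2$, ..., $n$ in `text` with args[0..n-1]
// in one left-to-right pass. Out-of-range placeholders are copied verbatim.
std::string substitute_args(const std::string& text, const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '$') {
            // Scan the digits following '$'.
            std::size_t j = i + 1;
            while (j < text.size() && std::isdigit(static_cast<unsigned char>(text[j]))) ++j;
            // A placeholder needs at least one digit and a closing '$'.
            if (j > i + 1 && j < text.size() && text[j] == '$') {
                std::size_t n = std::stoul(text.substr(i + 1, j - i - 1));
                if (n >= 1 && n <= args.size()) {
                    out += args[n - 1];
                    i = j + 1;  // continue after the closing '$'
                    continue;
                }
            }
        }
        out += text[i++];  // ordinary character (or unmatched '$')
    }
    return out;
}
```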
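The difference between /regex/ (a match anywhere in the token) and /^regex$/ (the whole token), and the effect of the /i suffix, can be illustrated with std::regex. This is a sketch under the assumption that the patterns behave like ECMAScript regular expressions; WiTTFind may use another engine (it requires Boost, so possibly Boost.Regex), and matches_regex is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Sketch of the documented /regex/ and /regex/i semantics using std::regex
// (ECMAScript grammar). WiTTFind may use a different engine, e.g. Boost.Regex.
// Returns true if `pattern` matches any part of `token` (regex_search, not
// regex_match); anchor with ^...$ to require a whole-token match.
bool matches_regex(const std::string& token, const std::string& pattern,
                   bool ignore_case = false) {
    const std::regex::flag_type flags =
        ignore_case ? (std::regex::ECMAScript | std::regex::icase)  // /regex/i
                    : std::regex::ECMAScript;                         // /regex/
    return std::regex_search(token, std::regex(pattern, flags));
}
```

With this sketch, matches_regex("denken", "en") is true because ‘en’ occurs inside the token, while matches_regex("denken", "^en$") is false because the anchors require the whole token to be ‘en’.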
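The ‘!’ prefix affects only the output, not what matches. A minimal C++ sketch, assuming each matched token carries a highlight flag that is false for elements prefixed with ‘!’, reproduces the bracket notation from the ‘!’ example in the list above. MatchedToken and render are hypothetical, and joining tokens with single spaces is a simplification; the real wf output format is not described here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of matching one query element against one token.
struct MatchedToken {
    std::string text;
    bool highlight;  // false when the query element carried the '!' prefix
};

// Joins matched tokens with single spaces and wraps highlighted ones in
// [[[...]]], mirroring the example output in the query syntax list.
std::string render(const std::vector<MatchedToken>& match) {
    std::string out;
    for (std::size_t i = 0; i < match.size(); ++i) {
        if (i > 0) out += ' ';
        out += match[i].highlight ? "[[[" + match[i].text + "]]]" : match[i].text;
    }
    return out;
}
```

For example, render({{"foo", true}, {",", false}, {"bar", true}}) yields the string [[[foo]]] , [[[bar]]].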
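How the boolean operators combine basic expressions within one node can be sketched as a small recursive-descent evaluator in C++. Everything here is hypothetical: the Token struct, NodeExpr and matches are illustrative names, only a subset of the expression types is supported, and the operator precedence (‘~’ binds tightest, then ‘&’, then ‘|’) is an assumption, since the syntax description does not specify it. The evaluator splits the node on whitespace, which also shows why the compact form ‘(/a/|/b/)&~/c/’ cannot be handled by a whitespace-based tokenizer.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token representation; WiTTFind's real data model may differ.
struct Token {
    std::string surface;
    std::string lemma;
    std::vector<std::string> tags;
};

// Assumed grammar: or := and ('|' and)* ; and := not ('&' not)* ;
//                  not := '~' not | '(' or ')' | atom
class NodeExpr {
public:
    NodeExpr(const std::string& expr, const Token& tok) : tok_(tok) {
        std::istringstream in(expr);
        for (std::string part; in >> part;) parts_.push_back(part);  // split on whitespace
    }
    bool eval() {
        bool result = parse_or();
        if (pos_ != parts_.size()) throw std::invalid_argument("trailing input");
        return result;
    }

private:
    const Token& tok_;
    std::vector<std::string> parts_;
    std::size_t pos_ = 0;

    bool at(const char* op) const { return pos_ < parts_.size() && parts_[pos_] == op; }

    // Operands are always parsed (no short-circuit) so that pos_ advances correctly.
    bool parse_or() {
        bool value = parse_and();
        while (at("|")) { ++pos_; bool rhs = parse_and(); value = value || rhs; }
        return value;
    }
    bool parse_and() {
        bool value = parse_not();
        while (at("&")) { ++pos_; bool rhs = parse_not(); value = value && rhs; }
        return value;
    }
    bool parse_not() {
        if (at("~")) { ++pos_; return !parse_not(); }
        if (at("(")) {
            ++pos_;
            bool value = parse_or();
            if (!at(")")) throw std::invalid_argument("missing ')'");
            ++pos_;
            return value;
        }
        if (pos_ >= parts_.size()) throw std::invalid_argument("missing operand");
        return match_atom(parts_[pos_++]);
    }
    bool match_atom(const std::string& a) const {
        const std::string inner = a.size() >= 2 ? a.substr(1, a.size() - 2) : "";
        if (a.size() >= 2 && a.front() == '/' && a.back() == '/')   // /regex/: search in surface
            return std::regex_search(tok_.surface, std::regex(inner));
        if (a.size() >= 2 && a.front() == '[' && a.back() == ']') { // [TAG]: exact tag
            for (const auto& t : tok_.tags) if (t == inner) return true;
            return false;
        }
        if (a.size() >= 2 && a.front() == '"' && a.back() == '"')   // "word": exact form only
            return tok_.surface == inner;
        if (a.find_first_of("()&|~") != std::string::npos)
            throw std::invalid_argument("operators must be separated by whitespace: " + a);
        return tok_.surface == a || tok_.lemma == a;                // bare word: form or lemma
    }
};

bool matches(const std::string& expr, const Token& tok) { return NodeExpr(expr, tok).eval(); }
```

For instance, matches("[VVFIN] & denken", Token{"dachten", "denken", {"VVFIN"}}) is true through the lemma, whereas the quoted form "denken" only accepts that exact word form and therefore fails for ‘dachten’.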
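The repetition operators ‘?’, ‘{f, t}’, ‘+’ and ‘*’ all reduce to a minimum and maximum repeat count for one query element. The following C++ sketch makes that mapping explicit and checks, by backtracking, whether a whole token sequence matches a query. Element, word, match_from, full_match and kUnbounded are hypothetical names; real WiTTFind queries search within longer texts and use richer token predicates than plain string equality.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical illustration, not WiTTFind's implementation. Each query element
// carries repetition bounds, following the documented operators:
//   plain -> {1, 1}   ? -> {0, 1}   {f, t} -> {f, t}   + -> {1, kUnbounded}   * -> {0, kUnbounded}
constexpr std::size_t kUnbounded = static_cast<std::size_t>(-1);

struct Element {
    std::function<bool(const std::string&)> pred;  // does one token satisfy the element?
    std::size_t min;
    std::size_t max;
};

// Element that accepts tokens equal to w, repeated min..max times.
Element word(const std::string& w, std::size_t min = 1, std::size_t max = 1) {
    return Element{[w](const std::string& t) { return t == w; }, min, max};
}

// Backtracking: can elems[e..] consume exactly toks[i..]?
bool match_from(const std::vector<Element>& elems, std::size_t e,
                const std::vector<std::string>& toks, std::size_t i) {
    if (e == elems.size()) return i == toks.size();
    const Element& el = elems[e];
    assert(el.min <= el.max);
    // Longest run of tokens this element could consume, capped at max.
    std::size_t n = 0;
    while (n < el.max && i + n < toks.size() && el.pred(toks[i + n])) ++n;
    // Try n, n-1, ..., min repetitions so later elements get a chance to match.
    for (std::size_t k = n + 1; k-- > el.min;) {
        if (match_from(elems, e + 1, toks, i + k)) return true;
    }
    return false;
}

// True if the whole token sequence matches the query.
bool full_match(const std::vector<Element>& query, const std::vector<std::string>& toks) {
    return match_from(query, 0, toks, 0);
}
```

For example, the query ‘a b? c’ corresponds to {word("a"), word("b", 0, 1), word("c")} and accepts both {"a", "b", "c"} and {"a", "c"}. Trying the longest repetition first and then backing off lets a greedy element such as ‘a *’ still leave tokens for the elements that follow it.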