# wf - WiTTFind
## About WiTTFind

### Requirements
 * cmake >= 2.6
 * gcc >= 4.8.1
 * Boost >= 1.53.0

## Build
    $ cd wf
    $ mkdir build
    $ cd build

### On Matrix (obsolete):
    $ cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CXX_COMPILER=g++-4.6 ..
### clang
    $ cmake -DCMAKE_CXX_COMPILER=clang++ [-DCMAKE_CXXFLAGS=/usr/include/c++/4.9.2] ..

### Otherwise:
    $ cmake -DCMAKE_BUILD_TYPE=[debug|release] -DCMAKE_INSTALL_PREFIX=/install/path ..
    $ make

## Test
    $ make && make test

## Install
    $ make && make test && make install

## Makefile
There is a plain Makefile that automates the CMake build process:

    $ cd wf
    $ make
    $ make test
    $ make install

## Usage

There are four simple tools for WiTTFind:

 * the wf tool searches for queries and prints a list of matches.
 * the wf_display tool uses the output of wf to display the matches in the original file.
 * the wf_server tool starts a server listening on a Unix domain socket.
 * the wf_client tool queries a server that waits on a Unix domain socket.
 * all tools accept the -h [ --help ] option.

All tools are built in the build/bin folder
(if you use the custom Makefile, they are copied directly into your current directory).

    $ ./wf --help
    $ ./wf -d dictionary -f input -q "query" -m max -o outfile
    $ ./wf -d dictionary -f input -Q query-file -m max -o outfile
    $ ./wf -L dicdictionary -f input -q "query"
max specifies the maximum number of hits shown. The default is 25; if max is 0, all matches are shown.
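
The limit rule can be sketched in a few lines of Python. This only illustrates the behaviour described here (default 25, 0 meaning "no limit"); `limit_hits` and `DEFAULT_MAX` are hypothetical names, not part of wf.

```python
from itertools import islice
from typing import Iterable, List

DEFAULT_MAX = 25  # default documented for the max option


def limit_hits(hits: Iterable[str], max_hits: int = DEFAULT_MAX) -> List[str]:
    """Return at most max_hits hits; a value of 0 disables the limit."""
    if max_hits == 0:
        return list(hits)
    return list(islice(hits, max_hits))


hits = [f"hit{i}" for i in range(40)]
print(len(limit_hits(hits)))     # default limit applies
print(len(limit_hits(hits, 0)))  # 0 shows every match
```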

    $ ./wf_display --help
    $ ./wf_display [hits]
    $ ./wf -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display
    $ ./wf -v -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q '/\d+/' | ./wf_display -v
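
Queries often contain characters that a POSIX shell interprets itself (backslashes, '|', '&', parentheses), so it is safest to pass them in single quotes. The following sketch uses Python's standard `shlex` module, which follows POSIX word-splitting rules, to show what a program would receive in each case. It only models the shell; it does not call wf.

```python
import shlex

# Unquoted: the shell treats the backslash as an escape character and drops it.
unquoted = shlex.split(r"./wf -q /\d+/")
# Single-quoted: the query reaches the program unchanged.
quoted = shlex.split(r"./wf -q '/\d+/'")

print(unquoted[-1])  # /d+/
print(quoted[-1])    # /\d+/

# shlex.quote() yields a safely quoted argument when building a command line.
safe_query = shlex.quote("( /a/ | /b/ ) & ~ /c/")
print(safe_query)    # '( /a/ | /b/ ) & ~ /c/'
```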

Creating wf_display XML output for the web front end
(-t B marks hits with \<B\>hit\</B\>, -r hits puts the results in \<hits\>...\</hits\> tags):

    $ ./wf_display -t B -r hits path/to/serverfile.xml

Starting the server:

    $ ./wf_server --dictionary data/witt_WAB_dela_X.txt --files files.txt

Using the client:

    $ ./wf_client --query '[VVFIN] & denken' -f data/witt_Ts-214_input_IX_tagged.xml \
    -f data/witt_Ts-213_input_IX_tagged.xml --max 10 --threads 2

## Query syntax
 * path/to/graph.json(arg1, arg2, ..., argn) loads a subgraph from path/to/graph.json
   and replaces the arguments $1$, $2$, ... $n$ with arg1, arg2, ..., argn.
   It is not possible to concatenate subgraph expressions with other expressions in a node.
 * token matches any token that is either 'token' or has 'token' as its lemma.
 * "token" or 'token' matches any token that is equal to token.
 * \<GC\> matches any token with the grammatical code 'GC'.
 * [TAG] matches any token with an annotation equal to 'TAG'.
 * /regex/ matches any token that matches the regular expression 'regex'.
   Note: if you want the regex to match the whole token you have to use '/^regex$/'.
 * /regex/i matches any token that matches the regular expression 'regex' ignoring case.
 * [/regex/] matches all tokens whose tag matches the regular expression 'regex'.
 * You can prefix any query expression with '!' to prohibit the highlighting of this particular match.
   E.g. "/\w+/ !\<PUNCT\> /\w+/" matches words, followed by punctuation and another word,
   but the punctuation is not highlighted as a match (the two words are, though):
   [[[foo]]] , [[[bar]]]
 * You can use the boolean operators '&', '|' and '~' and the parentheses '(' and ')' in a node to form complex expressions:
    * '/en$/ & [N]' matches tokens that end in 'en' and have the tag N.
    * '/en$/ | [N]' matches tokens that end in 'en' or have the tag N.
    * '~ /en$/' matches tokens that _don't_ end in 'en'.
    * Use parentheses to form more complex expressions.
    * The parser for complex expressions is not finished yet. You need to use explicit whitespace
      to separate operators and expressions: '(/a/|/b/)&~/c/' is invalid.
      Use: '( /a/ | /b/ ) & ~ /c/'.
 * There are some Unitex special expressions: \<MAJ\>, \<MIN\>, \<MOT\>, \<PRE\> and \<NB\>,
   which all map to special regex patterns.
 * It is possible to use '?' to make the previous expression optional.
   The query 'a b? c' matches the token sequences 'a b c' and 'a c'.
 * It is possible to specify a range expression '{f, t}' to match the previous expression f to t times.
   The query 'x {3, 5}' matches at least 3 and at most 5 consecutive xs.
   The expression '{x}' is shorthand for '{x, x}'.
 * You can append '+' to an expression to make it match one or more times.
   The query 'a \<MOT\>+ b' matches an 'a' followed by one or more words and a 'b'.
 * You can append '*' to an expression to make it match zero or more times.
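
The repetition operators '?', '{f, t}', '+' and '*' can be illustrated with a toy matcher over a list of tokens. This is a minimal sketch of the semantics described above, not wf's implementation: it handles only literal tokens, all names (`match_at`, `lit`, `optional`, ...) are hypothetical, and trying the longest repetition first is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# One pattern element: (predicate on a token, minimum repeats, maximum repeats).
# A maximum of None means "unbounded".
Predicate = Callable[[str], bool]
Element = Tuple[Predicate, int, Optional[int]]


def lit(word: str) -> Predicate:
    """Predicate that accepts exactly one literal token."""
    return lambda token: token == word


def once(p: Predicate) -> Element:
    return (p, 1, 1)


def optional(p: Predicate) -> Element:  # 'expr?'
    return (p, 0, 1)


def plus(p: Predicate) -> Element:  # 'expr+'
    return (p, 1, None)


def star(p: Predicate) -> Element:  # 'expr*'
    return (p, 0, None)


def times(p: Predicate, f: int, t: int) -> Element:  # 'expr {f, t}'
    return (p, f, t)


def match_at(tokens: List[str], pos: int, pattern: List[Element]) -> Optional[int]:
    """Return the end index of a match starting at pos, or None if there is none."""
    if not pattern:
        return pos
    pred, lo, hi = pattern[0]
    # Count how many consecutive tokens from pos satisfy the predicate (capped at hi).
    count = 0
    while (pos + count < len(tokens)
           and (hi is None or count < hi)
           and pred(tokens[pos + count])):
        count += 1
    # Try the longest repetition first, then backtrack towards the minimum.
    for n in range(count, lo - 1, -1):
        end = match_at(tokens, pos + n, pattern[1:])
        if end is not None:
            return end
    return None


# 'a b? c' matches both 'a b c' and 'a c'.
a_b_opt_c = [once(lit("a")), optional(lit("b")), once(lit("c"))]
print(match_at(["a", "b", "c"], 0, a_b_opt_c))  # 3
print(match_at(["a", "c"], 0, a_b_opt_c))       # 2

# 'x {3, 5}' needs at least three xs and consumes at most five.
x_3_5 = [times(lit("x"), 3, 5)]
print(match_at(["x"] * 2, 0, x_3_5))  # None
print(match_at(["x"] * 7, 0, x_3_5))  # 5
```

The returned index is the position just past the last matched token, so a caller can resume searching from there.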