# wf - WiTTFind

## About WiTTFind

### Requirements

* cmake >= 2.6
* gcc >= 4.8.1
* Boost >= 1.53.0

## Build

    $ cd wf
    $ mkdir build
    $ cd build

### On Matrix:

    $ cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CXX_COMPILER=g++-4.6 ..

(obsolete)

### clang

    $ cmake -DCMAKE_CXX_COMPILER=clang++ [-DCMAKE_CXX_FLAGS=-I/usr/include/c++/4.9.2] ..

### Otherwise:

    $ cmake -DCMAKE_BUILD_TYPE=[debug|release] -DCMAKE_INSTALL_PREFIX=/install/path ..
    $ make

## Test

    $ make && make test

## Install

    $ make && make test && make install

## Makefile

There is a plain Makefile which automates the cmake build process:

    $ cd wf
    $ make
    $ make test
    $ make install

## Usage

There are four simple tools for wittfind:

* the wf tool searches for queries and prints a list of matches.
* the wf_display tool uses the output of wf to display the matches in the original file.
* the wf_server tool starts a server listening on a unix domain socket.
* the wf_client tool queries a server that waits on a unix domain socket.
* all tools accept the -h [--help] option.

All tools are built in the build/bin folder (if you use the custom Makefile, they are copied directly into your current directory).

    $ ./wf --help
    $ ./wf -d dictionary -f input -q "query" -m max -o outfile
    $ ./wf -d dictionary -f input -Q query-file -m max -o outfile
    $ ./wf -L dictionary -f input -q "query"

max specifies the maximum number of hits shown. The default is 25; if max is 0, all matches are shown.

    $ ./wf_display --help
    $ ./wf_display [hits]
    $ ./wf -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q /\d+/ | ./wf_display
    $ ./wf -v -d data/witt_WAB_dela.txt -f data/witt_input_tagged.xml -q /\d+/ | ./wf_display -v

Creating wf_display xml output for the web front end (-t B marks hits with \<B\>hit\</B\>, -r hits puts the results in \<hits\>...\</hits\> tags):

    $ ./wf_display -t B -r hits path/to/serverfile.xml

Starting the server:

    $ ./wf_server --dictionary data/witt_WAB_dela_X.txt --files files.txt

Using the client:

    $ wf_client --query '[VVFIN] & denken' -f data/witt_Ts-214_input_IX_tagged.xml \
        -f data/witt_Ts-213_input_IX_tagged.xml --max 10 --threads 2

## Query syntax

* path/to/graph.json(arg1, arg2, ..., argn) loads a subgraph from path/to/graph.json and replaces the arguments $1$, $2$, ... $n$ with arg1, arg2, ..., argn. It is not possible to concatenate subgraph expressions with other expressions in a node.
* token matches any token that is either 'token' or has 'token' as its lemma.
* "token" or 'token' matches any token that is equal to token.
* \<GC\> matches any token with the grammatical code 'GC'.
* [TAG] matches any token with an annotation equal to 'TAG'.
* /regex/ matches any token that matches the regular expression 'regex'. Note: if you want the regex to match the whole token, you have to use '/^regex$/'.
* /regex/i matches any token that matches the regular expression 'regex', ignoring case.
* [/regex/] matches all tokens whose tag matches the regular expression 'regex'.
* You can prefix any query expression with '!' to prohibit the highlighting of this particular match. E.g. "/\w+/ !\ /\w+/" matches a word, followed by punctuation and another word, but the punctuation is not highlighted as a match (the two words are, though): [[[foo]]] , [[[bar]]]
* You can use the boolean operators '(', ')', '&', '|' or '~' in a node to form complex expressions:
    * '/en$/ & [N]' matches tokens that end in 'en' and have the tag N.
    * '/en$/ | [N]' matches tokens that end in 'en' or have the tag N.
    * '~ /en$/' matches tokens that _don't_ end in 'en'.
    * use brackets to form more complex expressions.
    * the parser for complex expressions is not finished yet. You need to use explicit whitespace to separate operators and expressions: '(/a/|/b/)&~/c/' is invalid. Use: '( /a/ | /b/ ) & ~ /c/'.
* There are some unitex special expressions: \, \, \, \ and \ which all map to special regex patterns.
* It is possible to use '?' to make the previous expression optional. The query 'a b? c' matches the tokens 'a b c' or 'a c'.
* It is possible to specify a range expression '{f, t}' to match the previous expression f to t times. The query 'x {3, 5}' matches at least 3 and at most 5 consecutive xs. The expression '{n}' is shorthand for '{n, n}'.
* You can append '+' to an expression to make it match one or more times. The query a + b matches any expression 'a' followed by one or more words and a 'b'.
* You can append '*' to an expression to make it match zero or more times.
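
The quantifier rules above ('?', '{f, t}', '+', '*') can be illustrated with a small Python sketch. This is not the wf implementation, only a toy model under simplifying assumptions: the text is split on whitespace, a bare or quoted word matches by plain equality (no lemma lookup), only /regex/ and /regex/i expressions are supported besides literals, and matching is greedy. The names `parse`, `match_at` and `find_all` are hypothetical and do not exist in wf.

```python
import re

INF = float("inf")
SIMPLE = {"?": (0, 1), "+": (1, INF), "*": (0, INF)}
RANGE = re.compile(r"\{\s*(\d+)\s*(?:,\s*(\d+)\s*)?\}")


def bounds(part):
    """Return (min, max) if part is a quantifier, otherwise None."""
    if part in SIMPLE:
        return SIMPLE[part]
    m = RANGE.fullmatch(part)
    if m:
        lo = int(m.group(1))
        hi = int(m.group(2)) if m.group(2) is not None else lo  # '{n}' == '{n, n}'
        return lo, hi
    return None


def predicate(expr):
    """Build a token test for /regex/, /regex/i or a literal word."""
    last = expr.rfind("/")
    if expr.startswith("/") and last > 0:
        flags = re.I if "i" in expr[last + 1:] else 0
        rx = re.compile(expr[1:last], flags)
        # search(), not fullmatch(): anchor with ^...$ to match the whole token
        return lambda tok: rx.search(tok) is not None
    literal = expr.strip("\"'")
    return lambda tok: tok == literal


def parse(query):
    """Turn a query into a list of (predicate, min, max) triples."""
    parts = re.findall(r"\{\s*\d+\s*(?:,\s*\d+\s*)?\}|\S+", query)
    items = []
    for part in parts:
        b = bounds(part)
        if b is not None and items:           # stand-alone quantifier: 'x {3, 5}'
            items[-1] = (items[-1][0], *b)
        elif len(part) > 1 and part[-1] in SIMPLE:  # attached quantifier: 'b?'
            items.append((predicate(part[:-1]), *SIMPLE[part[-1]]))
        else:
            items.append((predicate(part), 1, 1))
    return items


def match_at(items, tokens, pos, idx=0):
    """Return the end position of a match starting at pos, or None."""
    if idx == len(items):
        return pos
    pred, lo, hi = items[idx]
    n = 0
    while pos + n < len(tokens) and n < hi and pred(tokens[pos + n]):
        n += 1
    for k in range(n, lo - 1, -1):            # greedy, then back off
        end = match_at(items, tokens, pos + k, idx + 1)
        if end is not None:
            return end
    return None


def find_all(query, text):
    """Return all non-overlapping, non-empty matches as lists of tokens."""
    items, tokens = parse(query), text.split()
    hits, pos = [], 0
    while pos < len(tokens):
        end = match_at(items, tokens, pos)
        if end is not None and end > pos:
            hits.append(tokens[pos:end])
            pos = end
        else:
            pos += 1
    return hits


print(find_all("a b? c", "a b c a c"))  # [['a', 'b', 'c'], ['a', 'c']]
```

In this sketch a quantifier may be written attached to a literal ('b?') or separated by whitespace ('x {3, 5}'), mirroring the two spellings used in the examples above; boolean operators, tags and subgraphs are deliberately left out.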