2.1.5. Deployment of Edition Data for the WiTTFind Project
The Edition data from our cooperation partners must be transferred and prepared before our FinderApps can use it.
Several [CW]AST tools do this job; they are controlled and run with the help of Makefiles.
Some tools need the output of other tools, so the tools must be run in a fixed order.
We call this sequence our [CW]AST Toolchain. The results of these tools are
stored in the directories specified in the Makefile.
An overview of all Makefile targets is given in this README.md.
The automatic [CW]AST toolchain is run with the command make deploy.
2.1.5.1. The automatic [CW]AST toolchain
STEP 1: expand all choices in the Edition data and test the results
STEP 2: tag the Edition data (both expanded and non-expanded data can be tagged)
STEP 3: produce frequency lists in JSON format from the (expanded/non-expanded) Edition data
STEP 4: produce sentence lists from all tagged (expanded/non-expanded) Edition data
STEP 5: produce document IDs
STEP 6: export to the central export-data folder
2.1.5.2. The [CW]AST toolchain Makefile target
The make deploy target executes the [CW]AST toolchain automatically:
Sequentially:
make download-tree-tagger
make tagged
In parallel:
make expanded-norm-tagged
make lemma_freqlist
make sentence-lists
make semantic-freqlist
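To illustrate the split between sequential and parallel targets, here is a hypothetical Python sketch. run_deploy and its target lists are illustrative names and not part of the toolchain, which implements deploy in the Makefile itself. The caller passes a function that builds a single target.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Target lists taken from the make deploy description above.
SEQUENTIAL_TARGETS = ["download-tree-tagger", "tagged"]
PARALLEL_TARGETS = [
    "expanded-norm-tagged",
    "lemma_freqlist",
    "sentence-lists",
    "semantic-freqlist",
]


def run_deploy(run_target: Callable[[str], None], max_workers: int = 4) -> None:
    """Run the sequential targets in order, then the remaining targets in parallel."""
    for target in SEQUENTIAL_TARGETS:
        run_target(target)  # each call finishes before the next one starts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the results, so an exception from any target is re-raised here
        list(pool.map(run_target, PARALLEL_TARGETS))
```

For example, run_deploy(lambda t: subprocess.run(["make", t], check=True)) would build the real targets. Inside a Makefile, a similar effect can be achieved with prerequisites and parallel builds (make -j).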

2.1.5.3. STEP 1: Choices
Most editions annotate the authors' writing variations, so the text can be read in
different variants. To annotate a writing variation, editors usually use the XML
choice tag.
The main Makefile for expanding choices is make/choices.make.
It provides the following targets:
2.1.5.3.1. Expand all diplo/norm choices
To expand all diplo/norm choices, use:
make expand-norm-choices
make expand-diplo-choices
Results: Each expanded file is stored in the same Edition directory as the original edition file.
Warning 1: Expanding is a very CPU-intensive task and takes a long time.
Warning 2: The current expand choices implementation validates all input XML files. To turn validation off, use:
make EXPAND_CHOICES_OPT="--novalidation" expand-norm-choices
make EXPAND_CHOICES_OPT="--novalidation" expand-diplo-choices
2.1.5.3.2. Execute unit tests
To execute all test cases for the expand choices script, use:
make expand-choices-test
2.1.5.3.3. Variables
The following Makefile variables are available and can be set:
| variable | description | default value |
|---|---|---|
| EXPAND_CHOICES_DIR | Path where the expand choices script is located | $(CISWAB_TOOLS_DIR)/choices/src/main/python |
| EXPAND_CHOICES_TEST_DIR | Path where all test cases are located | $(CISWAB_TOOLS_DIR)/choices/src/unittest/python |
| EXPAND_CHOICES_CMD | Script name | expand_choices.py |
| EXPAND_CHOICES_STARTER | (Python) interpreter for executing the script | $(PYTHON3_RUNNER) |
| EXPAND_CHOICES_OPT | Options passed to $(EXPAND_CHOICES_CMD), e.g. --novalidation to turn off XML validation | empty |
The EXPAND_CHOICES_RUNNER variable combines all of these variables into the command that runs the script.
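A hypothetical Python sketch of how such a runner command could be assembled. It assumes the order starter, script path, options; the concrete paths below are placeholders, since the real values of CISWAB_TOOLS_DIR and PYTHON3_RUNNER are defined elsewhere.

```python
from __future__ import annotations

import shlex


def build_expand_choices_runner(variables: dict[str, str]) -> list[str]:
    """Assemble a command line from the EXPAND_CHOICES_* variables (assumed order)."""
    script = f"{variables['EXPAND_CHOICES_DIR']}/{variables['EXPAND_CHOICES_CMD']}"
    # The starter may itself contain flags, so split it like a shell would.
    command = shlex.split(variables["EXPAND_CHOICES_STARTER"])
    command.append(script)
    # EXPAND_CHOICES_OPT is empty by default; shlex.split("") returns [].
    command.extend(shlex.split(variables.get("EXPAND_CHOICES_OPT", "")))
    return command


# Placeholder values for illustration only.
example = {
    "EXPAND_CHOICES_DIR": "/opt/ciswab-tools/choices/src/main/python",
    "EXPAND_CHOICES_CMD": "expand_choices.py",
    "EXPAND_CHOICES_STARTER": "python3",
    "EXPAND_CHOICES_OPT": "--novalidation",
}
print(shlex.join(build_expand_choices_runner(example)))
```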
2.1.5.4. STEP 2: Tagging all expanded data
The first computational analysis of the expanded data is performed by Dr. H. Schmid's TreeTagger. We have adapted and optimized the TreeTagger for our purposes and use this special variant.
2.1.5.4.1. Download (the original version)
The TreeTagger must be downloaded first, using the following Makefile target:
make download-tree-tagger
The TreeTagger license must be read and accepted before downloading.
2.1.5.4.2. Tagging (norm) files (with our optimized version)
To tag all (norm) files, use the following target:
make tagged
2.1.5.4.3. Tagging (expanded) diplo/norm files (with our optimized version)
To tag all expanded diplo/norm files, use the following targets:
make expanded-diplo-tagged
make expanded-norm-tagged
Results: Each tagged file is stored in the same Edition directory as the untagged file.
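To inspect tagger results programmatically, a small parser helps. This Python sketch assumes TreeTagger's vertical output format (one token per line, with token, tag and lemma separated by tabs, as produced with the -token -lemma options); whether our optimized variant writes exactly this format is an assumption, and the sample lines are made up.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    token: str
    tag: str
    lemma: str


def parse_treetagger_output(text):
    """Parse vertical tagger output: one token per line, token<TAB>tag<TAB>lemma."""
    tokens = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip empty lines and markup lines passed through by the tagger
        tokens.append(TaggedToken(*fields))
    return tokens


sample = "<s>\nDie\tART\tdie\nFarbe\tNN\tFarbe\nist\tVAFIN\tsein\nrot\tADJD\trot\n.\t$.\t.\n</s>\n"
for t in parse_treetagger_output(sample):
    print(t.token, t.tag, t.lemma)
```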
2.1.5.4.4. Variables
The following Makefile variables are available and can be set:
| variable | description | default value |
|---|---|---|
| TREETAGGER_DIR | Defines the TreeTagger directory | $(CISWAB_TOOLS_DIR)/tree-tagger |
2.1.5.5. STEP 3: Frequency and Semantic Frequency Lists (Format: .txt files, UTF-8, .json)
2.1.5.5.1. STEP 3 a) Frequency lists
Many tools of the FinderApp use precalculated frequency lists for display or for processing.
The target
make lemma_freqlist
produces token frequency lists and lemma lists. The target
make semantic-freqlist
produces semantic frequency lists with the help of the current lexicon.
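The core of a token frequency list and a lemma list can be sketched with collections.Counter over (token, tag, lemma) triples. The sample data and the output layout (count, tab, entry) are assumptions for illustration, not the actual format of frequencies.txt.

```python
from collections import Counter

# Made-up (token, tag, lemma) triples, e.g. taken from tagger output.
tagged = [
    ("Die", "ART", "die"),
    ("Farben", "NN", "Farbe"),
    ("sind", "VAFIN", "sein"),
    ("rot", "ADJD", "rot"),
    ("und", "KON", "und"),
    ("die", "ART", "die"),
    ("Farbe", "NN", "Farbe"),
    ("ist", "VAFIN", "sein"),
    ("rot", "ADJD", "rot"),
]

# Token counts are case-sensitive ("Die" and "die" are counted separately).
token_freq = Counter(token for token, _tag, _lemma in tagged)
lemma_freq = Counter(lemma for _token, _tag, lemma in tagged)

# Hypothetical output layout: "<count>\t<lemma>", most frequent first.
for lemma, count in lemma_freq.most_common():
    print(f"{count}\t{lemma}")
```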
The integrated targets for the different frequency lists are:
make freqlist: creates a frequency list of all tagged input files and saves the result to $(EXPORT_DIR)/lexikon/frequencies.txt
make lemmalist: reads the frequency list created by the make freqlist target and creates a lemma list. The result is saved to $(EXPORT_DIR)/lexikon/frequencies.lemma
all-freqlist: creates a frequency list of all tagged input files and saves the result to $(CISWAB_TOOLS_DIR)/frequency/all_frequencies.txt and to $(CISWAB_TOOLS_DIR)/frequency/all_frequencies_pickle
music-freqlist-lemmatized: creates a lemmatized frequency list for music with the help of $(DICT_WITT) and regular expressions and saves the result to $(EXPORT_DIR)/lexikon/music/lemmatized_music_frequencies.txt
color-freqlist-lemmatized: creates a lemmatized frequency list for colors with the help of $(DICT_WITT) and regular expressions and saves the result to $(EXPORT_DIR)/lexikon/color/lemmatized_color_frequencies.txt
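The lemmatized category lists combine a full-form lexicon with a regular expression. In this hypothetical Python sketch, FULLFORM_TO_LEMMA stands in for $(DICT_WITT) and MUSIC_PATTERN for the real category expression; both are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for $(DICT_WITT): maps full forms to lemmas.
FULLFORM_TO_LEMMA = {
    "Melodien": "Melodie",
    "Melodie": "Melodie",
    "Töne": "Ton",
    "Ton": "Ton",
    "Takte": "Takt",
}

# Hypothetical expression selecting music-related lemmas.
MUSIC_PATTERN = re.compile(r"Melodie|Ton|Takt|Harmonie")


def lemmatized_category_freqlist(tokens, lexicon, pattern):
    """Count the lemmas of all tokens whose lemma fully matches the category pattern."""
    counts = Counter()
    for token in tokens:
        lemma = lexicon.get(token, token)  # fall back to the token itself
        if pattern.fullmatch(lemma):
            counts[lemma] += 1
    return counts


text_tokens = ["Die", "Melodien", "und", "die", "Töne", "einer", "Melodie", "Ton"]
print(lemmatized_category_freqlist(text_tokens, FULLFORM_TO_LEMMA, MUSIC_PATTERN))
```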
2.1.5.5.2. STEP 3 b): Implicit export of JSON files into the export folder
The semantic frequency lists can be converted to JSON via the
freq_by_category_to_json.py script (formerly convertFrequencyLists2JSON.perl).
make music-by-category: produces all semantic frequency lists for music in JSON format and copies them into the $(BASIC_EXPORT_DIC)/musik folder.
make color-by-category: produces all semantic frequency lists for color in JSON format and copies them into the $(BASIC_EXPORT_DIC) folder.
others-by-category: produces all semantic frequency lists for all other categories in JSON format and copies them into the $(BASIC_EXPORT_DIC) folder.
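A hypothetical sketch of the grouping step behind such a JSON conversion, assuming input lines of the form category, lemma and count separated by tabs. The real input and output formats of freq_by_category_to_json.py may differ; category_lines_to_json is an illustrative name.

```python
from __future__ import annotations

import json
from collections import defaultdict


def category_lines_to_json(lines: list[str]) -> dict[str, str]:
    """Group "category<TAB>lemma<TAB>count" lines into one JSON document per category."""
    grouped = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        category, lemma, count = line.rstrip("\n").split("\t")
        grouped[category][lemma] = int(count)
    # ensure_ascii=False keeps umlauts readable in UTF-8 output files.
    return {
        category: json.dumps(entries, ensure_ascii=False, indent=2)
        for category, entries in grouped.items()
    }


lines = ["musik\tMelodie\t12", "musik\tTon\t30", "farbe\tgrün\t7"]
for category, document in category_lines_to_json(lines).items():
    print(category, document)
```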
2.1.5.5.3. Former Steps 6 and 7
Former STEP 6: Convert semantic frequency lists to JSON
with the following target:
`make convert-freqlist-json`
Former STEP 7:
The
make export-converted-freqlist-json target copied all semantic frequency lists (converted to JSON) into the export-data/lexicon folder. It is no longer needed, since the new frequency lists are created directly in JSON format and exported to the correct folders.
2.1.5.5.4. Variables
The following Makefile variables are available and can be set:
| variable | description | default value |
|---|---|---|
| LEMMATOOLS_DIR | Defines the frequency tools directory | $(CISWAB_TOOLS_DIR)/frequency |
| LEMMALIST | Defines the path to the lemma list | $(EXPORT_DIR)/lexikon/frequencies.lemma |
| FREQ_TOOLS_DIR | Defines the tools directory for freqlists | $(CISWAB_TOOLS_DIR)/frequency |
| FREQLIST_DIR | Defines the freqlist directory | $(WITT_DATA_HOME_DIR)/lexikon/freqlisten |
| FREQLIST | Defines the path to the frequency list | $(EXPORT_DIR)/lexikon/frequencies.txt |
| ALL_FREQLIST | Defines the path to the frequency list | $(CISWAB_TOOLS_DIR)/frequency/all_frequencies.txt |
| ALL_FREQ_PICKLE | Defines the path to the pickled frequency list | $(CISWAB_TOOLS_DIR)/frequency/all_frequencies_pickle |
| BASIC_EXPORT_DIC | Defines the path to the export lexicon folder | $(EXPORT_DIR)/lexikon |
To get an overview of all variables that are set, run the target
make info_semantic_freqlist
(formerly: make info-convert-freqlist-json).
2.1.5.6. STEP 4: Sentence lists
Many tools of the FinderApp process sentence-separated text or edition files.
The target make sentence-list creates sentence lists for all tagged edition files:
make sentence-list
For each tagged input file, this target creates a corresponding -tagged-index.json and -tagged.html file.
Results: The generated files are placed in the same Edition-Directory as the tagged input file.
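Because sentence lists are built from tagged files, sentence boundaries can be taken from the tags. This Python sketch assumes STTS tags (as used by TreeTagger's German model), where $. marks sentence-final punctuation; the JSON layout with id and text fields is a hypothetical stand-in for the real -tagged-index.json format.

```python
import json


def split_sentences(tagged):
    """Split (token, tag, lemma) triples into sentences at the STTS tag "$."."""
    sentences, current = [], []
    for token, tag, _lemma in tagged:
        current.append(token)
        if tag == "$.":  # STTS tag for sentence-final punctuation
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


def sentence_index(tagged):
    """Build a hypothetical JSON index: a list of {"id", "text"} objects."""
    entries = [
        {"id": i, "text": " ".join(tokens)}
        for i, tokens in enumerate(split_sentences(tagged), start=1)
    ]
    return json.dumps(entries, ensure_ascii=False)


tagged = [
    ("Die", "ART", "die"), ("Farbe", "NN", "Farbe"), ("ist", "VAFIN", "sein"),
    ("rot", "ADJD", "rot"), (".", "$.", "."), ("Warum", "PWAV", "warum"), ("?", "$.", "?"),
]
print(sentence_index(tagged))
```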
2.1.5.6.1. Variables
The following Makefile variables are available and can be set:
| variable | description | default value |
|---|---|---|
| SENTENCE_TOOLS_DIR | Defines the sentence tools directory | $(CISWAB_TOOLS_DIR)/sentence |
2.1.5.7. STEP 5: Document IDs
To create a file containing all document ids, the following target can be used:
make document-ids
This target writes the document ids file to $(EXPORT_DIR)/ciswab/documentIds.txt.
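A hypothetical sketch of collecting and writing document IDs. Purely for illustration, it assumes that the ID is the part of a file name before the first underscore (e.g. Ms-115_norm.xml) and that documentIds.txt holds one ID per line; neither is stated by the toolchain.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: the document ID precedes the first "_".
ID_PATTERN = re.compile(r"^([^_]+)_")


def collect_document_ids(paths):
    """Return the sorted, de-duplicated document IDs derived from file names."""
    ids = set()
    for path in paths:
        match = ID_PATTERN.match(Path(path).name)
        if match:
            ids.add(match.group(1))
    return sorted(ids)


def write_document_ids(ids, target: Path) -> None:
    """Write one ID per line, creating the parent directory if necessary."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(ids) + "\n", encoding="utf-8")
```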
2.1.5.8. STEP 6: Export
The Makefile target make export-data copies the following files into the export-data
folder:
all tagged input files
sentence lists:
-tagged-index.json and -tagged.html files
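A minimal Python sketch of the copy step, assuming a flat export-data folder and a hypothetical *-tagged.xml pattern for the tagged input files (their extension is not stated here); the other two patterns come from the list above. export_data is an illustrative name.

```python
import shutil
import tempfile
from pathlib import Path

# The first pattern is hypothetical; the other two match the sentence-list files.
EXPORT_PATTERNS = ["*-tagged.xml", "*-tagged-index.json", "*-tagged.html"]


def export_data(edition_dir: Path, export_dir: Path):
    """Copy all matching files below edition_dir into a flat export_dir."""
    export_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in EXPORT_PATTERNS:
        for source in sorted(edition_dir.rglob(pattern)):
            target = export_dir / source.name
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


# Demonstration in a temporary directory with dummy files.
with tempfile.TemporaryDirectory() as tmp:
    edition = Path(tmp, "edition", "Ms-115")
    edition.mkdir(parents=True)
    for name in ["Ms-115-tagged.xml", "Ms-115-tagged-index.json", "Ms-115-tagged.html", "Ms-115.xml"]:
        (edition / name).write_text("demo", encoding="utf-8")
    exported = export_data(Path(tmp, "edition"), Path(tmp, "export-data"))
    exported_names = sorted(p.name for p in exported)
print(exported_names)
```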
2.1.6. Transfer/Update Data from our Cooperation Partners into our repository
Privileged users can run the target update_edition_data to get the latest Edition data from our cooperation partners and transfer it into their data repository.