This Interface for manual annotation serves as a simple tool for annotating clusters of derivationally related lexemes, e.g., teacher is derived from to teach. The main purpose of this tool was to facilitate the annotation of data from individual resources that have been harmonised into the Universal Derivations collection; the expectation was to load the original data of derivationally related lexemes and organise their clusters into rooted tree data structures. This was done by marking individual relations between lexemes as present/absent in the resulting rooted trees. The tool can, however, also be used for more general annotation tasks. For instance, existing data can be re-annotated using this interface, and the interface can serve as a simple visualisation tool (as it allows users to load data stored in .tsv format).
Examples of the input data in the .tsv and .json formats can be downloaded from GitHub.
Once the data are uploaded (using Upload JSON), individual clusters of connected lexemes are automatically displayed. The canvas can be panned by clicking and holding the left mouse button on an empty white area and dragging, and it can be zoomed in/out with the mouse wheel. Lexemes are represented as nodes, and relations between them as edges. An individual node can be moved by pressing the left mouse button on it, holding the button, and dragging the node. Each edge can be annotated by clicking it with the left mouse button and then selecting the desired action button in the bottom panel (see the buttons and their keyboard shortcuts below). Multiple edges/nodes can be selected by holding the Ctrl key while clicking them.
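To illustrate what a "cluster of connected lexemes" means, the following minimal Python sketch groups lexemes into connected components, ignoring edge direction. It is only an illustration: the function name `find_clusters` and the representation of edges as pairs of strings are assumptions, and the interface itself may compute clusters differently.

```python
from collections import defaultdict


def find_clusters(edges):
    """Group lexemes into connected components (edge direction is ignored).

    `edges` is an iterable of (lexeme_a, lexeme_b) pairs; this input shape
    is a hypothetical choice for the example, not the interface's own format.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    clusters = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


example_edges = [("teach", "teacher"), ("teach", "teaching"), ("write", "writer")]
example_clusters = find_clusters(example_edges)
```

Here the three edges fall into two clusters, one around teach and one around write.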
TSV to JSON
- converts the given input from the .tsv format into the .json format, which is the main working format for this interface;
the expected .tsv format consists of three tab-separated columns: mark, target_lexeme, source_lexeme; where
the mark is + (the edge is present in the data) or - (the edge is absent in the data),
the target_lexeme is the base lexeme, and
the source_lexeme is the derived lexeme.
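The three-column .tsv layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the `Edge` type and the `parse_edges` function are hypothetical names introduced for illustration, the input is assumed to have no header row, and the structure of the interface's .json format is not reproduced here.

```python
import csv
import io
from typing import List, NamedTuple


class Edge(NamedTuple):
    """One row of the .tsv input (hypothetical helper type)."""
    present: bool        # True for "+", False for "-"
    target_lexeme: str   # the base lexeme, as described above
    source_lexeme: str   # the derived lexeme, as described above


def parse_edges(tsv_text: str) -> List[Edge]:
    """Parse tab-separated rows: mark, target_lexeme, source_lexeme."""
    edges = []
    reader = csv.reader(io.StringIO(tsv_text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        mark, target, source = row
        if mark not in ("+", "-"):
            raise ValueError(f"line {line_no}: mark must be '+' or '-', got {mark!r}")
        edges.append(Edge(mark == "+", target, source))
    return edges


sample = "+\tteach\tteacher\n-\tteach\tteaching\n"
edges = parse_edges(sample)
```

Rejecting rows with an unexpected mark or column count is a design choice of this sketch; it makes malformed input visible before annotation starts.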
Upload JSON
- uploads input .json data.
Save JSON
- saves the annotated data from the interface into .json format. The positions of nodes in individual canvases are also stored in the .json format.
JSON to TSV
- converts the output from the .json format into the .tsv format, which consists of three columns as described above (see TSV to JSON);
positions of nodes in individual canvases are not stored in this format.
« (left arrow)
- displays the previous cluster of connected lexemes, if there is one.
Restore edge (shift)
- marks the selected edge(s) as solid, i.e., the edge should be present in the cluster.
Remove edge (delete)
- marks the selected edge(s) as dotted, i.e., the edge should be absent in the cluster.
» (right arrow)
- displays the following cluster of connected lexemes, if there is one.
Input_text_box
- allows users to jump to a cluster by its id (a number from 1 to the number of loaded clusters);
it also allows users to find the cluster that contains a desired lexeme (the user can type the lexeme into the text box);
the id/lexeme entered in this input_text_box is submitted by pressing Enter.
Restore ALL
- selects and marks all edges as solid, i.e., every edge should be present in the cluster;
this button is locked by default, but it can be unlocked using the checkbox to the left of the button.
Remove ALL
- selects and marks all edges as dotted, i.e., every edge should be absent in the cluster;
this button is locked by default, but it can be unlocked using the checkbox to the left of the button.
Print Lexemes (L)
- lists all lexemes from the displayed cluster;
the listed lexemes can then be easily copied, e.g., to look them up online.
Is it tree? (T)
- checks whether the displayed cluster(s), restricted to the solid edges, is/are organised into rooted tree(s);
this lets the annotator easily see whether any non-tree edge remains in the cluster(s).
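The kind of check described above can be sketched in Python as follows. This is an illustrative sketch, not the interface's own implementation: the function name `is_rooted_forest` and the (base, derived) edge pairs are assumptions. It accepts the solid edges if every lexeme has at most one base and no chain of bases loops back on itself; whether the interface additionally requires a single tree per cluster is not stated, so that condition is not enforced here.

```python
def is_rooted_forest(solid_edges):
    """Return True if the solid edges form one or more rooted trees.

    `solid_edges` is an iterable of (base, derived) pairs; this shape is a
    hypothetical choice for the example.
    """
    parent = {}
    nodes = set()
    for base, derived in solid_edges:
        nodes.update((base, derived))
        if derived in parent:
            return False  # a lexeme with two bases breaks the tree shape
        parent[derived] = base

    # Walk from every node towards its root; revisiting a node means a cycle.
    for node in nodes:
        visited = set()
        current = node
        while current in parent:
            if current in visited:
                return False
            visited.add(current)
            current = parent[current]
    return True


tree_edges = [("teach", "teacher"), ("teach", "teaching")]
non_tree_edges = tree_edges + [("teaching", "teacher")]
```

With these inputs, `is_rooted_forest(tree_edges)` holds, while `non_tree_edges` fails because teacher would have two bases; marking one of the competing edges as absent restores the tree shape.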
This work was supported by Grant No. GA19-14534S of the Czech Science Foundation, Grant No. START/HUM/010 of the Grant schemes at Charles University (reg. No. CZ.02.2.69/0.0/0.0/19_073/0016935), Grant No. SVV260575 of Charles University, and the LINDAT/CLARIAH CZ project (No. LM2015071, No. LM2018101) of the Ministry of Education, Youth and Sports of the Czech Republic.
Lukáš Kyjánek is the author of this Interface for manual annotation. This interface is licensed under the MIT License. The scripts of this interface are available on the author's GitHub.