Data & Corpora

Package of word embeddings of Czech from a large corpus

Participated in versions: v1.0

A collection of Word2Vec word embedding models for Czech, trained on the large SYN v9 corpus.

Universal Derivations (UDer)

Participated in versions: v0.5, v1.0, v1.1

Universal Derivations (UDer) is a collection of harmonized word-formation resources for multiple languages, following the design principles of Universal Dependencies.

Universal Segmentations (UniSegments)

Participated in versions: v1.0

UniSegments is a multilingual data resource for morphological segmentation. It provides harmonized segmentations for 32 languages, inspired by the Universal Dependencies and Universal Derivations schemes.

DeriNet

Participated in versions: v1.6, v2.0, v2.1, v2.3

A large-scale lexical network modeling derivational relations in Czech. It captures core word-formation relations on a set of more than 1 million Czech lexemes.

DeriNet.Ru

Participated in versions: v1.0

A large lexical resource of Russian derivational morphology, created by combining machine-learning methods with harmonized annotation schemes.


Tools & Software