Data & Corpora
Package of word embeddings of Czech from a large corpus
A collection of Word2Vec word-embedding models for Czech, trained on the large SYN v9 corpus.
Universal Derivations (UDer)
Universal Derivations (UDer) is a collection of harmonized word-formation resources for multiple languages, following the design principles of Universal Dependencies.
Universal Segmentations (UniSegments)
UniSegments is a multilingual data resource for morphological segmentation. It provides harmonized segmentations for 32 languages, inspired by the Universal Dependencies and Universal Derivations schemes.
DeriNet
A large-scale lexical network modeling derivational relations in Czech. It captures core word-formation relations on a set of more than 1 million Czech lexemes.
DeriNet.Ru
A large lexical resource of Russian derivational morphology, created by combining machine-learning methods with harmonized annotation schemes.
Tools & Software
Verb Classification
Classification of Czech verbs into classes and groups based on their morphological properties.
Phonetic Transcription
A tool that converts Czech, Slovak, and Polish text into IPA phonetic transcription.
Foreign Word Recognition
Recognition of foreign words in Czech using machine-learning methods.
Annotation Tool
Tool for manual annotation of linguistic data, designed to support specific rooted-tree annotation schemas.