Resources | Lukáš Kyjánek

Data & Corpora

Package of word embeddings of Czech from a large corpus

Contributed to versions: v1.0

A collection of word embedding models for Czech (Word2Vec) trained on the large SYN v9 corpus.

Universal Derivations (UDer)

Contributed to versions: v0.5, v1.0, v1.1

Universal Derivations (UDer) is a collection of harmonized word-formation resources for multiple languages, following the design principles of Universal Dependencies.

Website Repository

Universal Segmentations (UniSegments)

Contributed to versions: v1.0

UniSegments is a multilingual data resource for morphological segmentation. It provides harmonized segmentations for 32 languages, inspired by the Universal Dependencies and Universal Derivations schemes.

Website Repository

DeriNet

Contributed to versions: v1.6, v2.0, v2.1, v2.3

A large-scale lexical network modeling derivational relations in Czech. It captures core word-formation relations on a set of more than 1 million Czech lexemes.

Website Repository

DeriNet.Ru

Contributed to versions: v1.0

A large lexical resource of Russian derivational morphology, created by combining machine-learning methods with harmonized annotation schemes.

Website Repository

Semantic annotation of noun/verb conversion in Czech

Contributed to versions: v1.0

A dataset of 2,058 noun/verb conversion pairs, together with related formations (word-formation paradigms), annotated with linguistic features, including semantic categories.

Website Repository

Tools & Software

Inflectional Classifier

Classification of Czech nouns, adjectives and verbs into inflectional classes and groups based on their morphological properties.

Phonetic Transcription

A tool for converting Czech, Slovak, and Polish text into IPA phonetic transcription.

Foreign Word Recognition

Recognition of foreign words in Czech using machine-learning approaches.

Annotation Tool for Morphology

Tool for manual annotation of morphological data, designed to support specific rooted-tree annotation schemas for derivational morphology.