Lukáš Kyjánek

I am a PhD student at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University.

My research interests are word-formation, especially derivation; computational lexicography with a focus on word-formation language resources; semantics in derivation, including the meanings of morphological categories; and dialectometry, specifically the mutual intelligibility of closely related (West-Slavic) languages. I am interested in using quantitative methods and machine learning to study these areas of (computational) linguistics.



Universal Derivations

I develop and maintain UDer, a collection of harmonised lexical networks that capture word-formation for many languages in a cross-linguistically consistent annotation scheme. The collection grew out of my Master's thesis at UFAL.

Project webpage


I help to develop DeriNet, a lexical network of derivational relations between Czech words. I add new relations, prepare data for manual annotation, enrich the data with new annotations, and focus on semantic labelling.

Project webpage

Structure of Czech Words

I am part of a team that carries out linguistic research into the word-formation structure of Czech words, using specialized language resources and tools together with large language corpora. The project is led by M. Ševčíková and is supported by GAČR for 2019-2021.

Project webpage

Derivational Networks for Multiple Languages

I am part of a team, together with Jonáš Vidra and Tomáš Musil, that develops derivational networks for multiple languages using machine learning and cross-lingual transfer. The project is led by Jonáš Vidra and is supported by GAUK for 2019-2021.

Semantic Labels into DeriNet

I carried out the first experiments with semantic labelling of derivational relations in DeriNet using machine learning methods. As a result, five semantic labels are assigned in DeriNet 2.0. Supported by Student's Faculty Grant at MFF CUNI in 2019.

GitHub repository

Foreign Words Recognition

I developed a rule-based tool for recognising borrowed/foreign/loan words in Czech. The rules are based on a survey of the linguistic literature published so far. Supported by Student's Faculty Grant at MFF CUNI in 2018.

GitHub repository

Mutual Intelligibility between West-Slavic Languages

Together with my colleague Jiří Haviger, I work on quantitatively measuring mutual intelligibility between West-Slavic languages, especially by means of conditional entropy. As a by-product, I created scripts for the phonetic transcription of these languages.
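
For readers unfamiliar with the measure, here is a minimal sketch of conditional entropy computed over aligned sound pairs. It is an illustration only, not the scripts from this project: the function name `conditional_entropy` and the toy pairs are hypothetical, and real input would come from aligned phonetic transcriptions of cognate words. A lower value means that a sound in one language predicts its counterpart in the other more reliably.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations."""
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                     # counts of (x, y)
    marginal_x = Counter(x for x, _ in pairs)  # counts of x
    entropy = 0.0
    for (x, _y), n_xy in joint.items():
        p_xy = n_xy / total                    # joint probability p(x, y)
        p_y_given_x = n_xy / marginal_x[x]     # conditional probability p(y | x)
        entropy -= p_xy * math.log2(p_y_given_x)
    return entropy


# Hypothetical aligned sound pairs (source-language sound, target-language sound).
toy_pairs = [
    ("h", "g"), ("h", "g"),   # fully predictable correspondence
    ("ř", "ż"), ("ř", "r"),   # ambiguous correspondence
    ("a", "a"), ("a", "a"),   # identical sounds
]

print(round(conditional_entropy(toy_pairs), 3))
```

In this toy data only the ambiguous correspondence contributes to the result; the two predictable ones add nothing.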


Classification of Czech Verbs into Classes and Groups

A rule-based tool for classifying Czech verbs into the five common conjugation classes. It also identifies conjugation groups within each class.

GitHub repository

Phonetic Transcription of Czech, Slovak, and Polish

A rule-based tool for the phonetic transcription of Czech, Slovak, and Polish into the International Phonetic Alphabet (IPA).

GitHub repository

Recognition of Foreign Words in the Czech Language

A rule-based tool for recognising foreign/borrowed/loan words in Czech.

GitHub repository