Lukáš Kyjánek

I am a PhD student at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University.

My research interests are:

  • word-formation, with a focus on derivational morphology and the semantics of derivation, including the meanings of morphological categories;
  • computational lexicography, with a focus on (but not limited to) resources that capture and process word-formation;
  • dialectometry, with a focus on the mutual intelligibility of closely related languages.

I prefer to use statistics and machine learning to address these topics in (computational) linguistics.



Formalisation of word-formation meanings in language resources

I formalise derivational meanings, label them in language resources for several languages, and compare the competition in expressing them within individual languages as well as across languages.
Dissertation Thesis
(work in progress)

A data-based approach to competition in word-formation

I lead a team researching competition among the word-formation processes and means used to express diminutive and female meanings in seven European languages.
START Research Programme, CUNI
(principal investigator, 2021-2023)

Project webpage

Word-formation structure of Czech words

I am part of a team that studies the word-formation structure of Czech words using specialised language resources, tools, and large language corpora.
(team member, 2019-2021)

Project webpage

Universal Derivations

I develop a collection of harmonised lexical networks that capture word-formation for many languages in a cross-linguistically consistent annotation scheme.
Master's thesis; data resource related to grants in which I am involved

Project webpage


I help to develop DeriNet, a lexical derivational network for Czech. My work includes, for example, extending the set of captured relations, preparing manual annotations, and semantic labelling.
Data resource related to grants in which I am involved

Project webpage

Derivational networks for multiple languages

Together with Jonáš Vidra and Tomáš Musil, I work on developing derivational networks for multiple languages using machine learning and cross-lingual knowledge transfer.
(team member, 2019-2021)

Semantic labels in DeriNet

I carried out the first experiments in labelling derivational meanings in DeriNet using machine learning methods. As a result, DeriNet 2.0 contains five semantic labels.
Student's Faculty Grant, MFF CUNI (principal investigator, 2019)

GitHub repository

Foreign words recognition

I developed a rule-based tool for recognising borrowed (foreign, loan) words in Czech. The rules are based on a survey of the published linguistic literature.
Student's Faculty Grant, MFF CUNI (principal investigator, 2018)

GitHub repository

Mutual Intelligibility between West-Slavic languages

Together with Jiří Haviger, I work on measuring mutual intelligibility between West-Slavic languages, especially using conditional entropy and methods originating in dialectometry.
Bachelor's thesis; ongoing research without grant support



Classification of Czech verbs into classes and groups

A rule-based tool for classifying Czech verbs into the five common conjugation classes. It also identifies the conjugation groups within each class.

GitHub repository

Phonetic transcription of Czech, Slovak, and Polish

A rule-based tool for the phonetic transcription of Czech, Slovak, and Polish into the International Phonetic Alphabet (IPA).

GitHub repository

Recognition of foreign words in the Czech language

A rule-based tool for recognising foreign (borrowed, loan) words in Czech.

GitHub repository

Documentation: DeriNet API

Online, interactive documentation of the DeriNet API, which can also be used for working with Universal Derivations. It has three parts: a complete description of all functions, a description of the modules available in the API, and an interactive example that users can easily edit to suit their needs.

Jupyter notebook

Tool for manual annotation

A web-browser tool that facilitates the manual annotation of word-formation relations in data resources such as DeriNet. I developed the tool for my Master's thesis and used it during the harmonisation of existing data resources. The tool can be used online or cloned from GitHub.

GitHub repository

List of language resources of derivational morphology

The list gathers existing language resources that contain any annotation of derivational morphology and/or word-formation in a broad sense. It continues the first list of such resources, which I published as a technical report in 2018. The list is kept up to date.

Webpage with the list