
Lukáš Kyjánek


l.kyjanek (at) gmail.com

I use statistics and machine learning to study:

  • morphology with a focus on computational approaches to predictability in morphological paradigms;
  • derivational morphology with a focus on semantics in derivations and morphological categories;
  • computer lexicography with a focus on, but not limited to, resources for word-formation and morphology;
  • dialectometry with a focus on mutual intelligibility of closely related languages.
I have been a PhD candidate in linguistics at Laboratoire de Linguistique Formelle, Université Paris Cité, since 2025.

I completed an internship at Laboratoire de Linguistique Formelle, Université Paris Cité, under the supervision of Olivier Bonami, from January to August 2022.

I was a PhD candidate in computational linguistics at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, between 2020 and 2023.

projects

A data-based approach to competition in word-formation

I researched competition among the word-formation processes and means used to express diminutive meaning and gender marking in seven European languages.
START Research Programme, CUNI
(principal investigator, 2021-2023)

Project webpage

Word-formation structure of Czech words

I focused on linguistic research into the word-formation structure of Czech words. We used specialised language resources, tools and large corpora of Czech.
GAČR Grant, CUNI
(team member, 2019-2021)

Project webpage

Derivational networks for multiple languages

With Jonáš Vidra and Tomáš Musil, we focused on developing derivational networks for multiple languages using machine learning and cross-lingual transfer of knowledge.
GAUK Grant, CUNI
(team member, 2019-2021)

Project webpage


Semantic labels into DeriNet

I carried out the first experiments on labelling derivational meanings in the Czech DeriNet using machine learning methods. As a result, the database has contained five labels since version 2.0.
Student's Faculty Grant, CUNI
(principal investigator, 2019)

GitHub repository

Foreign words recognition

I developed a rule-based tool for recognising borrowed/foreign/loan words in Czech. The rules are based on a study of previously published linguistic papers.
Student's Faculty Grant, CUNI
(principal investigator, 2018)

GitHub repository


Computational approach to predictability in morphology

I study morphological paradigms, their coherence, and their predictability on the basis of Czech language data.


Dissertation Thesis (work-in-progress)

Université Paris Cité

Harmonised collection
Universal Derivations

I created a collection of harmonised lexical networks capturing word-formation in a cross-linguistically consistent annotation scheme for many languages.
Master's Thesis

Charles University

Mutual Intelligibility between West-Slavic languages

With Jiří Haviger, we focused on measuring mutual intelligibility between West-Slavic languages, especially using conditional entropy and methods from the field of dialectometry.
Bachelor's Thesis

University of Hradec Králové

data resources

Documentation: DeriNet API

Online, interactive documentation of the DeriNet API, which can also be used to work with Universal Derivations. It has three parts: a complete description of all functions, a description of the modules available in the API, and an interactive example that users can easily edit to suit their needs.

Jupyter notebook

List of language resources of derivational morphology

The list gathers existing language resources that contain any annotation of derivational morphology and/or word-formation in a broad sense. It continues the first list of such resources, which I published as a technical report in 2018. The list is kept up to date.

Webpage with the list

tools

Classification of Czech verbs into classes and groups

A rule-based tool for classifying Czech verbs into the 5 common conjugation classes. It also identifies conjugation groups within each class.

GitHub repository

Phonetic transcription of Czech, Slovak, and Polish

A rule-based tool for phonetic transcription of Czech, Slovak, and Polish into the International Phonetic Alphabet (IPA).

GitHub repository

Recognition of foreign words in the Czech language

A rule-based tool for recognising foreign/borrowed/loan words in Czech.


GitHub repository


Tool for manual annotation

A browser-based visual interface for manual annotation of derivational morphology in resources such as those from UDer. The tool can be used online or cloned from GitHub.

GitHub repository

publications

presentations

teaching

Subject                               2021-2022   2022-2023   2023-2024
Programming 1 [NPRG030] (in Czech)    web         web         web