
Lukáš Kyjánek


l.kyjanek (at) gmail.com

I use statistics and machine learning to study:

  • derivational morphology with a focus on semantics in derivations and morphological categories;
  • computational lexicography with a focus on, but not limited to, resources of word-formation and morphology;
  • dialectometry with a focus on mutual intelligibility of closely related languages.
I was a PhD candidate in computational linguistics at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, between 2020 and 2023.

I completed an internship at the Laboratoire de linguistique formelle, Université de Paris, under the supervision of Olivier Bonami, from January to August 2022.

projects

Formalisation of word-formation meanings in language resources

I formalise derivational meanings, annotate them in language resources for several languages, and compare them across languages.


Dissertation Thesis
(unfinished)

 
A data-based approach to competition in word-formation

I lead a team researching competition among the word-formation processes and means used to express diminutive meaning and gender marking in seven European languages.
START Research Programme, CUNI
(principal investigator, 2021-2023)

Project webpage

Word-formation structure of Czech words

I am part of a team which focuses on linguistic research into the word-formation structure of Czech words using specialised language resources, tools and large corpora.
GAČR Grant, CUNI MFF UFAL
(team member, 2019-2021)

Project webpage

Derivational networks for multiple languages

With Jonáš Vidra and Tomáš Musil, we focus on developing derivational networks for multiple languages using machine learning and cross-lingual transfer of knowledge.
GAUK Grant, CUNI
(team member, 2019-2021)

 

Semantic labels into DeriNet

I carried out the first experiments on labelling derivational meanings in the Czech DeriNet using machine learning methods. As a result, the database has contained five labels since version 2.0.
Student's Faculty Grant, MFF CUNI (principal investigator, 2019)

GitHub repository

Foreign words recognition

I developed a rule-based tool for recognising borrowed/foreign/loan words in Czech. The rules are based on a survey of previously published linguistic papers.
Student's Faculty Grant, MFF CUNI (principal investigator, 2018)

GitHub repository


Harmonised collection
Universal Derivations

I create a collection of harmonised lexical networks capturing word-formation in a cross-linguistically consistent annotation scheme for many languages.
Master's Thesis

 
Mutual Intelligibility between West-Slavic languages

With Jiří Haviger, we focus on measuring mutual intelligibility between West-Slavic languages, in particular using conditional entropy and methods from dialectometry.
Bachelor's Thesis

 

data resources

Documentation: DeriNet API

Online, interactive documentation of the DeriNet API, which can also be used to work with Universal Derivations. It has three parts: a complete description of all functions, a description of the modules available in the API, and an interactive example that users can easily edit to suit their needs.

Jupyter notebook

List of language resources of derivational morphology

The list gathers existing language resources that contain any annotation of derivational morphology and/or word-formation in a broad sense. It continues the first list of such resources, which I published as a technical report in 2018. The list is kept up to date.

Webpage with the list

tools

Classification of Czech verbs into classes and groups

A rule-based tool for classifying Czech verbs into the five common conjugation classes. It also identifies conjugation groups within each class.

GitHub repository

Phonetic transcription of Czech, Slovak, and Polish

A rule-based tool for phonetic transcription of Czech, Slovak, and Polish into the International Phonetic Alphabet (IPA).

GitHub repository

Recognition of foreign words in the Czech language

A rule-based tool for recognising foreign/borrowed/loan words in Czech.


GitHub repository


Tool for manual annotation

A browser-based visual interface for manual annotation of derivational morphology in resources such as those from UDer. The tool can be used online or cloned from GitHub.

GitHub repository

publications

presentations

teaching

Subject                              2021-2022   2022-2023   2023-2024
Programming 1 [NPRG030] (in Czech)   web         web         web