Lukáš Kyjánek


My name is Lukáš Kyjánek, and I am a full-time Master's student at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University.

My fields of interest are word-formation, especially derivational morphology, and dialectometry, with a focus on the mutual intelligibility of closely related (mainly West Slavic) languages.



Harmonization of Language Resources for Word-Formation of Multiple Languages

This is the topic of my Master's thesis at UFAL. I will review and compare existing language resources that contain word-formation data for multiple languages. Then I will design suitable attribute-value schemas and harmonize the data structures and formats of some of these resources.

Word-Formation Structure of Czech Words

I am part of a team that carries out linguistic research into the word-formation structure of Czech words, using specialized language resources and tools as well as large corpora. The project is led by Magda Ševčíková and supported by GAČR (the Czech Science Foundation).

Project description

Semantic Labelling of Word-Formation Relations

In this new, ongoing project, I am carrying out the first experiments with semantic labelling of derivational relations in DeriNet using machine learning methods. The project is supported by a Student's Faculty Grant at MFF CUNI.

GitHub repository

I help develop DeriNet, a lexical network of derivational relations between Czech words. I focus on increasing the number of relations using other existing language resources, preparing data for manual annotation, and running the first experiments with semantic labelling of the relations.

Project webpage

WiktiWF is my ongoing project, which aims to extract word-formation data for 25 languages from the open data of Wiktionary.org and to provide them in a uniform format, structure, and license. So far, five datasets (Czech, English, German, French, and Polish) have been published.

GitHub repository

Foreign Words Recognition

I am developing a rule-based tool for recognizing borrowed foreign words in Czech. The rules are based on a study of previously published linguistic papers. This project was supported by a Student's Faculty Grant at MFF CUNI.

GitHub repository

Mutual Intelligibility of West Slavic Languages

Together with my colleague Jiří Haviger, I focus on quantitatively measuring the mutual intelligibility of West Slavic languages at the graphemic and phonetic levels, mainly by means of conditional entropy. As part of this work, I have created scripts for the phonetic transcription of these languages (to be published soon).
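Conditional entropy H(X | Y) expresses how uncertain a symbol X in one language remains once the aligned symbol Y in the other language is known. Lower values indicate a more predictable correspondence. The following is only a minimal sketch, assuming that word pairs have already been aligned character by character with equal lengths. The alignment itself, the helper names `aligned_pairs` and `conditional_entropy`, and the toy Czech/Polish word pairs are hypothetical illustrations and do not describe the method actually used in the project.

```python
import math
import unicodedata
from collections import Counter


def aligned_pairs(word_pairs):
    """Flatten pre-aligned, equal-length word pairs into (x, y) character pairs.

    Hypothetical helper: real data would need a proper alignment step first.
    """
    pairs = []
    for a, b in word_pairs:
        # NFC normalization so that e.g. "ů" counts as one character.
        a, b = unicodedata.normalize("NFC", a), unicodedata.normalize("NFC", b)
        if len(a) != len(b):
            raise ValueError(f"words are not aligned: {a!r} / {b!r}")
        pairs.extend(zip(a, b))
    return pairs


def conditional_entropy(pairs):
    """Return H(X | Y) in bits, where each pair is (x, y)."""
    joint = Counter(pairs)                     # counts of (x, y)
    marginal_y = Counter(y for _, y in pairs)  # counts of y
    total = len(pairs)
    h = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / total
        p_x_given_y = n_xy / marginal_y[y]
        h -= p_xy * math.log2(p_x_given_y)
    return h


# Toy data (hypothetical): Czech word first, Polish word second.
cs_pl = aligned_pairs([("voda", "woda"), ("dům", "dom")])
print(round(conditional_entropy(cs_pl), 3))  # H(Czech | Polish) -> 0.286

pl_cs = aligned_pairs([("woda", "voda"), ("dom", "dům")])
print(round(conditional_entropy(pl_cs), 3))  # H(Polish | Czech) -> 0.0
```

The measure is asymmetric. In this toy data, Polish "o" corresponds to both Czech "o" and "ů", so guessing the Czech character from the Polish one is uncertain. In the other direction, every Czech character maps to exactly one Polish character. This kind of asymmetry is one reason why entropy-based measures are used to model intelligibility in each direction separately.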


I am an author or co-author of these datasets:

I am an author or co-author of these tools: