My name is Lukáš Kyjánek, and I am a full-time Master's student at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University.
My fields of interest are word-formation, especially derivational morphology, and dialectometry, with a focus on the mutual intelligibility of closely related (mainly West-Slavic) languages.
This is the topic of my Master's thesis at UFAL. I will review and compare existing language resources that capture word-formation across languages. Then I will design suitable attribute-value schemas and harmonize the data structures and formats of some of these resources.
I am part of a team that focuses on linguistic research into the word-formation structure of Czech words, using specialized language resources and tools as well as large language corpora. The project is led by Magda Ševčíková and supported by GAČR.
In this new, ongoing project, I carry out the first experiments with semantic labelling of derivational relations in DeriNet using machine learning methods. The project is supported by the Student's Faculty Grant at MFF CUNI.
I help develop a lexical network of derivational relations between Czech words. I focus on expanding the number of relations using other existing language resources, prepare data for manual annotation, and run the first experiments with semantic labelling in DeriNet.
WiktiWF is my ongoing project that aims to extract word-formation data for 25 languages from the open data of Wiktionary.org and to provide it in the same format, structure and license. So far, five datasets (Czech, English, German, French and Polish) have been published.
I develop a rule-based tool for recognizing borrowed foreign words in Czech. The rules are based on a study of previously published linguistic papers. In this project, I was supported by the Student's Faculty Grant at MFF CUNI.
Together with my colleague Jiří Haviger, I focus on quantitatively measuring mutual intelligibility between West-Slavic languages on the graphemic and phonetic layers, especially using conditional entropy. As part of this work, I also created scripts for the phonetic transcription of these languages.
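To illustrate the measure mentioned above, here is a minimal sketch of conditional entropy H(Y | X) computed over aligned symbol pairs. It is not the code used in the project: the function name `conditional_entropy`, the assumption that graphemes are already aligned one-to-one, and the toy pairs are all hypothetical and serve only to show the computation.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for an iterable of aligned (x, y) symbol pairs.

    Assumption: the symbols are already aligned one-to-one, e.g. a grapheme x
    in one language and the corresponding grapheme y in a related language.
    """
    pairs = list(pairs)
    total = len(pairs)
    if total == 0:
        return 0.0

    joint = Counter(pairs)                    # counts of (x, y)
    marginal = Counter(x for x, _ in pairs)   # counts of x

    entropy = 0.0
    for (x, _y), n_xy in joint.items():
        p_xy = n_xy / total                   # p(x, y)
        p_y_given_x = n_xy / marginal[x]      # p(y | x)
        entropy -= p_xy * log2(p_y_given_x)
    return entropy


# Hypothetical toy alignment, not real language data:
# "a" always maps to "a", while "h" maps to "g" or "h" equally often.
toy_pairs = [("a", "a"), ("a", "a"), ("h", "g"), ("h", "h")]
print(conditional_entropy(toy_pairs))
```

A lower value means that the correspondences are more predictable for a reader who knows only the first language. Since the measure is asymmetric, swapping the order within each pair gives the entropy in the opposite direction.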