Research Statement
My research lies at the intersection of computational linguistics and formal morphology. By combining quantitative methods with theoretical insights, I aim to model the complex architecture of inflectional morphology, derivational morphology, and word formation, with a focus on Czech.
This interdisciplinary approach allows me to investigate the predictability of word forms, paradigms, and meanings, contributing both to the development of robust NLP resources and to a deeper understanding of linguistic theory.
Research Areas
Morphology
I work on both inflectional and derivational morphology, focusing on computational approaches to predictability in morphological paradigms and in morphological semantics more generally.
Computational Lexicography
I develop language resources and tools focused primarily, but not exclusively, on morphology and word formation.
Computational Linguistics
I use natural language processing (NLP), large language models (LLMs), statistics, and machine learning to process and analyse large-scale language data.
Current Projects
Computational approach to predictability in morphology: Evidence from Czech
Type: Doctoral Thesis (in progress)
I study the coherence and predictability of morphological paradigms on the basis of Czech language data.
Past Projects
A data-based approach to competition in word-formation
Type: START Research Programme, CUNI (principal investigator, 2021-2023)
Link: Project website
I investigated competition between word-formation processes and the means used to express diminutive meaning and gender marking in seven European languages.
Word-formation structure of Czech words
Type: GAČR Grant, CUNI (team member, 2019-2021)
Link: Project website
I focused on linguistic research into the word-formation structure of Czech words, drawing on specialised language resources and tools as well as large corpora of Czech.
Derivational networks for multiple languages
Type: GAUK Grant, CUNI (team member, 2019-2021)
Link: Project website
Together with Jonáš Vidra and Tomáš Musil, I worked on developing derivational networks for multiple languages using machine learning and cross-lingual knowledge transfer.
Harmonised collection Universal Derivations
Type: Master's Thesis (2020)
I created a collection of harmonised lexical networks that capture word formation in many languages under a cross-linguistically consistent annotation scheme.
Semantic labels into DeriNet
Type: Student's Faculty Grant, CUNI (principal investigator, 2019)
Link: GitHub repository
I carried out the first experiments in labelling derivational meanings in the Czech DeriNet database using machine learning methods. As a result, the database has included five semantic labels since version 2.0.
Foreign word recognition
Type: Student's Faculty Grant, CUNI (principal investigator, 2018)
Link: GitHub repository
I developed a rule-based tool for recognising borrowed (foreign, loan) words in Czech. The rules are based on a review of previously published linguistic studies.
Mutual intelligibility between West Slavic languages
Type: Bachelor's Thesis (2017)
Together with Jiří Haviger, I worked on measuring mutual intelligibility between West Slavic languages, mainly using conditional entropy and methods from dialectometry.
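Conditional entropy quantifies how predictable the segments of one language are once the corresponding segments of a related language are known: H(A | B) = -Σ p(a, b) log2 p(a | b). The sketch below is only an illustration under simplifying assumptions, not the method used in the thesis: the word pairs are hypothetical, of equal length, and aligned naively one character to one character (real cognate lists need a proper alignment step), and the function names `conditional_entropy` and `aligned_pairs` are illustrative.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Estimate H(A | B) in bits from (a, b) pairs of aligned symbols.

    a comes from language A, b from language B. A low value means that
    knowing b makes the corresponding a easy to predict.
    """
    if not pairs:
        raise ValueError("need at least one aligned pair")
    joint = Counter(pairs)                      # counts of (a, b)
    marginal_b = Counter(b for _, b in pairs)   # counts of b alone
    total = len(pairs)
    h = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total                    # p(a, b)
        p_a_given_b = count / marginal_b[b]     # p(a | b)
        h -= p_ab * log2(p_a_given_b)
    return h


def aligned_pairs(word_pairs):
    """Naive alignment (assumption): zip equal-length words character by character."""
    pairs = []
    for word_a, word_b in word_pairs:
        if len(word_a) != len(word_b):
            raise ValueError(f"unaligned pair: {word_a!r} / {word_b!r}")
        pairs.extend(zip(word_a, word_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: every character of B maps to exactly one character of A.
    toy = [("aba", "aca"), ("abb", "acc")]
    print(conditional_entropy(aligned_pairs(toy)))  # 0.0
```

The measure is asymmetric: H(A | B) and H(B | A) generally differ, which makes it suitable for capturing intelligibility that is stronger in one direction than in the other.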