Research Statement

My research lies at the intersection of computational linguistics and formal morphology. By combining quantitative methods with theoretical insights, I aim to model the complex architecture of inflectional and derivational morphology and word formation, with a particular focus on Czech.

This interdisciplinary approach allows me to investigate the predictability of morphological forms, paradigms, and meanings, contributing both to the development of robust NLP resources and to a deeper understanding of linguistic theory.

Research Areas

Morphology

I work on morphology (both inflectional and derivational) with a focus on computational approaches to predictability in morphological paradigms and semantics in general.

Computational Lexicography

I develop language resources and tools, focused mainly, though not exclusively, on morphology and word formation.

Computational Linguistics

I use natural language processing (NLP), large language models (LLMs), statistics, and machine learning to work with large-scale language data.

Current Projects

Computational approach to predictability in morphology: Evidence from Czech

Type: Doctoral Dissertation (in progress)

I study the coherence and predictability of morphological paradigms, using Czech language data.

Past Projects

A data-based approach to competition in word-formation

Type: START Research Programme, CUNI (principal investigator, 2021-2023)
Link: Project website

I investigated competition among the word-formation processes and means used to express diminutive meaning and gender marking in seven European languages.

Word-formation structure of Czech words

Type: GAČR Grant, CUNI (team member, 2019-2021)
Link: Project website

I focused on linguistic research into the word-formation structure of Czech words. We used specialised language resources, tools and large corpora of Czech.

Derivational networks for multiple languages

Type: GAUK Grant, CUNI (team member, 2019-2021)
Link: Project website

With Jonáš Vidra and Tomáš Musil, I worked on developing derivational networks for multiple languages using machine learning and cross-lingual transfer of knowledge.

Harmonised collection Universal Derivations

Type: Master's Thesis (2020)

I created a collection of harmonised lexical networks capturing word-formation in a cross-linguistically consistent annotation scheme for many languages.

Semantic labels into DeriNet

Type: Student's Faculty Grant, CUNI (principal investigator, 2019)
Link: GitHub repository

I carried out initial experiments on labelling derivational meanings in the Czech DeriNet database using machine learning methods. As a result, the database has included five semantic labels since version 2.0.

Foreign words recognition

Type: Student's Faculty Grant, CUNI (principal investigator, 2018)
Link: GitHub repository

I developed a rule-based tool for recognising borrowed (foreign or loan) words in Czech. The rules are based on a survey of previously published linguistic literature.

Mutual Intelligibility between West-Slavic languages

Type: Bachelor's Thesis (2017)

With Jiří Haviger, I measured mutual intelligibility between West-Slavic languages, primarily using conditional entropy and methods from dialectometry.