- Natural Language Processing Techniques
- Topic Modeling
- Authorship Attribution and Profiling
- Advanced Text Analysis Techniques
- Speech Recognition and Synthesis
- Speech and Dialogue Systems
- Language and Cultural Evolution
- Biomedical Text Mining and Ontologies
- Education, Psychology, and Social Research
- Semantic Web and Ontologies
- Text Readability and Simplification
- Text and Document Classification Technologies
- Software Engineering Research
- Sentiment Analysis and Opinion Mining
- Linguistics, Language Diversity, and Identity
- Web Data Mining and Analysis
- Discourse Analysis and Cultural Communication
- Advanced Database Systems and Queries
- Advanced Proteomics Techniques and Applications
- Expert Finding and Q&A Systems
- Mental Health via Writing
- Educational Technology and Assessment
- Spam and Phishing Detection
- Computational and Text Analysis Methods
- Theology and Canon Law Studies
University of West Bohemia
2015-2023
Pilsen Tools (Czechia)
2023
In this paper, we describe our method for the detection of lexical semantic change, i.e., of word sense changes over time. We examine differences in the use of specific words in two corpora chosen from different time periods, for English, German, Latin, and Swedish. Our method was created for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. It ranked 1st in Sub-task 1, binary change detection, and 4th in Sub-task 2. We present a method which is completely unsupervised and language independent. It consists of preparing a vector...
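The abstract truncates before the vector-preparation details, but the comparison it describes can be illustrated with a generic sketch, assuming word vectors trained separately on each time-period corpus and already aligned to a shared space; the words and vectors below are invented for illustration, not the paper's actual pipeline:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical aligned embeddings of the same word in two periods.
plane_1800s = [0.90, 0.10, 0.05]   # "flat surface" sense dominates
plane_1990s = [0.10, 0.15, 0.95]   # "aircraft" sense dominates

# A change score in [0, 2]: higher means more semantic change.
change = 1 - cosine(plane_1800s, plane_1990s)
```

Ranking all target words by such a score gives the graded output; thresholding it gives the binary changed/unchanged decision.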
This paper describes the training process of the first Czech monolingual language representation models, based on the BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than the multilingual models that include Czech data. We outperform the multilingual models on 9 out of 11 datasets. In addition, we establish new state-of-the-art results on nine datasets. Finally, we discuss the properties of the models based upon the results. We publish all pre-trained and fine-tuned models freely for the research community.
Zdeněk Žabokrtský, Miloslav Konopik, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondrej Prazak, Jakub Sido, Daniel Zeman. Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution. 2023.
We introduce a system focused on solving SemEval 2016 Task 2 ‐ Interpretable Semantic Textual Similarity. The system explores machine learning and rule-based approaches to the task. We focus on experimenting with a wide variety of algorithms as well as with several types of features. The core of our system consists in exploiting distributional semantics to compare the similarity of sentence chunks. We won the competition in the “Gold standard chunk scenario”. We have not participated in the “System...
This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was the main evaluation metric. There were 8 prediction systems submitted by 5...
In this paper, we present coreference resolution experiments with a newly created multilingual corpus, CorefUD (Nedoluzhko et al., 2021). We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data to train two joined models: one for the Slavic languages and one for all the languages together. We rely on an end-to-end deep learning model that we slightly adapted for the corpus. Our results show that we can profit from harmonized annotations, and that using the joined models helps...
In this paper, we introduce a cross-lingual Semantic Role Labeling (SRL) system with language-independent features based upon Universal Dependencies. We propose two methods to convert SRL annotations from monolingual dependency trees into universal dependency trees. Our approach is based on supervised learning and utilizes a maximum entropy classifier. We design experiments to verify whether Universal Dependencies are suitable for SRL. The results are very promising, and they open new and interesting research paths for the future.
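The maximum entropy classifier named in the abstract is multinomial logistic regression over hand-crafted features. A minimal from-scratch sketch follows; the two-feature toy task and its labels are placeholders for illustration, not the paper's actual SRL feature set:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class MaxEnt:
    """Multinomial logistic regression trained with plain SGD."""

    def __init__(self, n_features, n_classes, lr=0.5, epochs=300):
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.lr, self.epochs = lr, epochs

    def _scores(self, x):
        return [sum(wi * xi for wi, xi in zip(w, x)) for w in self.w]

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                probs = softmax(self._scores(x))
                for c, w in enumerate(self.w):
                    # Gradient of log-likelihood: (indicator - probability).
                    grad = (1.0 if c == label else 0.0) - probs[c]
                    for j, xj in enumerate(x):
                        w[j] += self.lr * grad * xj

    def predict(self, x):
        scores = self._scores(x)
        return scores.index(max(scores))

# Toy binary role decision driven by two placeholder features.
X = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
clf = MaxEnt(n_features=2, n_classes=2)
clf.fit(X, y)
```

In a real SRL setting the feature vector would encode dependency-path and lexical indicators extracted from the universal trees.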
This paper describes our approach to the CRAC 2022 Shared Task on Multilingual Coreference Resolution. Our model is based on a state-of-the-art end-to-end coreference resolution system. Apart from joined multilingual training, we improved our results with mention head prediction. We also tried to integrate dependency information into the model. Our system ended up in 3rd place. Moreover, it reached the best performance on two datasets out of 13.
Coreference resolution, the task of identifying expressions in text that refer to the same entity, is a critical component of various natural language processing (NLP) applications. This paper presents our end-to-end neural coreference resolution system, which utilizes the CorefUD 1.1 dataset, spanning 17 datasets across 12 languages. Our model builds on an end-to-end neural architecture. We first establish strong baseline models, including monolingual and cross-lingual variations, and then propose several extensions to enhance...
The paper presents an overview of the third edition of the shared task on multilingual coreference resolution, held as part of the CRAC 2024 workshop. Similarly to the previous two editions, participants were challenged to develop systems capable of identifying mentions and clustering them based on identity coreference. This year's edition took another step towards real-world application by not providing participants with gold slots for zero anaphora, increasing the task's complexity and realism. In addition, the task was expanded to include a more diverse...
This paper introduces a Czech dataset for semantic similarity and relatedness. The dataset contains word pairs with hand-annotated scores that indicate the relatedness of the words. The 953 word pairs are compiled from 9 different sources. The dataset includes words in their contexts taken from real text corpora, including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic relatedness.
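The r = 0.81 agreement figure is a Spearman rank correlation between annotator scores. A small self-contained implementation (with average ranks for ties), shown here only to illustrate the measure, not the paper's evaluation code:

```python
import math

def rank(xs):
    """1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)
```

Two annotators who rank the word pairs identically score 1.0; fully reversed rankings score -1.0.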
This paper describes a novel dataset consisting of sentences with two different semantic similarity annotations: with and without the surrounding context. The data originate from the journalistic domain in the Czech language. The final dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To increase the reliability of the test set, we compute each score as an average of 9 individual annotation scores. We evaluate the quality of the dataset by measuring inter...
This paper describes the process of collecting, maintaining, and exploiting an English dataset of web discussions. The dataset consists of many discussions with hand-annotated posts in the context of the tree structure of a page. Each post contains a username, date, text, and the citations used by its author. The dataset covers 79 different websites, with at least 500 pages from each. The page HTML tags and texts are taken from the selected pages. In the paper, we also describe algorithms trained on the dataset. We employ basic architectures (such as a bag-of-words SVM classifier and an LSTM...
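A bag-of-words representation like the one fed to the SVM baseline can be sketched in a few lines; the tokenizer, vocabulary, and example post below are invented for illustration, not taken from the dataset:

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text, vocab):
    """Map a post to a fixed-length count vector over `vocab`."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

vocab = ["great", "thanks", "click", "free"]
post = "Thanks, great answer - thanks again!"
vector = bag_of_words(post, vocab)  # one count per vocabulary word
```

In the paper's setup such vectors would be the input to an SVM (or, token by token, to an LSTM); only the vectorization step is shown here to keep the sketch dependency-free.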