Ondřej Pražák

ORCID: 0000-0001-5445-7792
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Authorship Attribution and Profiling
  • Advanced Text Analysis Techniques
  • Speech Recognition and Synthesis
  • Speech and dialogue systems
  • Language and cultural evolution
  • Biomedical Text Mining and Ontologies
  • Education, Psychology, and Social Research
  • Semantic Web and Ontologies
  • Text Readability and Simplification
  • Text and Document Classification Technologies
  • Software Engineering Research
  • Sentiment Analysis and Opinion Mining
  • Linguistics, Language Diversity, and Identity
  • Web Data Mining and Analysis
  • Discourse Analysis and Cultural Communication
  • Advanced Database Systems and Queries
  • Advanced Proteomics Techniques and Applications
  • Expert finding and Q&A systems
  • Mental Health via Writing
  • Educational Technology and Assessment
  • Spam and Phishing Detection
  • Computational and Text Analysis Methods
  • Theology and Canon Law Studies

University of West Bohemia
2015-2023

Pilsen Tools (Czechia)
2023

In this paper, we describe our method for the detection of lexical semantic change, i.e., word sense changes over time. We examine differences between specific words in two corpora, chosen from different time periods, for English, German, Latin, and Swedish. Our method was created for the SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. We ranked 1st in Sub-task 1: binary change detection, and 4th in Sub-task 2: ranked change detection. We present our method, which is completely unsupervised and language independent. It consists of preparing a vector...

10.18653/v1/2020.semeval-1.30 article EN cc-by 2020-01-01

This paper describes the training process of the first Czech monolingual language representation models based on BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than multilingual models that include Czech data. We outperform the multilingual models on 9 out of 11 datasets. In addition, we establish new state-of-the-art results on nine datasets. At the end, we discuss the properties of monolingual and multilingual models based upon our results. We publish all the pretrained and fine-tuned models freely for the research community.

10.26615/978-954-452-072-4_149 article EN 2021-01-01

Zdeněk Žabokrtský, Miloslav Konopik, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondrej Prazak, Jakub Sido, Daniel Zeman. Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution. 2023.

10.18653/v1/2023.crac-sharedtask.1 article EN cc-by 2023-01-01

We introduce a system focused on solving SemEval 2016 Task 2 ‐ Interpretable Semantic Textual Similarity. The system explores machine learning and rule-based approaches to the task. We focus on machine learning and experiment with a wide variety of algorithms as well as several types of features. The core of our system consists in exploiting distributional semantics to compare the similarity of sentence chunks. The system won the competition in the “Gold standard chunk scenario”. We have not participated in the “System...

10.18653/v1/s16-1124 article EN Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) 2016-01-01

This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was used as the main evaluation metric. There were 8 coreference prediction systems submitted by 5...

10.48550/arxiv.2209.07841 preprint EN other-oa arXiv (Cornell University) 2022-01-01

In this paper, we present coreference resolution experiments with a newly created multilingual corpus CorefUD (Nedoluzhko et al., 2021). We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joined models: one for Slavic languages and one for all the languages together. We rely on an end-to-end deep learning model that we slightly adapted for the CorefUD corpus. Our results show that we can profit from harmonized annotations, and using joined models helps...

10.26615/978-954-452-072-4_125 article EN 2021-01-01

In this paper, we introduce a cross-lingual Semantic Role Labeling (SRL) system with language independent features based upon Universal Dependencies. We propose two methods to convert SRL annotations from monolingual dependency trees into universal dependency trees. Our SRL system is based upon cross-lingual features derived from universal dependency trees and a supervised learning that utilizes a maximum entropy classifier. We design experiments to verify whether the Universal Dependencies are suitable for cross-lingual SRL. The results are very promising and they open new interesting research paths for the future.

10.26615/978-954-452-049-6_077 article EN 2017-11-10

This paper describes our approach to the CRAC 2022 Shared Task on Multilingual Coreference Resolution. Our model is based on a state-of-the-art end-to-end coreference resolution system. Apart from joined multilingual training, we improved our results with mention head prediction. We also tried to integrate dependency information into our model. Our system ended up in 3rd place. Moreover, we reached the best performance on two datasets out of 13.

10.48550/arxiv.2209.12516 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Coreference resolution, the task of identifying expressions in text that refer to the same entity, is a critical component in various natural language processing (NLP) applications. This paper presents our end-to-end neural coreference resolution system, utilizing the CorefUD 1.1 dataset, which spans 17 datasets across 12 languages. Our model is based on the end-to-end neural coreference resolution system. We first establish strong baseline models, including monolingual and cross-lingual variations, and then propose several extensions to enhance...

10.48550/arxiv.2408.16893 preprint EN arXiv (Cornell University) 2024-08-29

The paper presents an overview of the third edition of the shared task on multilingual coreference resolution, held as part of the CRAC 2024 workshop. Similarly to the previous two editions, the participants were challenged to develop systems capable of identifying mentions and clustering them based on identity coreference. This year's edition took another step towards real-world application by not providing participants with gold slots for zero anaphora, increasing the task's complexity and realism. In addition, the shared task was expanded to include a more diverse...

10.48550/arxiv.2410.15949 preprint EN arXiv (Cornell University) 2024-10-21

This paper introduces a Czech dataset for semantic similarity and relatedness. The dataset contains word pairs with hand annotated scores that indicate the semantic similarity and relatedness of the words. The dataset contains 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora, including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is r = 0.81. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness.

10.26615/978-954-452-049-6_053 article EN 2017-11-10

This paper describes a novel dataset consisting of sentences with two different semantic similarity annotations: with and without surrounding context. The data originate from the journalistic domain in the Czech language. The final dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To increase the reliability of the test set, we compute the annotation as an average of 9 individual annotation scores. We evaluate the dataset quality by measuring inter...

10.21203/rs.3.rs-2130964/v1 preprint EN cc-by Research Square (Research Square) 2022-10-26

This paper describes the process of collecting, maintaining and exploiting an English dataset of web discussions. The dataset consists of many web discussions with hand-annotated posts in the context of a tree structure of a web page. Each post consists of a username, date, text, and citations used by its author. The dataset contains 79 different websites with at least 500 pages from each. Each page consists of HTML tags and texts taken from the selected pages. In the paper, we also describe algorithms trained on the dataset. We employ basic architectures (such as a bag of words with an SVM classifier, LSTM...

10.13053/cys-23-3-3259 article EN Computación y Sistemas 2019-10-07