Maciej Ogrodniczuk

ORCID: 0000-0002-3467-9424
Research Areas
  • Natural Language Processing Techniques
  • Language and Culture
  • Topic Modeling
  • Semantic Web and Ontologies
  • Literature, Language, and Rhetoric Studies
  • Speech and dialogue systems
  • Mathematics, Computing, and Information Processing
  • linguistics and terminology studies
  • Linguistics, Language Diversity, and Identity
  • Lexicography and Language Studies
  • Text Readability and Simplification
  • Biomedical Text Mining and Ontologies
  • Library Science and Information Systems
  • European and International Law Studies
  • Advanced Text Analysis Techniques
  • Linguistic research and analysis
  • Digital Humanities and Scholarship
  • Service-Oriented Architecture and Web Services
  • Legal Language and Interpretation
  • Language, Metaphor, and Cognition
  • Image Processing and 3D Reconstruction
  • Authorship Attribution and Profiling
  • Digital Rights Management and Security
  • Speech Recognition and Synthesis
  • Algorithms and Data Compression

Polish Academy of Sciences
2014-2024

Institute of Computer Science
2014-2024

The Institute of the Polish Language of the Polish Academy of Sciences
2022

Czech Academy of Sciences, Institute of Computer Science
2014-2019

Université de Tours
2014

University of Warsaw
2004

This paper presents the ParlaMint corpora containing transcriptions of the sessions of 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter...

10.1007/s10579-021-09574-0 article EN cc-by Language Resources and Evaluation 2022-02-02

The paper presents Korpusomat, a web application aimed at building annotated corpora for the purpose of corpus linguistic studies. Korpusomat combines existing tools, such as a morphological analyser, tagger and search engine, and provides an easy-to-use environment for creating a corpus technically compatible with the National Corpus of Polish from almost any text, including texts in binary formats. In the paper we present the current state of the project, its features and functionalities, as well as some future plans for developments and tasks. A usage example is...

10.12921/cmst.2018.0000005 article EN Computational Methods in Science and Technology 2018-03-31

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI...

10.48550/arxiv.2003.13833 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Zdeněk Žabokrtský, Miloslav Konopik, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondrej Prazak, Jakub Sido, Daniel Zeman. Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution. 2023.

10.18653/v1/2023.crac-sharedtask.1 article EN cc-by 2023-01-01

The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancements made since the ParlaMint I project and on the compilation of the corpora, including encoding...

10.1007/s10579-024-09798-w article EN cc-by Language Resources and Evaluation 2024-12-28

This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was the main evaluation metric. There were 8 prediction systems submitted by 5...

10.48550/arxiv.2209.07841 preprint EN other-oa arXiv (Cornell University) 2022-01-01