Daryna Dementieva

ORCID: 0000-0003-0929-4140
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Hate Speech and Cyberbullying Detection
  • Spam and Phishing Detection
  • Text Readability and Simplification
  • Misinformation and Its Impacts
  • Advanced Malware Detection Techniques
  • Authorship Attribution and Profiling
  • Artificial Intelligence in Healthcare and Education
  • Speech Recognition and Synthesis
  • Software Engineering Research
  • Information Systems and Technology Applications
  • Statistical and Computational Modeling
  • Advanced Text Analysis Techniques
  • Text and Document Classification Technologies
  • Advanced Data Processing Techniques
  • Explainable Artificial Intelligence (XAI)
  • Sentiment Analysis and Opinion Mining
  • Biomedical Text Mining and Ontologies
  • Big Data Technologies and Applications
  • Diverse Scientific Research in Ukraine
  • Data Quality and Management
  • Online Learning and Analytics
  • Foreign Language Teaching Methods
  • Media Influence and Politics

Technical University of Munich
2022-2024

Skolkovo Institute of Science and Technology
2020-2022

University of Mannheim
2022

Skolkovo Foundation
2020

Large language models represent a significant advancement in the field of AI. The underlying technology is key to further innovations and, despite critical views and even bans within communities and regions, large language models are here to stay. This position paper presents the potential benefits and challenges of educational applications of large language models, from student and teacher perspectives. We briefly discuss the current state of large language models and their applications. We then highlight how these models can be used to create educational content, improve student engagement and interaction,...

10.35542/osf.io/5er8f preprint EN 2023-01-30

Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, Alexander Panchenko. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.469 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

People worldwide use language in subtle and complex ways to express emotions. While emotion recognition -- an umbrella term for several NLP tasks -- significantly impacts different applications in NLP and other fields, most work in the area is focused on high-resource languages. Therefore, this has led to major disparities in research and proposed solutions, especially for low-resource languages that suffer from the lack of high-quality datasets. In this paper, we present BRIGHTER -- a collection of multilabeled emotion-annotated...

10.48550/arxiv.2502.11926 preprint EN arXiv (Cornell University) 2025-02-17

David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.

10.18653/v1/2021.emnlp-main.629 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several...

10.3390/info14100522 article EN cc-by Information 2023-09-26

Daryna Dementieva, Alexander Panchenko. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop. 2021.

10.18653/v1/2021.acl-srw.32 article EN cc-by 2021-01-01

Nowadays, misleading information spreads over the internet at an incredible speed, which can lead to irreparable consequences. As a result, it is becoming more and more essential to combat fake news, especially in the early stages of its origins. Over the past years, a lot of work has been done in this direction. However, all existing solutions have their limitations. One of the main limitations of current approaches is that the majority of models are focused only on one language and do not use any multilingual information. In this work, we...

10.1109/dsaa49011.2020.00111 article EN 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) 2020-10-01

This paper presents the best-performing approach alias "Adam Smith" for the SemEval-2023 Task 4: "Identification of Human Values behind Arguments". The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or f1-score maximum. Ensembling the models by selecting one global decision threshold that maximizes the f1-score leads to the best-performing system in the competition. Ensembling based on stacking with logistic regressions shows the best performance on an...

10.18653/v1/2023.semeval-1.74 article EN cc-by Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023) 2023-01-01

The rapid spread of deceptive information on the internet can have severe and irreparable consequences. As a result, it is important to develop technology that can detect fake news. Although significant progress has been made in this area, current methods are limited because they focus only on one language and do not incorporate multilingual information. In this work, we propose Multiverse, a new feature based on multilingual evidence that can be used for fake news detection and improve existing approaches. Our hypothesis that cross-lingual evidence can be used as...

10.3390/jimaging9040077 article EN cc-by Journal of Imaging 2023-03-27

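Detoxification is a task of generating text in polite style while preserving meaning and fluency of the original toxic text. Existing detoxification methods are monolingual, i.e. designed to work in one exact language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models in this setting. Unlike previous works, we aim to make large language models able to perform detoxification without direct fine-tuning in a given language. Experiments show that multilingual models are capable of performing multilingual style transfer. However, the tested state-of-the-art models are not able to perform cross-lingual detoxification, and direct fine-tuning on the exact language is currently inevitable...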

10.18653/v1/2022.acl-srw.26 article EN cc-by 2022-01-01

Text detoxification is the task of rewriting a toxic text into a neutral text while preserving its original content. It has a wide range of applications, e.g. moderation of the output of neural chatbots or suggesting a less emotional version of posts on social networks. This paper provides a description of the RUSSE-2022 competition of detoxification methods for the Russian language. This is the first competition which features (i) parallel training data and (ii) manual evaluation. We describe the setup of the competition, the solutions of the participating teams and analyse their performance. In...

10.28995/2075-7182-2022-21-114-131 article EN Computational Linguistics and Intellectual Technologies 2022-06-18

Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations. 2023.

10.18653/v1/2023.ijcnlp-demo.7 article EN cc-by 2023-01-01

We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models: an approach based on the BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several...

10.3390/mti5090054 article EN cc-by Multimodal Technologies and Interaction 2021-09-04

This paper presents a solution for the Span Identification (SI) task in the "Detection of Propaganda Techniques in News Articles" competition at SemEval-2020. The goal of the SI task is to identify specific fragments of each article which contain the use of at least one propaganda technique. This is a binary sequence tagging task. We tested several approaches, finally selecting a fine-tuned BERT model as our baseline model. Our main contribution is an investigation of several unsupervised data augmentation techniques based on distributional...

10.18653/v1/2020.semeval-1.234 article EN cc-by 2020-01-01

We introduce the first study of automatic detoxification of Russian texts to combat offensive language. Such a kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, it has never been solved for the Russian language yet. We test two types of models, an unsupervised approach based on the BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model, and compare them with several baselines. In addition, we describe...

10.28995/2075-7182-2021-20-179-190 article EN Computational Linguistics and Intellectual Technologies 2021-06-19

Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Before, two large-scale datasets were introduced for multiple languages featuring formality annotation: GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models. At the same time, the detection of text formality on its own may also be a useful application. This work proposes the first to our knowledge systematic study of formality detection methods based on statistical,...

10.26615/978-954-452-092-2_031 article EN 2023-01-01

Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.ijcnlp-main.70 article EN cc-by 2023-01-01

The task of toxicity detection is still a relevant task, especially in the context of safe and fair LMs development. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there has been no existing toxicity classification corpus in Ukrainian. In this study, we aim to fill this gap by investigating cross-lingual knowledge transfer techniques...

10.48550/arxiv.2404.17841 preprint EN arXiv (Cornell University) 2024-04-27

Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation, scoring abusive speech based on four aspects -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale -- and suggesting...

10.48550/arxiv.2406.19543 preprint EN arXiv (Cornell University) 2024-06-27