NFDI4DS | UHH-SEMS - Publication Details

Dimitra Gkatzia

ORCID: 0000-0001-8568-7806

About

Contact & Profiles

OpenAlex ID: A5077450332

Research Areas

Topic Modeling
Natural Language Processing Techniques
Speech and dialogue systems
Semantic Web and Ontologies
Multimodal Machine Learning Applications
Explainable Artificial Intelligence (XAI)
Privacy-Preserving Technologies in Data
Advanced Text Analysis Techniques
Decision-Making and Behavioral Economics
Intelligent Tutoring Systems and Adaptive Learning
Hate Speech and Cyberbullying Detection
Human-Automation Interaction and Safety
Geographic Information Systems Studies
Data Visualization and Analytics
Video Analysis and Summarization
Adversarial Robustness in Machine Learning
Web Data Mining and Analysis
Artificial Intelligence in Games
Scientific Computing and Data Management
Advanced Image and Video Retrieval Techniques
Authorship Attribution and Profiling
Spam and Phishing Detection
Text and Document Classification Technologies
Persona Design and Applications
Machine Learning and Data Classification

Affiliations

Edinburgh Napier University
2016-2024

University of Edinburgh
2020

University of Coimbra
2017

Universitat Pompeu Fabra
2017

Thomson Reuters (United States)
2017

Bridge University
2017

University of Cambridge
2017

Heriot-Watt University
2015

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

OPENALEX - Publications

David M. Howcroft, Anja Belz, Miruna Clinciu, Dimitra Gkatzia, Sadid A. Hasan and 5 more

David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser. Proceedings of the 13th International Conference on Natural Language Generation. 2020.

10.18653/v1/2020.inlg-1.23 article EN cc-by 2020-01-01

Natural Language Generation enhances human decision-making with uncertain information

Dimitra Gkatzia, Oliver Lemon, Verena Rieser

Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. We present a comparison of different information presentations for uncertain data and, for the first time, measure their effects on human decision-making. We show that the use of Natural Language Generation (NLG) improves decision-making under uncertainty, compared to state-of-the-art graphical-based representation methods. In a task-based study with 442 adults, we found that using NLG leads to 24% better decision-making on average than...

10.18653/v1/p16-2043 article EN 2016-01-01

From documents to dialogue: Context matters in common sense-enhanced task-based dialogue grounded in documents

Carl Strathearn, Yanchao Yu, Dimitra Gkatzia

10.1016/j.eswa.2025.127304 article EN cc-by Expert Systems with Applications 2025-04-01

A Snapshot of NLG Evaluation Practices 2005 - 2014

Dimitra Gkatzia, Saad Mahamood

In this paper we present a snapshot of end-to-end NLG system evaluations as presented in conference and journal papers over the last ten years in order to better understand the nature and type of evaluations that have been undertaken. We find that researchers tend to favour specific evaluation methods, and that their approaches are also correlated with the publication venue. We further discuss what factors may influence the types of evaluation used for a given NLG system.

10.18653/v1/w15-4708 article EN cc-by 2015-01-01

Monitoring Users’ Behavior: Anti-Immigration Speech Detection on Twitter

Nikolaos Pitropakis, Kamil Kokot, Dimitra Gkatzia, Robert Ludwiniak, Alexios Mylonas and 1 more

The proliferation of social media platforms changed the way people interact online. However, engagement with social media comes with a price: the users' privacy. Breaches of privacy, such as the Cambridge Analytica scandal, can reveal how data can be weaponized in political campaigns, which many times trigger hate speech and anti-immigration views. Hate speech detection is a challenging task due to the different sources that have an impact on the language used, as well as the lack of relevant annotated data. To tackle this, we collected manually...

10.3390/make2030011 article EN cc-by Machine Learning and Knowledge Extraction 2020-08-03

You are What You Write: Preserving Privacy in the Era of Pre-Trained Language Models

Richard Plant, Mario Valerio Giuffrida, Dimitra Gkatzia

Large scale adoption of pre-trained language models has introduced a new era of convenient knowledge transfer for a slew of natural language processing tasks. However, these models run the risk of undermining user trust, since they may enable malicious users to expose personally identifying information about subjects in other datasets through re-identification attacks. We present an empirical investigation into the extent of personal information that can be extracted from representations produced by popular pre-trained language models, and we show a positive...

10.2139/ssrn.4417900 preprint EN 2023-01-01

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis and 72 more

Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...

10.18653/v1/2022.emnlp-demos.27 article EN cc-by 2022-01-01

Comparing Multi-label Classification with Reinforcement Learning for Summarisation of Time-series Data

Dimitra Gkatzia, Helen Hastie, Oliver Lemon

We present a novel approach for automatic report generation from time-series data, in the context of student feedback generation. Our proposed methodology treats content selection as a multi-label (ML) classification problem, which takes as input time-series data and outputs a set of templates, while capturing the dependencies between selected templates. We show that this method generates output closer to the feedback that lecturers actually generated, achieving 3.5% higher accuracy and 15% higher F-score than multiple simple classifiers that keep...

10.3115/v1/p14-1116 article EN 2014-01-01

Underreporting of errors in NLG output, and what to do about it

Emiel van Miltenburg, Miruna Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis and 6 more

Emiel van Miltenburg, Miruna Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen. Proceedings of the 14th International Conference on Natural Language Generation. 2021.

10.18653/v1/2021.inlg-1.14 preprint EN cc-by 2021-01-01

Data-to-Text Generation Improves Decision-Making Under Uncertainty

Dimitra Gkatzia, Oliver Lemon, Verena Rieser

Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. This article presents a comparison of different information presentations for uncertain data and, for the first time, measures their effects on human decision-making, in the domain of weather forecast generation. We use a game-based setup to evaluate the systems. We show that Natural Language Generation (NLG) enhances decision-making under uncertainty, compared to state-of-the-art graphical-based representation...

10.1109/mci.2017.2708998 article EN IEEE Computational Intelligence Magazine 2017-07-18

From the Virtual to the RealWorld: Referring to Objects in Real-World Spatial Scenes

Dimitra Gkatzia, Verena Rieser, Phil Bartie, William Mackaness

Predicting the success of referring expressions (RE) is vital for real-world applications such as navigation systems. Traditionally, research has focused on studying Referring Expression Generation (REG) in virtual, controlled environments. In this paper, we describe a novel study of spatial references from real scenes rather than virtual ones. First, we investigate how humans describe objects in open, uncontrolled scenarios and compare our findings to those reported in virtual environments. We show that REs differ...

10.18653/v1/d15-1224 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Generating unambiguous and diverse referring expressions

Nikolaos Panagiaris, Emma Hart, Dimitra Gkatzia

10.1016/j.csl.2020.101184 article EN Computer Speech & Language 2020-12-31

Barriers and enabling factors for error analysis in NLG research

Emiel van Miltenburg, Miruna Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis and 5 more

Earlier research has shown that few studies in Natural Language Generation (NLG) evaluate their system outputs using an error analysis, despite known limitations of automatic evaluation metrics and human ratings. This position paper takes the stance that error analyses should be encouraged, and discusses several ways to do so. This paper is not just based on our shared experience as authors, but we also distributed a survey as a means of public consultation. We provide an overview of existing barriers to carrying out error analyses, and propose...

10.3384/nejlt.2000-1533.2023.4529 article EN Northern European Journal of Language Technology 2023-02-21

Generating and Evaluating Landmark-Based Navigation Instructions in Virtual Environments

Amanda Cercas Curry, Dimitra Gkatzia, Verena Rieser

Referring to landmarks has been identified to lead to improved navigation instructions. However, a previous corpus study suggests that human "wizards" also choose to refer to street names and generate user-centric instructions. In this paper, we conduct a task-based evaluation of two systems reflecting the wizards' behaviours and compare them against an improved version of previous landmark-based systems, which resorts to user-centric descriptions if the landmark is estimated to be invisible. We use the GRUVE virtual interactive environment for evaluation. We...

10.18653/v1/w15-4715 article EN cc-by 2015-01-01

Content Selection in Data-to-Text Systems: A Survey

Dimitra Gkatzia

Data-to-text systems are powerful in generating reports from data automatically and thus they simplify the presentation of complex data. Rather than presenting data using visualisation techniques, data-to-text systems use natural (human) language, which is the most common way for human-human communication. In addition, data-to-text systems can adapt their output content to users' preferences, background or interests and therefore they can be pleasant for users to interact with. Content selection is an important part of every data-to-text system, because it is the module...

10.48550/arxiv.1610.08375 preprint EN other-oa arXiv (Cornell University) 2016-01-01

CAPE: Context-Aware Private Embeddings for Private Language Learning

Richard Plant, Dimitra Gkatzia, Mario Valerio Giuffrida

Neural language models have contributed to state-of-the-art results in a number of downstream applications including sentiment analysis, intent classification and others. However, obtaining text representations or embeddings using these models risks encoding personally identifiable information learned from context cues that may lead to privacy leaks. To ameliorate this issue, we propose Context-Aware Private Embeddings (CAPE), a novel approach which combines differential privacy and adversarial learning to preserve...

10.18653/v1/2021.emnlp-main.628 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

Opportunities and risks in the use of AI in career development practice

Marianne Wilson, Peter E. Robertson, Peter Cruickshank, Dimitra Gkatzia

The Covid-19 pandemic required many aspects of life to move online. This accelerated a broader trend for increasing use of ICT and AI, with implications for both the world of work and career development. This article explores the potential benefits and challenges of including AI in career development practice. It provides an overview of the technology and its current uses, illustrates ways in which it could enhance existing services, and outlines the attendant practical and ethical challenges posed. Finally, recommendations are provided for policy and practice that will support development...

10.20856/jnicec.4807 article EN cc-by-nc-nd Journal of the National Institute for Career Education and Counselling 2022-04-30

Finding middle ground? Multi-objective Natural Language Generation from time-series data

Dimitra Gkatzia, Helen Hastie, Oliver Lemon

A Natural Language Generation (NLG) system is able to generate text from nonlinguistic data, ideally personalising the content to a user's specific needs. In some cases, however, there are multiple stakeholders with their own individual goals, needs and preferences. In this paper, we explore the feasibility of combining the preferences of two different user groups, lecturers and students, when generating summaries in the context of student feedback generation. The preferences of each group are modelled as a multivariate optimisation...

10.3115/v1/e14-4041 article EN 2014-01-01

You Are What You Write: Preserving Privacy in the Era of Large Language Models

Richard E. Plant, Mario Valerio Giuffrida, Dimitra Gkatzia

Large scale adoption of large language models has introduced a new era of convenient knowledge transfer for a slew of natural language processing tasks. However, these models also run the risk of undermining user trust by exposing unwanted information about data subjects, which may be extracted by a malicious party, e.g. through adversarial attacks. We present an empirical investigation into the extent of the personal information encoded into pre-trained representations by a range of popular models, and we show a positive correlation between the complexity of a model,...

10.48550/arxiv.2204.09391 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

Task2Dial: A Novel Task and Dataset for Commonsense-enhanced Task-based Dialogue Grounded in Documents

Carl Strathearn, Dimitra Gkatzia

This paper proposes a novel task on commonsense-enhanced task-based dialogue grounded in documents and describes the Task2Dial dataset, a novel dataset of document-grounded task-based dialogues, where an Information Giver (IG) provides instructions (by consulting a document) to an Information Follower (IF), so that the latter can successfully complete the task. In this unique setting, the IF can ask clarification questions which may not be grounded in the underlying document and require commonsense knowledge to be answered. The Task2Dial dataset poses new challenges: (1) its human...

10.18653/v1/2022.dialdoc-1.21 article EN cc-by 2022-01-01

Multi-adaptive Natural Language Generation using Principal Component Regression

Dimitra Gkatzia, Helen Hastie, Oliver Lemon

We present FeedbackGen, a system that uses a multi-adaptive approach to Natural Language Generation. With the term 'multi-adaptive', we refer to a system that is able to adapt its content to different user groups simultaneously, in our case adapting to both lecturers and students. We present a novel approach to student feedback generation, which simultaneously takes into account the preferences of lecturers and students when determining the content to be conveyed in a feedback summary. In this framework, we utilise knowledge derived from ratings on feedback summaries by extracting the most relevant...

10.3115/v1/w14-4422 article EN cc-by 2014-01-01