Yvette Graham

ORCID: 0000-0001-6741-4855
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Advanced Image and Video Retrieval Techniques
  • Text Readability and Simplification
  • Speech and dialogue systems
  • Human Pose and Action Recognition
  • Semantic Web and Ontologies
  • Software Engineering Research
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Domain Adaptation and Few-Shot Learning
  • Biomedical Text Mining and Ontologies
  • Computational and Text Analysis Methods
  • Translation Studies and Practices
  • Mobile Crowdsensing and Crowdsourcing
  • Lexicography and Language Studies
  • Explainable Artificial Intelligence (XAI)
  • AI in Service Interactions
  • Algorithms and Data Compression
  • Cognitive Science and Mapping
  • Authorship Attribution and Profiling
  • Mathematics, Computing, and Information Processing
  • Logic, programming, and type systems

Trinity College Dublin
2015-2024

Dublin City University
2007-2021

University of Sheffield
2017-2021

University of Amsterdam
2016-2021

Bar-Ilan University
2021

University of Helsinki
2021

Tel Aviv University
2021

Technical University of Darmstadt
2021

University of Copenhagen
2021

Edinburgh Napier University
2021

Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Christof Monz. Proceedings of the Third Conference on Machine Translation: Shared Task Papers. 2018.

10.18653/v1/w18-6401 preprint EN cc-by 2018-01-01

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri. Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers. 2016.

10.18653/v1/w16-2301 article EN 2016-01-01

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri. Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). 2019.

10.18653/v1/w19-5301 article EN cc-by 2019-01-01

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi. Proceedings of the Second Conference on Machine Translation. 2017.

10.18653/v1/w17-4717 article EN cc-by 2017-01-01

This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked to score the outputs of the translation systems competing in the WMT19 News Translation Task with automatic metrics. 13 research groups submitted 24 metrics, 10 of which are reference-less "metrics" and constitute submissions to the joint task with the WMT19 Quality Estimation Task, "QE as a Metric". In addition, we computed 11 baseline metrics: 8 commonly applied baselines (BLEU, SentBLEU, NIST, WER, PER, TER, CDER, chrF) and 3 reimplementations (chrF+,...

10.18653/v1/w19-5302 article EN cc-by 2019-01-01
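Several of the baseline metrics listed above, such as chrF, are character n-gram F-scores. As a rough, hedged sketch of that idea only (not the shared-task implementation), a minimal sentence-level chrF could look as follows; the n = 1..6 and beta = 2 defaults follow the original chrF description, and the function name is ours.

    from collections import Counter

    def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
        """Simplified sentence-level chrF: character n-gram precision/recall
        averaged over n = 1..max_n, combined as an F(beta) score."""
        def ngrams(text: str, n: int) -> Counter:
            text = text.replace(" ", "")          # ignore whitespace in this sketch
            return Counter(text[i:i + n] for i in range(len(text) - n + 1))

        precisions, recalls = [], []
        for n in range(1, max_n + 1):
            hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
            overlap = sum((hyp & ref).values())   # clipped n-gram matches
            precisions.append(overlap / max(sum(hyp.values()), 1))
            recalls.append(overlap / max(sum(ref.values()), 1))

        p, r = sum(precisions) / max_n, sum(recalls) / max_n
        return 0.0 if p + r == 0 else (1 + beta**2) * p * r / (beta**2 * p + r)

    print(chrf("the cat sat on the mat", "the cat is on the mat"))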

Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work is filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts, agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own assessment strategy...

10.1017/s1351324915000339 article EN Natural Language Engineering 2015-09-15

We provide an analysis of current evaluation methodologies applied to summarization metrics and identify the following areas of concern: (1) movement away from evaluation by correlation with human assessment; (2) omission of important components of human assessment from evaluations, in addition to large numbers of metric variants; (3) absence of methods of significance testing of improvements over a baseline. We outline an evaluation methodology that overcomes all such challenges, providing the first method suitable for evaluation of summarization metrics. Our evaluation reveals for the first time which...

10.18653/v1/d15-1013 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and the Neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition to that, we computed 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER, CDER) as baselines. The metrics were evaluated in terms of system-level correlation (how well each metric's scores correlate with the official manual ranking of systems) and segment-level correlation (how often a metric agrees with humans...

10.18653/v1/w17-4755 article EN cc-by 2017-01-01
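System-level correlation in these shared tasks is essentially the Pearson correlation between each metric's per-system scores and the official human scores for the same systems, while segment-level agreement is typically summarised with a rank correlation. A minimal sketch of both computations, using made-up scores purely for illustration, might be:

    from scipy.stats import pearsonr, kendalltau

    # Hypothetical per-system scores: one value per MT system, in the same order.
    human_scores  = [0.12, -0.05, 0.33, 0.20, -0.41]   # e.g. official human scores
    metric_scores = [31.2, 28.9, 35.4, 33.0, 25.1]     # e.g. BLEU for the same systems

    r, p_value = pearsonr(metric_scores, human_scores)
    print(f"system-level Pearson r = {r:.3f} (p = {p_value:.3f})")

    # Segment-level agreement (simplified): rank correlation of per-segment scores.
    # The shared task itself uses relative-ranking variants of Kendall's tau.
    tau, _ = kendalltau([3, 1, 4, 2], [3, 2, 4, 1])
    print(f"segment-level Kendall tau = {tau:.3f}")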

This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT16 Translation Task. We collected scores of 16 metrics from 9 research groups. In addition to that, we computed standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The metrics were evaluated in terms of system-level correlation (how well each metric's scores correlate with the official manual ranking of systems) and segment-level correlation (how often a metric agrees with humans in comparing two translations...

10.18653/v1/w16-2302 article EN cc-by 2016-01-01

This paper presents the results of the WMT18 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT18 News Translation Task with automatic metrics. We collected scores of 10 metrics from 8 research groups. In addition to that, we computed standard metrics (BLEU, SentBLEU, chrF, NIST, WER, PER, TER, CDER) as baselines. The metrics were evaluated in terms of system-level correlation (how well each metric's scores correlate with the official manual ranking of systems) and segment-level correlation (how often a metric agrees with humans...

10.18653/v1/w18-6450 article EN cc-by 2018-01-01

Evaluation of segment-level machine translation metrics is currently hampered by: (1) low inter-annotator agreement levels in human assessments; (2) lack of an effective mechanism for evaluation of translations of equal quality; and (3) lack of methods of significance testing of improvements over a baseline. In this paper, we provide solutions to each of these challenges and outline a new methodology aimed specifically at assessment of segment-level metrics. We replicate the human evaluation component of WMT-13 and reveal that current state-of-the-art performance...

10.3115/v1/n15-1124 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation. As in SR'18, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in eleven languages and the deep track in three. Systems were evaluated...

10.18653/v1/d19-6301 article EN cc-by 2019-01-01

The term translationese has been used to describe features of translated text, and in this paper, we provide a detailed analysis of the potential adverse effects of translationese on machine translation evaluation. Our analysis shows differences in conclusions drawn from evaluations that include translationese in test data compared to experiments tested only with text originally composed in that language. For this reason we recommend that reverse-created test data be omitted from future test sets. In addition, we provide a re-evaluation of a past evaluation claiming human-parity of MT. One important issue not...

10.18653/v1/2020.emnlp-main.6 article EN cc-by 2020-01-01

Evaluation of open-domain dialogue systems is highly challenging, and the development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in recent competitions, annotations have been abandoned and reported as too unreliable to yield sensible results. This is a serious problem since automatic metrics are not known to provide a good indication of what may or may not be a high-quality conversation. Answering the distress call of competitions that...

10.18653/v1/2022.acl-long.445 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Automatic metrics are widely used in machine translation as a substitute for human assessment. With the introduction of any new metric comes the question of just how well that metric mimics human assessment of translation quality. This is often measured by correlation with human judgment. Significance tests are generally not used to establish whether improvements over existing methods such as BLEU are statistically significant or have occurred simply by chance, however. In this paper, we introduce a significance test for comparing correlations of two metrics,...

10.3115/v1/d14-1020 article EN 2014-01-01
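The test introduced here compares two dependent correlations that share the human-judgment variable. A commonly used formulation for this setting is the Williams test; the sketch below follows that general formula and is our own illustrative code under that assumption, not the paper's released implementation.

    import numpy as np
    from scipy.stats import pearsonr, t

    def williams_test(human, metric_a, metric_b):
        """One-sided test: is corr(metric_a, human) significantly higher than
        corr(metric_b, human), accounting for corr(metric_a, metric_b)?"""
        n = len(human)
        r12, _ = pearsonr(metric_a, human)     # candidate metric vs. human
        r13, _ = pearsonr(metric_b, human)     # baseline metric vs. human
        r23, _ = pearsonr(metric_a, metric_b)  # metric vs. metric

        k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
        t_stat = (r12 - r13) * np.sqrt((n - 1) * (1 + r23)) / np.sqrt(
            2 * k * (n - 1) / (n - 3) + ((r12 + r13) ** 2 / 4) * (1 - r23) ** 3)
        return t_stat, 1 - t.cdf(t_stat, df=n - 3)   # statistic, one-sided p-value

    # Hypothetical per-segment scores purely for illustration.
    rng = np.random.default_rng(0)
    human = rng.normal(size=200)
    metric_a = human + rng.normal(scale=0.5, size=200)
    metric_b = human + rng.normal(scale=0.9, size=200)
    print(williams_test(human, metric_a, metric_b))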

The term translationese has been used to describe the presence of unusual features in translated text. In this paper, we provide a detailed analysis of its adverse effects on machine translation evaluation results. Our analysis shows evidence to support differences in text originally written in a given language relative to translated text, and that this can potentially negatively impact the accuracy of evaluations. For this reason we recommend that reverse-created test data be omitted from future test sets. In addition, we provide a re-evaluation of a past high-profile evaluation claiming...

10.48550/arxiv.1906.09833 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Recent human evaluation of machine translation has focused on relative preference judgments of translation quality, making it difficult to track longitudinal improvements over time. We carry out a large-scale crowd-sourcing experiment to estimate the degree to which state-of-the-art performance in machine translation has increased over the past five years. To facilitate this evaluation, we move away from relative preference judgments and instead ask human judges to provide direct estimates of the quality of individual translations in isolation from alternate outputs. For seven European language pairs, our...

10.3115/v1/e14-1047 article EN cc-by 2014-01-01
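Direct estimates of this kind are collected as absolute scores (e.g. on a 0-100 scale) for individual translations, and because judges use such scales differently, scores are commonly standardised per judge before averaging. A minimal sketch of that step, with hypothetical column names of our own choosing, could be:

    import pandas as pd

    # Hypothetical direct-assessment records: one raw 0-100 score per (judge, system) item.
    df = pd.DataFrame({
        "judge":  ["a", "a", "a", "b", "b", "b"],
        "system": ["sys1", "sys2", "sys1", "sys1", "sys2", "sys2"],
        "score":  [70, 55, 80, 40, 25, 30],
    })

    # Standardise each judge's scores (z-scores) to remove individual scale differences.
    df["z"] = df.groupby("judge")["score"].transform(lambda s: (s - s.mean()) / s.std())

    # Average the standardised scores per system to obtain a system-level result.
    print(df.groupby("system")["z"].mean().sort_values(ascending=False))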

Yvette Graham. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.

10.3115/v1/p15-1174 article EN cc-by 2015-01-01

Existing metrics to evaluate the quality of Machine Translation hypotheses take different perspectives into account. DPMFcomb, a metric combining the merits of a range of metrics, achieved the best performance for evaluation of to-English language pairs in the previous two years of WMT Metrics Shared Tasks. This year, we submit a novel combined metric, Blend, to the WMT17 Metrics task. Compared to DPMFcomb, Blend includes the following adaptations: i) We use DA human evaluation data to guide the training process, with a vast reduction in required training data, while still...

10.18653/v1/w17-4768 article EN cc-by 2017-01-01
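Blend combines existing metrics by using their scores as features for a regression model trained against human direct-assessment scores. The sketch below illustrates that general idea only, under our own assumptions about the feature set and model choice (a small support vector regressor), and is not the authors' implementation.

    import numpy as np
    from sklearn.svm import SVR

    # Hypothetical training data: each row holds scores of several existing metrics
    # (e.g. BLEU, chrF, TER) for one translation; y holds the human DA score for it.
    X_train = np.array([[0.31, 0.55, 0.42],
                        [0.12, 0.30, 0.70],
                        [0.45, 0.63, 0.35],
                        [0.25, 0.48, 0.50]])
    y_train = np.array([0.4, -0.8, 0.9, 0.1])   # standardised human DA scores

    model = SVR(kernel="rbf").fit(X_train, y_train)

    # The combined metric's score for a new translation is the model's prediction.
    print(model.predict([[0.35, 0.58, 0.40]]))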

We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor SR'11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in ten languages and the deep track in three. Systems were evaluated...

10.18653/v1/w18-3601 article EN cc-by 2018-01-01

Randomized methods of significance testing enable estimation of the probability that an increase in score has occurred simply by chance. In this paper, we examine the accuracy of three randomized methods of significance testing in the context of machine translation: paired bootstrap resampling, bootstrap resampling and approximate randomization. We carry out a large-scale human evaluation of shared task systems for two language pairs to provide a gold standard for the tests. Results show very little difference across the three methods of significance testing. Notably, all test/metric combinations...

10.3115/v1/w14-3333 article EN 2014-01-01
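Paired bootstrap resampling, one of the tests examined here, estimates how often an observed score difference between two systems would survive resampling of the test set: segments are drawn with replacement and the score difference is recomputed on each resample. A minimal sketch over per-segment scores (the scoring function itself is left abstract) might be:

    import numpy as np

    def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
        """Fraction of bootstrap resamples in which system A fails to outperform
        system B, used as a rough significance estimate for A's observed advantage."""
        rng = np.random.default_rng(seed)
        a, b = np.asarray(scores_a, float), np.asarray(scores_b, float)
        n = len(a)
        observed = a.mean() - b.mean()
        losses = 0
        for _ in range(n_resamples):
            idx = rng.integers(0, n, size=n)      # resample segment indices with replacement
            if a[idx].mean() - b[idx].mean() <= 0:
                losses += 1
        return observed, losses / n_resamples

    # Hypothetical per-segment quality scores for two systems on the same test set.
    rng = np.random.default_rng(1)
    sys_a = rng.normal(0.55, 0.1, size=500)
    sys_b = rng.normal(0.50, 0.1, size=500)
    print(paired_bootstrap(sys_a, sys_b))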