- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Software Engineering Research
- Authorship Attribution and Profiling
- Biomedical Text Mining and Ontologies
- Explainable Artificial Intelligence (XAI)
- Speech Recognition and Synthesis
- Adversarial Robustness in Machine Learning
- Hate Speech and Cyberbullying Detection
- Software Testing and Debugging Techniques
- Advanced Text Analysis Techniques
- Software Reliability and Analysis Research
- Machine Learning and Algorithms
- Logic, Reasoning, and Knowledge
- Music and Audio Processing
- Team Dynamics and Performance
- Speech and Audio Processing
- Translation Studies and Practices
- Educational Systems and Policies
- Bayesian Modeling and Causal Inference
University of Maryland, College Park
2016-2025
University of Maryland, Baltimore
2024
University of Baltimore
2024
Microsoft (United States)
2021
University of Southern California
2020
Carnegie Mellon University
2017
The University of Tokyo
2017
Karlsruhe Institute of Technology
2017
Laboratoire d'Informatique de Paris-Nord
2017
Johns Hopkins University
2017
Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny...
Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat, Kevin Duh. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We adopt a probabilistic view of curriculum learning, which lets us flexibly evaluate the impact of curricula design, and perform an extensive exploration on a German-English translation task. Results show that it is possible to improve convergence time at no loss in translation quality. However, results are highly...
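The probabilistic view of curriculum learning described above can be illustrated with a minimal sketch: instead of a fixed easy-to-hard schedule, each batch is drawn from a probability distribution over examples that starts out favoring easy samples and gradually flattens toward uniform. The function name, weighting scheme, and difficulty scores below are illustrative assumptions, not the paper's exact formulation.

```python
import random

def curriculum_sample(examples, difficulties, step, total_steps, batch_size, rng=None):
    """Sample a batch where easy examples are favored early in training.

    `difficulties` are scores in [0, 1] (0 = easiest). As `step` approaches
    `total_steps`, the sampling weights approach uniform, so later batches
    draw from the full difficulty range.
    """
    rng = rng or random.Random(0)
    progress = step / total_steps  # 0 at the start of training, 1 at the end
    # Early on, weight is roughly (1 - difficulty); later, weights flatten to 1.
    weights = [(1.0 - d) * (1.0 - progress) + progress for d in difficulties]
    return rng.choices(examples, weights=weights, k=batch_size)

# Toy corpus with made-up difficulty scores (e.g. normalized sentence length).
corpus = [("ein Haus", "a house"), ("ein sehr langer Satz ...", "a much longer sentence ...")]
scores = [0.1, 0.9]
early = curriculum_sample(corpus, scores, step=0, total_steps=100, batch_size=4)
```

Because sampling is probabilistic rather than a hard ordering, different curricula amount to different weighting functions, which is what makes their impact easy to compare.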
Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area's nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present...
We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. Error analysis suggests several key factors behind this surprising finding, including the inherent limitations of current...
Despite impressive progress in high-resource settings, Neural Machine Translation (NMT) still struggles in low-resource and out-of-domain scenarios, often failing to match the quality of phrase-based translation. We propose a novel technique that combines back-translation and multilingual NMT to improve performance in these difficult cases. Our technique trains a single model for both directions of a language pair, allowing us to back-translate source or target monolingual data without requiring an auxiliary model. We then...
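The back-translation step at the core of this approach can be sketched as follows: monolingual text in the target language is translated back into the source language by a reverse model, and the resulting (synthetic source, real target) pairs augment the parallel training data. The helper below is a toy stand-in; the dictionary-based reverse "model" is an assumption for illustration, not the paper's single bidirectional model.

```python
def back_translate(monolingual_targets, reverse_translate):
    """Create synthetic (source, target) pairs from target-language text.

    `reverse_translate` stands in for a target->source translation model;
    here it is any callable, e.g. a trained NMT system or a toy lookup.
    """
    synthetic = []
    for tgt in monolingual_targets:
        src = reverse_translate(tgt)  # model-generated, possibly noisy source
        synthetic.append((src, tgt))  # pair synthetic source with real target
    return synthetic

# Toy stand-in for an English->German reverse model:
toy_reverse = {"the cat sleeps": "die Katze schläft"}.get
pairs = back_translate(["the cat sleeps"], toy_reverse)
# pairs == [("die Katze schläft", "the cat sleeps")]
```

The key property preserved from the technique is that the target side of each synthetic pair is clean, human-written text, so noise from the reverse model only affects the source side.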
We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis of word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future...
Parallel corpora are often not as parallel as one might assume: non-literal translations and noisy translations abound, even in curated corpora routinely used for training and evaluation. We use a cross-lingual textual entailment system to distinguish sentence pairs that are equivalent in meaning from those that are not, and show that filtering out divergent examples improves translation quality.
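The filtering idea can be sketched as a simple threshold over bidirectional entailment scores: a pair is kept only if each side entails the other. The scorer below is a crude monolingual token-overlap stand-in, assumed purely for illustration; the paper's system is a trained cross-lingual entailment model.

```python
def filter_divergent(pairs, entails, threshold=0.5):
    """Keep only sentence pairs that entail each other in both directions.

    `entails(a, b)` stands in for any textual entailment scorer
    returning a value in [0, 1].
    """
    kept = []
    for src, tgt in pairs:
        if entails(src, tgt) >= threshold and entails(tgt, src) >= threshold:
            kept.append((src, tgt))
    return kept

def toy_score(a, b):
    # Jaccard overlap of lowercased tokens as a crude proxy scorer.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

clean = filter_divergent([("a b c", "a b c"), ("a b c", "x y z")], toy_score)
# clean == [("a b c", "a b c")]
```

Requiring entailment in both directions is what separates true meaning equivalence from one-directional paraphrase or added/omitted content.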
Stylistic variations of language, such as formality, carry speakers' intentions beyond literal meaning and should be conveyed adequately in translation. We propose to use lexical formality models to control the formality level of machine translation output. We demonstrate the effectiveness of our approach with empirical evaluations, measured by automatic metrics and human assessments.
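One widely used mechanism for this kind of attribute control in NMT is the "side constraint": a control token prepended to the source sentence, which the model learns to associate with the desired output style. The sketch below shows that general recipe as an illustration only; it is not necessarily the exact mechanism of the lexical formality models above, and the token names are assumptions.

```python
def tag_formality(source, level):
    """Prepend a formality control token to a source sentence.

    The tagged sentence would be fed to an NMT model trained on
    similarly tagged data, steering the formality of its output.
    """
    assert level in {"formal", "informal"}, "unknown formality level"
    return f"<{level}> {source}"

tagged = tag_formality("How are you?", "informal")
# tagged == "<informal> How are you?"
```

At inference time, switching the token is enough to switch the target style, with no change to the model itself.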
Neural sequence generation models are known to "hallucinate" by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions in contrastive hallucinated vs. non-hallucinated outputs generated via perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, using them...
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of task organizers and contributors to the overview paper, almost evenly...
We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classifier with 99.99% accuracy on the five target groups. Within each group (except English), we use a voting combination of discriminative classifiers trained...
This task combines the labeling of multiword expressions and supersenses (coarse-grained classes) in an explicit, yet broad-coverage paradigm for lexical semantics. Nine systems participated; the best system scored 57.7% F1 in a multi-domain evaluation setting, indicating that the task remains largely unresolved. An error analysis reveals that a large number of instances in the data set are either hard cases, which no systems get right, or easy cases, which all systems correctly solve.
Generating natural language requires conveying content in an appropriate style. We explore two related tasks on generating text of varying formality: monolingual formality transfer and formality-sensitive machine translation. We propose to solve these tasks jointly using multi-task learning, and show that our models achieve state-of-the-art performance for formality transfer and are able to perform formality-sensitive translation without being explicitly trained on style-annotated examples.
Sweta Agrawal, Marine Carpuat. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
We introduce an Edit-Based TransfOrmer with Repositioning (EDITOR), which makes sequence generation flexible by seamlessly allowing users to specify preferences in output lexical choice. Building on recent models for non-autoregressive sequence generation (Gu et al., 2019), EDITOR generates new sequences by iteratively editing hypotheses. It relies on a novel reposition operation designed to disentangle lexical choice from word positioning decisions, while enabling efficient oracles for imitation learning and parallel...
We revisit the one sense per discourse hypothesis of Gale et al. in the context of machine translation. Since a given sense can be lexicalized differently in translation, do we observe one translation per discourse? Analysis of manual translations reveals that the hypothesis still holds when using parallel text as sense annotation, thus confirming that translational differences represent useful sense distinctions. Statistical Machine Translation (SMT) output showed that, despite ignoring document structure, the hypothesis is strongly supported, in part because of the low...
Yogarshi Vyas, Xing Niu, Marine Carpuat. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.