- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Multimodal Machine Learning Applications
- Adversarial Robustness in Machine Learning
- Explainable Artificial Intelligence (XAI)
- Wikis in Education and Collaboration
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Machine Learning and Data Classification
- Video Analysis and Summarization
- Humor Studies and Applications
- Hate Speech and Cyberbullying Detection
- Discourse Analysis in Language Studies
- Digital Communication and Language
- Bayesian Modeling and Causal Inference
- Web Data Mining and Analysis
- Public Relations and Crisis Communication
- Software Engineering Research
- Misinformation and Its Impacts
- Communication and COVID-19 Impact
- Data Quality and Management
- Multi-Agent Systems and Negotiation
- Computational and Text Analysis Methods
- Text and Document Classification Technologies
Laboratoire d'Informatique de Grenoble
2023-2024
Grenoble Images Parole Signal Automatique
2023-2024
École Polytechnique Fédérale de Lausanne
2016-2024
Microsoft (United States)
2023
Microsoft (Finland)
2022
Laboratoire d'Informatique Fondamentale de Lille
2021-2022
Technical University of Darmstadt
2016-2021
Laboratoire d'Informatique de Paris-Nord
2016-2018
University of Southern California
2017
Drexel University
2017
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Research on summarization has mainly been driven by empirical approaches, crafting systems to perform well on standard datasets, with the notion of information Importance remaining latent. We argue that establishing theoretical models of Importance will advance our understanding of the task and help further improve summarization systems. To this end, we propose simple but rigorous definitions of several concepts that were previously used only intuitively in summarization: Redundancy, Relevance, and Informativeness. Importance arises as a single...
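A minimal sketch of the kind of quantities the abstract alludes to, assuming summaries and sources are treated as probability distributions over semantic units (here, plain unigrams); the exact definitions in the paper may differ, and the entropy- and KL-based formulas below are only illustrative.

```python
# Illustrative information-theoretic view of summarization concepts.
# Assumption: texts are represented as unigram distributions; the paper's
# actual definitions of Redundancy/Relevance may be formulated differently.
from collections import Counter
import math

def distribution(text):
    """Normalized unigram distribution over tokens (a stand-in for semantic units)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values())

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), smoothing units absent from q."""
    return sum(v * math.log(v / q.get(w, eps)) for w, v in p.items())

source = "the cat sat on the mat while the dog slept on the rug"
summary = "the cat sat on the mat"

p_summary, p_source = distribution(summary), distribution(source)
redundancy = -entropy(p_summary)                  # low entropy -> repetitive summary
relevance = -kl_divergence(p_summary, p_source)   # summary should stay close to the source
print(redundancy, relevance)
```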
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the LM uses the critic's structured feedback to iteratively improve...
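A minimal sketch of a generator-critic refinement loop in the spirit of this abstract. The functions `generate_reasoning` and `critique` are hypothetical stand-ins for the finetuned LM and the critic model, not the paper's actual code.

```python
# Generator-critic refinement loop (illustrative only).
def generate_reasoning(question: str, feedback: str | None = None) -> str:
    # Placeholder: a real system would call the finetuned LM here,
    # conditioning on the question and on the critic's structured feedback.
    return f"reasoning for {question!r} (feedback: {feedback})"

def critique(reasoning: str) -> tuple[bool, str]:
    # Placeholder: a real critic model would return structured feedback
    # pointing at the faulty intermediate step.
    ok = "feedback: None" not in reasoning  # toy acceptance rule
    return ok, ("" if ok else "step 2 is an invalid deduction")

def refine(question: str, max_rounds: int = 3) -> str:
    reasoning = generate_reasoning(question)
    for _ in range(max_rounds):
        ok, feedback = critique(reasoning)
        if ok:
            break
        reasoning = generate_reasoning(question, feedback)
    return reasoning

print(refine("If Alice has 3 apples and buys 2 more, how many does she have?"))
```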
Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, generating plausible input text for a target output structure. Leveraging this asymmetry in difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...
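A minimal sketch of the reverse-direction idea: sample a target output structure, then ask a model to write plausible text expressing it, yielding (text, structure) training pairs for the hard forward direction. `call_llm`, the prompt wording, and the example triples are hypothetical stand-ins, not the paper's pipeline.

```python
# Reverse synthetic data generation (illustrative only).
import json
import random

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return "Marie Curie was born in Warsaw and later worked in Paris."

def sample_target_structure() -> list[dict]:
    # For closed information extraction, the output is a set of (subject, relation, object) triples.
    triples = [
        {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},
        {"subject": "Marie Curie", "relation": "worked_in", "object": "Paris"},
    ]
    return random.sample(triples, k=len(triples))

def generate_training_pair() -> dict:
    structure = sample_target_structure()
    prompt = ("Write one fluent sentence that expresses exactly these facts:\n"
              + json.dumps(structure, indent=2))
    text = call_llm(prompt)
    # The (text, structure) pair can now supervise a model for the forward task.
    return {"input_text": text, "target_structure": structure}

print(generate_training_pair())
```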
The evaluation of summaries is a challenging but crucial task of the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgements available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic metrics can be included as features, and the model learns the combination of features exhibiting the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation where we ask humans to evaluate summaries covering the whole spectrum of the metric. We release the trained metric as an open-source tool.
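A minimal sketch of learning a scoring metric as a combination of existing metrics fit against human judgments. The feature values, judgments, and the choice of plain linear regression below are made up for illustration; in the paper the features come from existing automatic metrics computed on TAC data.

```python
# Learning a metric as a combination of existing metric scores (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

# Rows: summaries. Columns: scores from existing automatic metrics (features).
metric_features = np.array([
    [0.42, 0.18, 0.35],
    [0.55, 0.25, 0.48],
    [0.30, 0.10, 0.22],
    [0.61, 0.33, 0.57],
    [0.47, 0.21, 0.40],
])
human_judgments = np.array([3.1, 3.9, 2.4, 4.3, 3.4])

model = LinearRegression().fit(metric_features, human_judgments)
predicted = model.predict(metric_features)
print("Correlation with human judgments:", pearsonr(predicted, human_judgments)[0])
```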
Evaluation of cross-lingual encoders is usually performed either via zero-shot transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically...
Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently been increasing with model size - but so has the monetary cost of querying ever larger models. Importantly, however, not all inputs are equally hard: some require large LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework...
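A minimal sketch of cost-aware model selection: a small meta-model predicts whether the cheap LM is likely to suffice for a given input, and only hard inputs are routed to the expensive LM. All functions (the two LMs, the difficulty predictor, and the length heuristic) are hypothetical stand-ins for illustration.

```python
# Cost-aware routing between a small and a large model (illustrative only).
def small_lm(prompt: str) -> str:
    return "answer from small model"

def large_lm(prompt: str) -> str:
    return "answer from large model"

def predicted_small_model_success(prompt: str) -> float:
    # Placeholder for a learned meta-model; here a crude length heuristic.
    return 0.9 if len(prompt.split()) < 30 else 0.3

def answer(prompt: str, threshold: float = 0.5) -> tuple[str, str]:
    if predicted_small_model_success(prompt) >= threshold:
        return small_lm(prompt), "small"   # cheap path
    return large_lm(prompt), "large"       # expensive path

print(answer("What is the capital of France?"))
```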
In summarization, automatic evaluation metrics are usually compared based on their ability to correlate with human judgments. Unfortunately, the few existing human judgment datasets have been created as by-products of the manual evaluations performed during the DUC/TAC shared tasks. However, modern systems are typically better than the best systems submitted at the time of these shared tasks. We show that, surprisingly, metrics which behave similarly in the average-scoring range strongly disagree in the higher-scoring range in which current systems now operate. It is...
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks, where the goal is to predict properties of the data-generating distribution rather than labels for individual datapoints. These tasks encompass statistical inference problems such as parameter estimation, hypothesis testing, or mutual information estimation. Framing these tasks within traditional machine learning pipelines is challenging, as supervision is typically...
Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. However, they typically fall short of the performances of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to power mean word embeddings. We show that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these techniques cross-lingually. In addition, our proposed method outperforms the recently proposed baselines SIF and Sent2Vec by a solid...
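A minimal sketch of concatenated power mean sentence embeddings. The word vectors below are random stand-ins, and the choice of power values (p = -inf, 1, +inf, i.e., min, mean, max) is a common configuration that may not match the exact setup in the paper.

```python
# Concatenated power mean sentence embeddings (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4) for w in "the cat sat on the mat".split()}

def power_mean_embedding(tokens):
    vecs = np.stack([word_vectors[t] for t in tokens])
    return np.concatenate([
        vecs.min(axis=0),   # p = -inf
        vecs.mean(axis=0),  # p = 1 (the usual average embedding)
        vecs.max(axis=0),   # p = +inf
    ])

sentence = "the cat sat on the mat".split()
print(power_mean_embedding(sentence).shape)  # (12,) = 3 power means x 4 dimensions
```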
Automatic evaluation metrics capable of replacing human judgments are critical to allowing fast development of new methods. Thus, numerous research efforts have focused on crafting such metrics. In this work, we take a step back and analyze recent progress by comparing the body of existing automatic metrics altogether. As metrics are used based on how they rank systems, we compare them in the space of system rankings. Our extensive statistical analysis reveals surprising findings: automatic metrics - old and new - are much more similar to each other than...
A large body of work shows that machine learning (ML) models can leak sensitive or confidential information about their training data. Recently, leakage due to distribution inference (or property inference) attacks is gaining attention. In this attack, the goal of an adversary is to infer distributional information about the training data. So far, research on distribution inference has focused on demonstrating successful attacks, with little attention given to identifying the potential causes of the leakage and to proposing mitigations. To bridge this gap, as our main contribution, we...
Maxime Peyrard, Iryna Gurevych. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.
We present a new supervised framework that learns to estimate automatic Pyramid scores and uses them for optimization-based extractive multi-document summarization. For learning automatic Pyramid scores, we developed a method for training data generation which is based on a genetic algorithm using automatic Pyramid as the fitness function. Our experimental evaluation shows that our framework significantly outperforms strong baselines regarding automatic Pyramid, and that there is much room for improvement in comparison with the upper-bound automatic Pyramid.
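A minimal sketch of generating candidate summaries with a genetic algorithm guided by a fitness function. The fitness used here is a toy word-overlap score against a reference, not the automatic Pyramid score used in the paper, and the sentences are made up for illustration.

```python
# Genetic algorithm over binary sentence selections (illustrative only).
import random

random.seed(0)
source_sentences = [
    "the cat sat on the mat",
    "the dog slept on the rug",
    "it rained all day in the city",
    "the cat chased the dog around the house",
]
reference = "the cat sat on the mat and chased the dog"

def fitness(selection):
    summary = " ".join(s for s, keep in zip(source_sentences, selection) if keep)
    overlap = set(summary.split()) & set(reference.split())
    return len(overlap) / len(set(reference.split()))

def mutate(selection, rate=0.2):
    return [bit ^ (random.random() < rate) for bit in selection]

def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

population = [[random.randint(0, 1) for _ in source_sentences] for _ in range(20)]
for _ in range(30):  # generations: keep the fittest, breed and mutate the rest
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(10)
    ]
best = max(population, key=fitness)
print(best, fitness(best))
```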
A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs...
This paper presents a problem-reduction approach to extractive multi-document summarization: we propose a reduction of the problem to scoring individual sentences with their ROUGE scores based on supervised learning. For summarization, we then solve an optimization problem where the ROUGE score of the selected summary is maximized. To this end, we derive an approximation of the ROUGE-N score of a set of sentences, and define a principled discrete optimization problem for sentence selection. Mathematical and empirical evidence suggests that the sentence selection step can be solved almost exactly, thus...
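A minimal sketch of the selection step: choose sentences to maximize predicted scores under a length budget. A greedy score-per-word heuristic is shown purely for illustration; the paper instead formulates and solves a principled discrete optimization problem.

```python
# Greedy sentence selection under a length budget (illustrative only).
def select_sentences(sentences, scores, max_words=20):
    chosen, used = [], 0
    # Greedy by score density (score per word).
    order = sorted(range(len(sentences)),
                   key=lambda i: scores[i] / len(sentences[i].split()),
                   reverse=True)
    for i in order:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(sentences[i])
            used += length
    return chosen

sentences = [
    "the cat sat on the mat",
    "the dog slept peacefully on the rug near the door",
    "it rained",
]
scores = [0.9, 0.7, 0.1]  # e.g., predicted per-sentence ROUGE contributions
print(select_sentences(sentences, scores))
```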
Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Due to their pictographic nature, emojis come with baked-in, grounded semantics. Although this makes them promising candidates for new forms of more accessible communication, it is still unknown to what degree humans agree on the inherent meaning of emojis when encountering them outside of concrete textual contexts. To bridge this gap, we collected a crowdsourced dataset (made publicly available) of one-word descriptions for 1,289 emojis presented to participants with no surrounding text. The descriptions and interpretations were then examined...
Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, limited to specific tasks, such as parsing or code generation. In this work, we demonstrate that formal grammars can describe the output space for a much...
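A minimal sketch of grammar-constrained decoding: at every step, tokens that the grammar does not allow as a continuation are masked out before the next token is picked. The "LM" scores, the tiny vocabulary, and the toy grammar (enumerated valid sequences) are stand-ins; real GCD systems run an incremental parser over the model's subword vocabulary.

```python
# Grammar-constrained decoding by masking disallowed tokens (illustrative only).
import math
import random

random.seed(0)
VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"age"', "42", "<eos>"]

# Toy "grammar": only these token sequences are valid outputs.
VALID_SEQUENCES = [
    ["{", '"name"', ":", '"Ada"', "}", "<eos>"],
    ["{", '"age"', ":", "42", "}", "<eos>"],
]

def allowed_next(prefix):
    """Tokens that keep the prefix consistent with at least one valid sequence."""
    return {seq[len(prefix)] for seq in VALID_SEQUENCES
            if seq[:len(prefix)] == prefix and len(seq) > len(prefix)}

def lm_scores(prefix):
    """Placeholder for language model logits over the vocabulary."""
    return {tok: random.random() for tok in VOCAB}

def constrained_decode():
    prefix = []
    while not prefix or prefix[-1] != "<eos>":
        scores = lm_scores(prefix)
        allowed = allowed_next(prefix)
        # Mask tokens the grammar forbids, then take the best remaining one.
        masked = {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}
        prefix.append(max(masked, key=masked.get))
    return prefix

print(constrained_decode())
```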