- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Multimodal Machine Learning Applications
- Adversarial Robustness in Machine Learning
- Explainable Artificial Intelligence (XAI)
- Wikis in Education and Collaboration
- Speech Recognition and Synthesis
- Text Readability and Simplification
- Machine Learning and Data Classification
- Video Analysis and Summarization
- Humor Studies and Applications
- Hate Speech and Cyberbullying Detection
- Discourse Analysis in Language Studies
- Digital Communication and Language
- Bayesian Modeling and Causal Inference
- Web Data Mining and Analysis
- Public Relations and Crisis Communication
- Software Engineering Research
- Misinformation and Its Impacts
- Communication and COVID-19 Impact
- Data Quality and Management
- Multi-Agent Systems and Negotiation
- Computational and Text Analysis Methods
- Text and Document Classification Technologies
Laboratoire d'Informatique de Grenoble
2023-2024
Grenoble Images Parole Signal Automatique
2023-2024
École Polytechnique Fédérale de Lausanne
2016-2024
Microsoft (United States)
2023
Microsoft (Finland)
2022
Laboratoire d'Informatique Fondamentale de Lille
2021-2022
Technical University of Darmstadt
2016-2021
Laboratoire d'Informatique de Paris-Nord
2016-2018
University of Southern California
2017
Drexel University
2017
Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Research on summarization has mainly been driven by empirical approaches, crafting systems to perform well on standard datasets, with the notion of information Importance remaining latent. We argue that establishing theoretical models of Importance will advance our understanding of the task and help further improve summarization systems. To this end, we propose simple but rigorous definitions of several concepts that were previously used only intuitively in summarization: Redundancy, Relevance, and Informativeness. Importance arises as a single...
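A minimal sketch of the kind of quantities the abstract alludes to, assuming summaries and sources are treated as probability distributions over semantic units (here, plain unigrams); the exact definitions in the paper may differ, and the entropy- and KL-based formulas below are only illustrative.

```python
# Illustrative information-theoretic view of summarization concepts.
# Assumption: texts are represented as unigram distributions; the paper's
# actual definitions of Redundancy/Relevance may be formulated differently.
from collections import Counter
import math

def distribution(text):
    """Normalized unigram distribution over tokens (a stand-in for semantic units)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values())

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), smoothing units absent from q."""
    return sum(v * math.log(v / q.get(w, eps)) for w, v in p.items())

source = "the cat sat on the mat while the dog slept on the rug"
summary = "the cat sat on the mat"

p_summary, p_source = distribution(summary), distribution(source)
redundancy = -entropy(p_summary)                  # low entropy -> repetitive summary
relevance = -kl_divergence(p_summary, p_source)   # summary should stay close to the source
print(redundancy, relevance)
```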
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the LM uses the critic's structured feedback to iteratively improve...
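A minimal sketch of a generator-critic refinement loop in the spirit of this abstract. The functions `generate_reasoning` and `critique` are hypothetical stand-ins for the finetuned LM and the critic model, not the paper's actual code.

```python
# Generator-critic refinement loop (illustrative only).
def generate_reasoning(question: str, feedback: str | None = None) -> str:
    # Placeholder: a real system would call the finetuned LM here,
    # conditioning on the question and on the critic's structured feedback.
    return f"reasoning for {question!r} (feedback: {feedback})"

def critique(reasoning: str) -> tuple[bool, str]:
    # Placeholder: a real critic model would return structured feedback
    # pointing at the faulty intermediate step.
    ok = "feedback: None" not in reasoning  # toy acceptance rule
    return ok, ("" if ok else "step 2 is an invalid deduction")

def refine(question: str, max_rounds: int = 3) -> str:
    reasoning = generate_reasoning(question)
    for _ in range(max_rounds):
        ok, feedback = critique(reasoning)
        if ok:
            break
        reasoning = generate_reasoning(question, feedback)
    return reasoning

print(refine("If Alice has 3 apples and buys 2 more, how many does she have?"))
```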
Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, generating plausible input text for a target output structure. Leveraging this asymmetry in difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...
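A minimal sketch of the reverse-direction idea: sample a target output structure, then ask a model to write plausible text expressing it, yielding (text, structure) training pairs for the hard forward direction. `call_llm`, the prompt wording, and the example triples are hypothetical stand-ins, not the paper's pipeline.

```python
# Reverse synthetic data generation (illustrative only).
import json
import random

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return "Marie Curie was born in Warsaw and later worked in Paris."

def sample_target_structure() -> list[dict]:
    # For closed information extraction, the output is a set of (subject, relation, object) triples.
    triples = [
        {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},
        {"subject": "Marie Curie", "relation": "worked_in", "object": "Paris"},
    ]
    return random.sample(triples, k=len(triples))

def generate_training_pair() -> dict:
    structure = sample_target_structure()
    prompt = ("Write one fluent sentence that expresses exactly these facts:\n"
              + json.dumps(structure, indent=2))
    text = call_llm(prompt)
    # The (text, structure) pair can now supervise a model for the forward task.
    return {"input_text": text, "target_structure": structure}

print(generate_training_pair())
```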
The evaluation of summaries is a challenging but crucial task of the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgements available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic metrics can be included as features, and the model learns the combination of features exhibiting the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation where we ask humans to evaluate summaries covering the whole spectrum of the metric. We release the trained metric as an open-source tool.
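A minimal sketch of learning a scoring metric as a combination of existing metrics fit against human judgments. The feature values, judgments, and the choice of plain linear regression below are made up for illustration; in the paper the features come from existing automatic metrics computed on TAC data.

```python
# Learning a metric as a combination of existing metric scores (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import pearsonr

# Rows: summaries. Columns: scores from existing automatic metrics (features).
metric_features = np.array([
    [0.42, 0.18, 0.35],
    [0.55, 0.25, 0.48],
    [0.30, 0.10, 0.22],
    [0.61, 0.33, 0.57],
    [0.47, 0.21, 0.40],
])
human_judgments = np.array([3.1, 3.9, 2.4, 4.3, 3.4])

model = LinearRegression().fit(metric_features, human_judgments)
predicted = model.predict(metric_features)
print("Correlation with human judgments:", pearsonr(predicted, human_judgments)[0])
```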
Evaluation of cross-lingual encoders is usually performed either via zero-shot transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation, where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically...
Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently been increasing with model size - but so has the monetary cost of querying ever larger models. Importantly, however, not all inputs are equally hard: some require large LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework...
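A minimal sketch of cost-aware model selection: a small meta-model predicts whether the cheap LM is likely to suffice for a given input, and only hard inputs are routed to the expensive LM. All functions (the two LMs, the difficulty predictor, and the length heuristic) are hypothetical stand-ins for illustration.

```python
# Cost-aware routing between a small and a large model (illustrative only).
def small_lm(prompt: str) -> str:
    return "answer from small model"

def large_lm(prompt: str) -> str:
    return "answer from large model"

def predicted_small_model_success(prompt: str) -> float:
    # Placeholder for a learned meta-model; here a crude length heuristic.
    return 0.9 if len(prompt.split()) < 30 else 0.3

def answer(prompt: str, threshold: float = 0.5) -> tuple[str, str]:
    if predicted_small_model_success(prompt) >= threshold:
        return small_lm(prompt), "small"   # cheap path
    return large_lm(prompt), "large"       # expensive path

print(answer("What is the capital of France?"))
```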
In summarization, automatic evaluation metrics are usually compared based on their ability to correlate with human judgments. Unfortunately, the few existing human judgment datasets have been created as by-products of the manual evaluations performed during the DUC/TAC shared tasks. However, modern systems are typically better than the best systems submitted at the time of these shared tasks. We show that, surprisingly, metrics which behave similarly in the average-scoring range strongly disagree in the higher-scoring range in which current systems now operate. It is...
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks, where the goal is to predict properties of the data-generating distribution rather than labels for individual datapoints. These tasks encompass statistical inference problems such as parameter estimation, hypothesis testing, or mutual information estimation. Framing these tasks within traditional machine learning pipelines is challenging, as supervision is typically...
Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. However, they typically fall short of the performances of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to power mean word embeddings. We show that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these techniques cross-lingually. In addition, our proposed method outperforms the recently proposed baselines SIF and Sent2Vec by a solid...
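A minimal sketch of concatenated power mean sentence embeddings. The word vectors below are random stand-ins, and the choice of power values (p = -inf, 1, +inf, i.e., min, mean, max) is a common configuration that may not match the exact setup in the paper.

```python
# Concatenated power mean sentence embeddings (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4) for w in "the cat sat on the mat".split()}

def power_mean_embedding(tokens):
    vecs = np.stack([word_vectors[t] for t in tokens])
    return np.concatenate([
        vecs.min(axis=0),   # p = -inf
        vecs.mean(axis=0),  # p = 1 (the usual average embedding)
        vecs.max(axis=0),   # p = +inf
    ])

sentence = "the cat sat on the mat".split()
print(power_mean_embedding(sentence).shape)  # (12,) = 3 power means x 4 dimensions
```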
Automatic evaluation metrics capable of replacing human judgments are critical to allowing fast development of new methods. Thus, numerous research efforts have focused on crafting such metrics. In this work, we take a step back and analyze recent progress by comparing the body of existing automatic metrics altogether. As metrics are used based on how they rank systems, we compare them in the space of system rankings. Our extensive statistical analysis reveals surprising findings: automatic metrics - old and new - are much more similar to each other than...
A large body of work shows that machine learning (ML) models can leak sensitive or confidential information about their training data. Recently, leakage due to distribution inference (or property inference) attacks is gaining attention. In this attack, the goal of an adversary is to infer distributional information about the training data. So far, research on distribution inference has focused on demonstrating successful attacks, with little attention given to identifying the potential causes of the leakage and to proposing mitigations. To bridge this gap, as our main contribution, we...
Maxime Peyrard, Iryna Gurevych. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.
We present a new supervised framework that learns to estimate automatic Pyramid scores and uses them for optimization-based extractive multi-document summarization. For learning automatic Pyramid scores, we developed a method for training data generation which is based on a genetic algorithm using automatic Pyramid as the fitness function. Our experimental evaluation shows that our framework significantly outperforms strong baselines regarding automatic Pyramid, and that there is much room for improvement in comparison with the upper-bound automatic Pyramid.
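A minimal sketch of generating candidate summaries with a genetic algorithm guided by a fitness function. The fitness used here is a toy word-overlap score against a reference, not the automatic Pyramid score used in the paper, and the sentences are made up for illustration.

```python
# Genetic algorithm over binary sentence selections (illustrative only).
import random

random.seed(0)
source_sentences = [
    "the cat sat on the mat",
    "the dog slept on the rug",
    "it rained all day in the city",
    "the cat chased the dog around the house",
]
reference = "the cat sat on the mat and chased the dog"

def fitness(selection):
    summary = " ".join(s for s, keep in zip(source_sentences, selection) if keep)
    overlap = set(summary.split()) & set(reference.split())
    return len(overlap) / len(set(reference.split()))

def mutate(selection, rate=0.2):
    return [bit ^ (random.random() < rate) for bit in selection]

def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

population = [[random.randint(0, 1) for _ in source_sentences] for _ in range(20)]
for _ in range(30):  # generations: keep the fittest, breed and mutate the rest
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(10)
    ]
best = max(population, key=fitness)
print(best, fitness(best))
```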
A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs...
This paper presents a problem-reduction approach to extractive multi-document summarization: we propose a reduction of the problem to scoring individual sentences with their ROUGE scores based on supervised learning. For summarization, we then solve an optimization problem where the ROUGE score of the selected summary is maximized. To this end, we derive an approximation of the ROUGE-N score of a set of sentences, and define a principled discrete optimization problem for sentence selection. Mathematical and empirical evidence suggests that the sentence selection step can be solved almost exactly, thus...
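A minimal sketch of the selection step: choose sentences to maximize predicted scores under a length budget. A greedy score-per-word heuristic is shown purely for illustration; the paper instead formulates and solves a principled discrete optimization problem.

```python
# Greedy sentence selection under a length budget (illustrative only).
def select_sentences(sentences, scores, max_words=20):
    chosen, used = [], 0
    # Greedy by score density (score per word).
    order = sorted(range(len(sentences)),
                   key=lambda i: scores[i] / len(sentences[i].split()),
                   reverse=True)
    for i in order:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(sentences[i])
            used += length
    return chosen

sentences = [
    "the cat sat on the mat",
    "the dog slept peacefully on the rug near the door",
    "it rained",
]
scores = [0.9, 0.7, 0.1]  # e.g., predicted per-sentence ROUGE contributions
print(select_sentences(sentences, scores))
```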
Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
Due to their pictographic nature, emojis come with baked-in, grounded semantics. Although this makes them promising candidates for new forms of more accessible communication, it is still unknown to what degree humans agree on the inherent meaning of emojis when encountering them outside of concrete textual contexts. To bridge this gap, we collected a crowdsourced dataset (made publicly available) of one-word descriptions for 1,289 emojis presented to participants with no surrounding text. The descriptions and interpretations were then examined...
Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, limited to specific tasks, such as parsing or code generation. In this work, we demonstrate that formal grammars can describe the output space for a much...
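A minimal sketch of grammar-constrained decoding: at every step, tokens that the grammar does not allow as a continuation are masked out before the next token is picked. The "LM" scores, the tiny vocabulary, and the toy grammar (enumerated valid sequences) are stand-ins; real GCD systems run an incremental parser over the model's subword vocabulary.

```python
# Grammar-constrained decoding by masking disallowed tokens (illustrative only).
import math
import random

random.seed(0)
VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"age"', "42", "<eos>"]

# Toy "grammar": only these token sequences are valid outputs.
VALID_SEQUENCES = [
    ["{", '"name"', ":", '"Ada"', "}", "<eos>"],
    ["{", '"age"', ":", "42", "}", "<eos>"],
]

def allowed_next(prefix):
    """Tokens that keep the prefix consistent with at least one valid sequence."""
    return {seq[len(prefix)] for seq in VALID_SEQUENCES
            if seq[:len(prefix)] == prefix and len(seq) > len(prefix)}

def lm_scores(prefix):
    """Placeholder for language model logits over the vocabulary."""
    return {tok: random.random() for tok in VOCAB}

def constrained_decode():
    prefix = []
    while not prefix or prefix[-1] != "<eos>":
        scores = lm_scores(prefix)
        allowed = allowed_next(prefix)
        # Mask tokens the grammar forbids, then take the best remaining one.
        masked = {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}
        prefix.append(max(masked, key=masked.get))
    return prefix

print(constrained_decode())
```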