NFDI4DS | UHH-SEMS - Publication Details

Simon Mille

ORCID: 0000-0002-8852-2764

Publications

About

Contact & Profiles

A5077016111

Research Areas

Natural Language Processing Techniques
Topic Modeling
Semantic Web and Ontologies
Speech and dialogue systems
Text Readability and Simplification
Software Engineering Research
Data Management and Algorithms
Advanced Text Analysis Techniques
Linguistics and terminology studies
Multimodal Machine Learning Applications
Biomedical Text Mining and Ontologies
Data Quality and Management
Intellectual Property and Patents
Handwritten Text Recognition Techniques
3D Surveying and Cultural Heritage
Geographic Information Systems Studies
Data Visualization and Analytics
Advanced Database Systems and Queries
Delphi Technique in Research
Translation Studies and Practices
Artificial Intelligence in Law
Cybercrime and Law Enforcement Studies
Information and Cyber Security
Spam and Phishing Detection
Mathematics, Computing, and Information Processing

Dublin City University
2023

Universitat Pompeu Fabra
2013-2022

FC Barcelona
2017-2020

University of Brighton
2018-2019

Institució Catalana de Recerca i Estudis Avançats
2018-2019

University of Coimbra
2017

Thomson Reuters (United States)
2017

University Press of Florida
2017

Bridge University
2017

University of Cambridge
2017

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

OPENALEX - Publications

Sebastian Gehrmann Tosin Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Anuoluwapo Aremu and 51 more

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...

10.18653/v1/2021.gem-1.10 preprint EN cc-by 2021-01-01

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

David M. Howcroft Anja Belz Miruna Clinciu Dimitra Gkatzia Sadid A. Hasan and 5 more

David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser. Proceedings of the 13th International Conference on Natural Language Generation. 2020.

10.18653/v1/2020.inlg-1.23 article EN cc-by 2020-01-01

Making It Simplext

Horacio Saggion Sanja Štajner Stefan Bott Simon Mille Luz Rello and 1 more

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs....

10.1145/2738046 article EN ACM Transactions on Accessible Computing 2015-05-11

The Second Multilingual Surface Realisation Shared Task (SR’19): Overview and Evaluation Results

Simon Mille Anja Belz Bernd Bohnet Yvette Graham Leo Wanner

We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation. As in SR'18, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in eleven, and the deep track in three languages. Systems were evaluated...

10.18653/v1/d19-6301 article EN cc-by 2019-01-01

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Sebastian Gehrmann Tosin Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Anuoluwapo Aremu and 51 more

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides...

10.48550/arxiv.2102.01672 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The First Multilingual Surface Realisation Shared Task (SR’18): Overview and Evaluation Results

Simon Mille Anja Belz Bernd Bohnet Yvette Graham Emily Pitler and 1 more

We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor SR'11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in ten, and the deep track in three languages. Systems were evaluated...

10.18653/v1/w18-3601 article EN cc-by 2018-01-01

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

Anja Belz Simon Mille David M. Howcroft

Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications for reproducibility testing and meta-evaluation, in particular given that human evaluation is considered the gold standard against which the trustworthiness of automatic metrics is gauged. Using examples from NLG, we...

10.18653/v1/2020.inlg-1.24 article EN cc-by 2020-01-01

Towards content-oriented patent document processing: Intelligent patent analysis and summarization

Sören Brügmann Nadjet Bouayad‐Agha Alicia Burga Serguei Carrascosa Alberto Ciaramella and 10 more

10.1016/j.wpi.2014.10.003 article EN World Patent Information 2014-12-15

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh Dhole Varun Gangal Sebastian Gehrmann Aadesh Gupta Zhenhao Li and 95 more

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular...

10.48550/arxiv.2112.02721 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Quantified Reproducibility Assessment of NLP Results

Anja Belz Maja Popović Simon Mille

This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 different system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility...

10.18653/v1/2022.acl-long.2 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

NL-Augmenter 🦎 → 🐍 A Framework for Task-Sensitive Natural Language Augmentation

Kaustubh Dhole Varun Gangal Sebastian Gehrmann Aadesh Gupta Zhenhao Li and 95 more

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human...

10.3384/nejlt.2000-1533.2023.4725 article EN Northern European Journal of Language Technology 2023-04-08

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Sebastian Gehrmann Abhik Bhattacharjee Abinaya Mahendiran Alex Wang Alexandros Papangelis and 72 more

Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...

10.18653/v1/2022.emnlp-demos.27 article EN cc-by 2022-01-01

FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers

Simon Mille Roberto Carlini Alicia Burga Leo Wanner

We present the contribution of Universitat Pompeu Fabra’s NLP group to the SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.

10.18653/v1/s17-2158 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

Perspective-oriented generation of football match summaries

Nadjet Bouayad‐Agha Gerard Casamayor Simon Mille Leo Wanner

Team sports commentaries call for techniques that are able to select content and generate wordings to reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size to whose structures then different ways of realization are explicitly assigned, or they work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general...

10.1145/2287710.2287711 article EN ACM Transactions on Speech and Language Processing 2012-07-01

Data-driven sentence generation with non-isomorphic trees

Miguel Ballesteros Bernd Bohnet Simon Mille Leo Wanner

Miguel Ballesteros, Bernd Bohnet, Simon Mille, Leo Wanner. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1042 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

Using genre-specific features for patent summaries

Joan Codina Nadjet Bouayad‐Agha Alicia Burga Gerard Casamayor Simon Mille and 3 more

10.1016/j.ipm.2016.07.002 article EN Information Processing & Management 2016-07-30
