Simon Mille

ORCID: 0000-0002-8852-2764
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Semantic Web and Ontologies
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Software Engineering Research
  • Data Management and Algorithms
  • Advanced Text Analysis Techniques
  • Linguistics and terminology studies
  • Multimodal Machine Learning Applications
  • Biomedical Text Mining and Ontologies
  • Data Quality and Management
  • Intellectual Property and Patents
  • Handwritten Text Recognition Techniques
  • 3D Surveying and Cultural Heritage
  • Geographic Information Systems Studies
  • Data Visualization and Analytics
  • Advanced Database Systems and Queries
  • Delphi Technique in Research
  • Translation Studies and Practices
  • Artificial Intelligence in Law
  • Cybercrime and Law Enforcement Studies
  • Information and Cyber Security
  • Spam and Phishing Detection
  • Mathematics, Computing, and Information Processing

Dublin City University
2023

Universitat Pompeu Fabra
2013-2022

FC Barcelona
2017-2020

University of Brighton
2018-2019

Institució Catalana de Recerca i Estudis Avançats
2018-2019

University of Coimbra
2017

Thomson Reuters (United States)
2017

University Press of Florida
2017

Bridge University
2017

University of Cambridge
2017

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...

10.18653/v1/2021.gem-1.10 preprint ID cc-by 2021-01-01

David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser. Proceedings of the 13th International Conference on Natural Language Generation. 2020.

10.18653/v1/2020.inlg-1.23 article EN cc-by 2020-01-01

The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs....

10.1145/2738046 article EN ACM Transactions on Accessible Computing 2015-05-11

We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation. As in SR'18, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in eleven, and the deep track in three languages. Systems were evaluated...

10.18653/v1/d19-6301 article EN cc-by 2019-01-01
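
The shallow-track input described in the abstract above (a dependency structure with word order removed and tokens lemmatised) can be illustrated with a minimal sketch. The `Token` class, the `to_shallow_input` function, and the three-word example sentence are hypothetical and chosen for illustration only; they are not the shared task's actual data format or tooling.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical, simplified token record (not the shared task's format)."""
    idx: int      # original 1-based position in the sentence
    form: str     # inflected surface form
    lemma: str    # dictionary form
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


def to_shallow_input(tokens, seed=0):
    """Drop surface forms and scramble node order while keeping the tree."""
    rng = random.Random(seed)
    order = list(tokens)
    rng.shuffle(order)
    # Renumber nodes so the new ids say nothing about the original order.
    new_id = {tok.idx: i for i, tok in enumerate(order, start=1)}
    new_id[0] = 0  # the artificial root keeps id 0
    return [
        {
            "id": new_id[tok.idx],
            "lemma": tok.lemma,
            "head": new_id[tok.head],
            "deprel": tok.deprel,
        }
        for tok in order
    ]


sentence = [
    Token(1, "The", "the", 2, "det"),
    Token(2, "dogs", "dog", 3, "nsubj"),
    Token(3, "barked", "bark", 0, "root"),
]
shallow = to_shallow_input(sentence)
for node in shallow:
    print(node)
```

A surface realiser working from such input would need to restore both the word order and the inflected forms.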

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides...

10.48550/arxiv.2102.01672 preprint EN cc-by arXiv (Cornell University) 2021-01-01

We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor task SR'11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in ten, and the deep track in three languages. Systems were evaluated...

10.18653/v1/w18-3601 article EN cc-by 2018-01-01

Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications for reproducibility testing and meta-evaluation, in particular given that human evaluation is considered the gold standard against which the trustworthiness of automatic metrics is gauged. Using examples from NLG, we...

10.18653/v1/2020.inlg-1.24 article EN cc-by 2020-01-01
Kaustubh Dhole Varun Gangal Sebastian Gehrmann Aadesh Gupta Zhenhao Li and 95 more Saad Mahamood Abinaya Mahendiran Simon Mille Ashish Shrivastava Samson Tan Tongshuang Wu Jascha Sohl‐Dickstein Jinho D. Choi Eduard Hovy Ondřej Dušek Sebastian Ruder Sajant Anand Nagender Aneja Rabin Banjade Lisa Barthe Hanna Behnke Ian Berlot-Attwell Connor Boyle Caroline Brun Marco Antonio Sobrevilla Cabezudo Samuel Cahyawijaya Émile Chapuis Wanxiang Che Mukund Choudhary Christian Clauss Pierre Colombo Filip Cornell Gautier Dagan Mayukh Das Tanay Dixit Thomas Dopierre Paul-Alexis Dray Suchitra Dubey Tatiana Ekeinhor Marco Di Giovanni Tanya Goyal Rishabh Gupta Rishabh Gupta Louanes Hamla Sang Wook Han Fabrice Harel-Canada Antoine Honoré Ishan Jindal Przemyslaw K. Joniak Denis Kleyko Venelin Kovatchev Kalpesh Krishna Ashutosh Kumar Stefan Langer Seungjae Ryan Lee Corey James Levinson Hualou Liang Kaizhao Liang Zhexiong Liu Andrey Lukyanenko Vukosi Marivate Gerard de Melo Simon Méoni Maxime Meyer Afnan Mir Nafise Sadat Moosavi Niklas Muennighoff Timothy Sum Hon Mun Kenton Murray Marcin Namysł Maria Obedkova Priti Oli Nivranshu Pasricha Jan Pfister Richard Plant Vinay Prabhu Vasile Păiș Libo Qin Shahab Raji Pawan Kumar Rajpoot Vikas Raunak Roy Rinberg Nicolas Roberts Juan Diego Rodríguez Claude Roux P. H. S. Vasconcellos Ananya B. Sai Robin M. Schmidt Thomas Scialom Tshephisho Joseph Sefara Saqib Shamsi Xudong Shen Haoyue Shi Yiwen Shi Анна Швец Nick Siegel Damien Sileo Jamie Simon Chandan Singh Roman Sitelew

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular...

10.48550/arxiv.2112.02721 preprint EN cc-by arXiv (Cornell University) 2021-01-01
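
The distinction the abstract above draws between transformations (modifications to the data) and filters (data splits according to specific features) can be sketched as follows. This is an illustrative example only: `swap_adjacent_chars`, `min_length_filter`, and the sample texts are hypothetical names invented here and do not reproduce NL-Augmenter's actual API or any of its bundled transformations.

```python
import random


def swap_adjacent_chars(text, prob=0.1, seed=0):
    """Transformation: return a noisy copy with some adjacent letters swapped."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def min_length_filter(text, min_tokens=5):
    """Filter: True if the example has at least `min_tokens` whitespace tokens."""
    return len(text.split()) >= min_tokens


examples = [
    "The quick brown fox jumps over the lazy dog",
    "Short text",
    "Data augmentation helps test model robustness",
]

# A transformation maps every example to a modified example.
perturbed = [swap_adjacent_chars(t, prob=0.3) for t in examples]

# A filter selects the subset of examples that has a given feature.
long_split = [t for t in examples if min_length_filter(t)]
```

A model's outputs on `examples` and `perturbed` can then be compared to probe robustness, while `long_split` isolates one slice of the data for separate evaluation.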

This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 different system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility...

10.18653/v1/2022.acl-long.2 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01
Kaustubh Dhole Varun Gangal Sebastian Gehrmann Aadesh Gupta Zhenhao Li and 95 more Saad Mahamood Abinaya Mahendiran Simon Mille Ashish Shrivastava Samson Tan Tongshuang Wu Jascha Sohl‐Dickstein Jinho Choi Eduard Hovy Ondřej Dušek Sebastian Ruder Sajant Anand Nagender Aneja Rabin Banjade Lisa Barthe Hanna Behnke Ian Berlot-Attwell Connor Boyle Caroline Brun Marco Antonio Sobrevilla Cabezudo Samuel Cahyawijaya Émile Chapuis Wanxiang Che Mukund Choudhary Christian Clauss Pierre Colombo Filip Cornell Gautier Dagan Mayukh Das Tanay Dixit Thomas Dopierre Paul-Alexis Dray Suchitra Dubey Tatiana Ekeinhor Marco Di Giovanni Tanya Goyal Rishabh Gupta Louanes Hamla Sang Wook Han Fabrice Harel-Canada Antoine Honoré Ishan Jindal Przemysław Joniak Denis Kleyko Venelin Kovatchev Kalpesh Krishna Ashutosh Kumar Stefan Langer Seungjae Ryan Lee Corey James Levinson Hualou Liang Kaizhao Liang Zhexiong Liu Andrey Lukyanenko Vukosi Marivate Gerard de Melo Simon Méoni Maxime Meyer Afnan Mir Nafise Sadat Moosavi Niklas Muennighoff Timothy Sum Hon Mun Kenton Murray Marcin Namysł Maria Obedkova Priti Oli Nivranshu Pasricha Jan Pfister Richard E. Plant Vinay Prabhu Vasile Păiș Libo Qin Shahab Raji Pawan Kumar Rajpoot Vikas Raunak Roy Rinberg Nicholas J. Roberts Juan Diego Rodríguez Claude Roux Vasconcellos Samus Ananya B. Sai Robin Schmidt Thomas Scialom Tshephisho Joseph Sefara Saqib Shamsi Xudong Shen Yiwen Shi Haoyue Shi Анна Швец Nick Siegel Damien Sileo Jamie Simon Chandan Singh Roman Sitelew Priyank Soni

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human...

10.3384/nejlt.2000-1533.2023.4725 article EN Northern European Journal of Language Technology 2023-04-08

Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...

10.18653/v1/2022.emnlp-demos.27 article EN cc-by 2022-01-01

We present the contribution of Universitat Pompeu Fabra’s NLP group to SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.

10.18653/v1/s17-2158 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

Team sports commentaries call for techniques that are able to select content and generate wordings to reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size to whose structures then different ways of realization are explicitly assigned, or they work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general...

10.1145/2287710.2287711 article EN ACM Transactions on Speech and Language Processing 2012-07-01

Miguel Ballesteros, Bernd Bohnet, Simon Mille, Leo Wanner. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1042 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01