- Natural Language Processing Techniques
- Topic Modeling
- Semantic Web and Ontologies
- Speech and Dialogue Systems
- Text Readability and Simplification
- Software Engineering Research
- Data Management and Algorithms
- Advanced Text Analysis Techniques
- Linguistics and Terminology Studies
- Multimodal Machine Learning Applications
- Biomedical Text Mining and Ontologies
- Data Quality and Management
- Intellectual Property and Patents
- Handwritten Text Recognition Techniques
- 3D Surveying and Cultural Heritage
- Geographic Information Systems Studies
- Data Visualization and Analytics
- Advanced Database Systems and Queries
- Delphi Technique in Research
- Translation Studies and Practices
- Artificial Intelligence in Law
- Cybercrime and Law Enforcement Studies
- Information and Cyber Security
- Spam and Phishing Detection
- Mathematics, Computing, and Information Processing
Dublin City University
2023
Universitat Pompeu Fabra
2013-2022
FC Barcelona
2017-2020
University of Brighton
2018-2019
Institució Catalana de Recerca i Estudis Avançats
2018-2019
University of Coimbra
2017
Thomson Reuters (United States)
2017
University Press of Florida
2017
Bridge University
2017
University of Cambridge
2017
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser. Proceedings of the 13th International Conference on Natural Language Generation. 2020.
The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs....
We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation. As in SR'18, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in eleven, and the deep track in three languages. Systems were evaluated...
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides...
We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation. As in its English-only predecessor task SR'11, the shared task comprised two tracks with different levels of complexity: (a) a shallow track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (b) a deep track where, additionally, functional words and morphological information were removed. The shallow track was offered in ten, and the deep track in three languages. Systems were evaluated...
Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications for reproducibility testing and meta-evaluation, in particular given that human evaluation is considered the gold standard against which the trustworthiness of automatic metrics is gauged. Using examples from NLG, we...
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular...
This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology. QRA produces a single score estimating the degree of reproducibility of a given system and evaluation measure, on the basis of the scores from, and differences between, different reproductions. We test QRA on 18 different system and evaluation measure combinations (involving diverse NLP tasks and types of evaluation), for each of which we have the original results and one to seven reproduction results. The proposed QRA method produces degree-of-reproducibility...
Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human...
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...
We present the contribution of Universitat Pompeu Fabra’s NLP group to the SemEval Task 9.2 (AMR-to-English Generation). The proposed generation pipeline comprises: (i) a series of rule-based graph-transducers for the syntacticization of the input graphs and the resolution of morphological agreements, and (ii) an off-the-shelf statistical linearization component.
Team sports commentaries call for techniques that are able to select content and generate wordings that reflect the affinity of the targeted reader for one of the teams. The existing works tend to have in common that they either start from knowledge sources of limited size, to whose structures different ways of realization are then explicitly assigned, or work directly with linguistic corpora, without the use of a deep knowledge source. With the increasing availability of large-scale ontologies this is no longer satisfactory: techniques are needed that are applicable to general...
Miguel Ballesteros, Bernd Bohnet, Simon Mille, Leo Wanner. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.