- Topic Modeling
- Speech and dialogue systems
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Multi-Agent Systems and Negotiation
- AI in Service Interactions
- Hate Speech and Cyberbullying Detection
- Ethics and Social Impacts of AI
- Explainable Artificial Intelligence (XAI)
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Adversarial Robustness in Machine Learning
- Human Pose and Action Recognition
- Land Use and Ecosystem Services
- Social Robot Interaction and HRI
- Decision-Making and Behavioral Economics
- Semantic Web and Ontologies
- Video Analysis and Summarization
- Artificial Intelligence in Healthcare and Education
- Language, Metaphor, and Cognition
- Speech Recognition and Synthesis
- Sexuality, Behavior, and Technology
- Recommender Systems and Techniques
- Domain Adaptation and Few-Shot Learning
- Text Readability and Simplification
DeepMind (United Kingdom)
2024
Google (United Kingdom)
2024
Heriot-Watt University
2013-2023
Bocconi University
2022-2023
Artificial Intelligence in Medicine (Canada)
2023
Meta (Israel)
2022
Heriot-Watt University Malaysia
2012-2022
Edinburgh Napier University
2021
Bar-Ilan University
2021
University of Helsinki
2021
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, end-to-end NLG. We also show that metric performance is data- and system-specific. Nevertheless, our results also suggest that automatic metrics perform reliably at...
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also...
This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntactic complexity and diverse discourse phenomena. Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches,...
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser. Proceedings of the 13th International Conference on Natural Language Generation. 2020.
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) Competitive baselines based on state-of-the-art NLU and ASR...
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approaches can generate better-quality output by learning from a dataset containing higher lexical richness, syntactic complexity and diverse discourse phenomena....
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as...
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.
Elisa Leonardelli, Gavin Abercrombie, Dina Almanea, Valerio Basile, Tommaso Fornaciari, Barbara Plank, Verena Rieser, Alexandra Uma, Massimo Poesio. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). 2023.
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment,...
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In...
Conversational AI systems, such as Amazon's Alexa, are rapidly developing from purely transactional systems to social chatbots, which can respond to a wide variety of user requests. In this article, we establish how current state-of-the-art conversational systems react to inappropriate requests, such as bullying and sexual harassment on the part of the user, by collecting and analysing the novel #MeTooAlexa corpus. Our results show that commercial systems mainly avoid answering, while rule-based chatbots show a variety of behaviours and often deflect....
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) we introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates...
Visual Dialogue involves “understanding” the dialogue history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to accurately generate the correct response. In this paper, we show that co-attention models which explicitly encode dialogue history outperform models that don’t, achieving state-of-the-art performance (72% NDCG on val set). However, we also expose shortcomings of the crowdsourcing dataset collection procedure, by showing that dialogue history is indeed only required for a...
Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are...
Decision-making is often dependent on uncertain data, e.g. data associated with confidence scores or probabilities. We present a comparison of different information presentations for uncertain data and, for the first time, measure their effects on human decision-making. We show that the use of Natural Language Generation (NLG) improves decision-making under uncertainty, compared to state-of-the-art graphical-based representation methods. In a task-based study with 442 adults, we found that presentations using NLG lead to 24% better decision-making on average than...
Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and when to release these models. In this paper, we survey the problem landscape for safety for end-to-end conversational AI and discuss recent and related work. We highlight tensions...