- Topic Modeling
- Natural Language Processing Techniques
- Wikis in Education and Collaboration
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Speech and Dialogue Systems
- Data Mining Algorithms and Applications
- Big Data Technologies and Applications
- Advanced Text Analysis Techniques
- Scientific Computing and Data Management
- Algorithms and Data Compression
- Advanced Database Systems and Queries
- Business Process Modeling and Analysis
- Data Quality and Management
- Online and Blended Learning
- Speech Recognition and Synthesis
- Advanced Image and Video Retrieval Techniques
- Text Readability and Simplification
- Research Data Management Practices
- Misinformation and Its Impacts
- Open Education and E-Learning
- Educational Technology and Assessment
- Sentiment Analysis and Opinion Mining
- Advanced Data Storage Technologies
- Intelligent Tutoring Systems and Adaptive Learning
Huawei Technologies (United Kingdom)
2020-2023
Huawei Technologies (China)
2021
University of Southampton
2016-2020
Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this paper, we investigate the problem of generating natural language summaries for such data using neural networks. Our end-to-end trainable architecture encodes information from a set of triples into a vector of fixed dimensionality and generates a summary by conditioning the output on the encoded vector. We explore different approaches that enable our models to verbalise entities from the input in the generated text. Our systems are trained...
Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way humans interact with the data. In this paper, we explore the state of languages in Wikidata as of now, with regard to its ontology, and its relationship to Wikipedia. Furthermore, we set multilinguality in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent given promising results...
The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on GitHub, a popular open platform for sharing code and data. We describe a corpus of 1.4 million data files, from over 65,000 repositories....
Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.
Wenyu Huang, Mirella Lapata, Pavlos Vougiouklis, Nikos Papasarantopoulos, Jeff Pan. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl. Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016). 2016.
Wikidata is a community-driven knowledge graph, strongly linked to Wikipedia. However, the connection between the two projects has only been sporadically explored. We investigated their relationship in terms of the information they contain by looking at their external references. Our findings show that while only a small number of sources is directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources. These results deserve further in-depth...
Pavlos Vougiouklis, Nikos Papasarantopoulos, Danna Zheng, David Tuckey, Chenxin Diao, Zhili Shen, Jeff Pan. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
We investigate the problem of generating natural language summaries from knowledge base triples. Our approach is based on a pointer-generator network, which, in addition to generating regular words from a fixed target vocabulary, is able to verbalise triples in several ways. We undertake an automatic and a human evaluation on single and open-domain summary generation tasks. Both show that our approach significantly outperforms other data-driven baselines.
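The pointer-generator network mentioned above mixes a distribution over a fixed vocabulary with a copy distribution over the input, so entity labels from the triples can be verbalised even when they are out of vocabulary. A minimal sketch of that mixing step (the dictionaries and the soft-switch value `p_gen` here are made-up toy numbers, not the paper's learned quantities):

```python
def pointer_generator_step(p_gen, vocab_dist, copy_dist):
    """Final distribution: p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w).
    Words that appear only in the input (e.g. an entity label from a
    triple) receive probability mass even if they are outside the
    fixed target vocabulary."""
    words = set(vocab_dist) | set(copy_dist)
    return {
        w: p_gen * vocab_dist.get(w, 0.0) + (1 - p_gen) * copy_dist.get(w, 0.0)
        for w in words
    }

vocab_dist = {"the": 0.6, "city": 0.4}    # softmax over the fixed vocabulary
copy_dist = {"Berlin": 0.9, "city": 0.1}  # attention over the input tokens
final = pointer_generator_step(0.5, vocab_dist, copy_dist)
print(final["Berlin"])  # 0.45 — copied despite being out-of-vocabulary
```

Because both input distributions sum to one, the mixed distribution does too, for any `p_gen` in [0, 1].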
Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for such data. This is non-trivial, especially in an open-domain context. To address this problem, we use neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a summary by conditioning the output on...
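The encoder described in this abstract maps a variable-sized set of triples to one vector of fixed dimensionality before a decoder conditions on it. A non-neural sketch of that shape contract, assuming hashed toy embeddings in place of learned ones (`DIM`, `embed`, and `encode_triples` are illustrative names, not from the paper):

```python
import hashlib

DIM = 8  # the fixed dimensionality of the encoded vector (illustrative)

def embed(token: str) -> list[float]:
    """Deterministic toy embedding: hash the token into DIM floats in [0, 1]."""
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

def encode_triples(triples: list[tuple[str, str, str]]) -> list[float]:
    """Encode any number of (subject, predicate, object) triples into one
    fixed-size vector by averaging the token embeddings."""
    vec = [0.0] * DIM
    count = 0
    for s, p, o in triples:
        for token in (s, p, o):
            vec = [v + x for v, x in zip(vec, embed(token))]
            count += 1
    return [v / count for v in vec] if count else vec

triples = [("Berlin", "capitalOf", "Germany"), ("Berlin", "population", "3.7M")]
encoded = encode_triples(triples)
print(len(encoded))  # always DIM, however many triples were given
```

The point the sketch preserves is that the decoder downstream only ever sees a vector of length `DIM`, regardless of the input set size.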
Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with readers and editors. Our solution builds upon ArticlePlaceholder, a tool used in 14 under-resourced language versions, which displays structured data from the Wikidata knowledge...
We aim to understand how data, rendered visually as charts or infographics, “travels” on social media. To do so, we propose a neural network architecture that is trained to distinguish among different types of charts, for instance line graphs and scatter plots, and to predict how much they will be shared. This poses significant challenges because of the varying format and quality of the charts that are posted, and the limitations in existing training data. To start with, our proposed system outperforms related work on chart type classification...
Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its popularity, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative intents, defined as "how a customer uses a product", and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph, that limit its capacity to reason about and recommend diverse useful products. Following these observations,...
Although Large Language Models (LLMs) are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different types of non-parametric knowledge, including textual passages and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering...
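Supplementing an LLM with non-parametric knowledge, as this abstract describes, amounts in the simplest case to linearising the retrieved facts into the prompt. A minimal sketch of that step for KG triples (the prompt template, entity, and formatting are illustrative assumptions, not the paper's setup):

```python
def triples_to_context(triples):
    """Linearise KG triples into plain text so they can be prepended to an
    LLM prompt as non-parametric knowledge (illustrative formatting)."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)

def build_prompt(question, triples):
    """Assemble a retrieval-augmented prompt: facts first, then the question."""
    return (
        "Answer using the facts below.\n"
        f"Facts:\n{triples_to_context(triples)}\n"
        f"Question: {question}\nAnswer:"
    )

# A hypothetical long-tail entity the model's parameters may not cover well.
prompt = build_prompt(
    "Where is Vakifli located?",
    [("Vakifli", "locatedIn", "Turkey")],
)
print(prompt)
```

Textual passages would slot into the same template in place of the linearised triples, which is what makes the two knowledge types directly comparable.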
We focus on Text-to-SQL semantic parsing from the perspective of Large Language Models. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which an in-parallel semantic parser can be leveraged for generating $\textit{approximated}$ versions of the expected SQL queries,...
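Selecting few-shot examples by query structure, as the abstract describes, can be approximated without the paper's actual AST machinery by masking identifiers and literals into a skeleton and ranking stored examples by skeleton similarity. A sketch under that assumption (`sql_skeleton` and `select_few_shot` are illustrative stand-ins, not the proposed method):

```python
import re
from difflib import SequenceMatcher

KEYWORDS = {"select", "from", "where", "group", "by", "order", "join",
            "on", "and", "or", "count", "limit", "<lit>"}

def sql_skeleton(query: str) -> str:
    """Crude structural signature: keep SQL keywords and operators, mask
    identifiers and literals. Approximates comparing abstract syntax trees."""
    q = re.sub(r"'[^']*'|\b\d+\b", "<lit>", query)
    tokens = re.findall(r"<lit>|\w+|[(),*=<>]", q.lower())
    return " ".join(t if t in KEYWORDS or not t.isalnum() else "<id>" for t in tokens)

def select_few_shot(question_sql_pairs, draft_sql, k=2):
    """Pick the k stored (question, SQL) pairs whose SQL skeleton is most
    similar to a draft SQL for the new question, for in-context learning."""
    target = sql_skeleton(draft_sql)
    return sorted(
        question_sql_pairs,
        key=lambda p: SequenceMatcher(None, sql_skeleton(p[1]), target).ratio(),
        reverse=True,
    )[:k]

pool = [
    ("How many orders?", "SELECT count(*) FROM orders"),
    ("Users older than 21", "SELECT name FROM users WHERE age > 21"),
    ("Sales per region", "SELECT region, sum(total) FROM sales GROUP BY region"),
]
best = select_few_shot(pool, "SELECT name FROM customers WHERE points > 100", k=1)
print(best[0][0])
```

The draft query here plays the role of the "approximated" SQL the abstract mentions: its structure, not its surface tokens, drives which examples enter the prompt.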
Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent works suggest that Knowledge Graphs (KGs) contain valuable knowledge for LLMs. Retrieving information from KGs differs from extracting it from document sets. Most existing approaches seek to directly retrieve relevant subgraphs, thereby eliminating the need for the extensive SPARQL annotations traditionally required by semantic parsing methods. In this paper, we model subgraph...
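The subgraph-retrieval setting the abstract contrasts with semantic parsing can be sketched very simply: seed on entities mentioned in the question, collect the triples touching them, and rank by overlap with the question. Entity seeding and term-overlap scoring here are illustrative stand-ins for the learned retrievers the abstract discusses:

```python
import re

def retrieve_subgraph(kg, question, k=3):
    """Toy subgraph retrieval over a list of (s, p, o) triples: seed on
    entities named in the question, keep the top-k incident triples
    ranked by term overlap with the question."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    seeds = {s for s, _, _ in kg if s.lower() in q_terms} | \
            {o for _, _, o in kg if o.lower() in q_terms}
    candidates = [t for t in kg if t[0] in seeds or t[2] in seeds]
    score = lambda t: len(q_terms & {w.lower() for w in t})
    return sorted(candidates, key=score, reverse=True)[:k]

kg = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "EU"),
    ("Paris", "capitalOf", "France"),
]
print(retrieve_subgraph(kg, "What is the capital of Germany?"))
```

No SPARQL annotation is involved: the retriever returns raw triples that can be linearised into the LLM prompt, which is exactly the trade-off against semantic parsing that the abstract highlights.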