- Topic Modeling
- Natural Language Processing Techniques
- Wikis in Education and Collaboration
- Multimodal Machine Learning Applications
- Semantic Web and Ontologies
- Speech and Dialogue Systems
- Data Mining Algorithms and Applications
- Big Data Technologies and Applications
- Advanced Text Analysis Techniques
- Scientific Computing and Data Management
- Algorithms and Data Compression
- Advanced Database Systems and Queries
- Business Process Modeling and Analysis
- Data Quality and Management
- Online and Blended Learning
- Speech Recognition and Synthesis
- Advanced Image and Video Retrieval Techniques
- Text Readability and Simplification
- Research Data Management Practices
- Misinformation and Its Impacts
- Open Education and E-Learning
- Educational Technology and Assessment
- Sentiment Analysis and Opinion Mining
- Advanced Data Storage Technologies
- Intelligent Tutoring Systems and Adaptive Learning
Huawei Technologies (United Kingdom)
2020-2023
Huawei Technologies (China)
2021
University of Southampton
2016-2020
Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this paper, we investigate the problem of generating natural language summaries for such data using neural networks. Our end-to-end trainable architecture encodes information from a set of triples into a vector of fixed dimensionality and generates a summary by conditioning the output on the encoded vector. We explore different approaches that enable our models to verbalise entities from the input in the generated text. Our systems are trained...
Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way humans interact with the data. In this paper, we explore the state of languages in Wikidata as of now, with regard to its ontology, and its relationship to Wikipedia. Furthermore, we set multilinguality in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent given promising results...
The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on GitHub, a popular open platform for sharing code and data. We describe a corpus of 1.4 million data files, from over 65,000 repositories....
Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.
Wenyu Huang, Mirella Lapata, Pavlos Vougiouklis, Nikos Papasarantopoulos, Jeff Pan. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl. Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016). 2016.
Wikidata is a community-driven knowledge graph, strongly linked to Wikipedia. However, the connection between the two projects has only been sporadically explored. We investigated their relationship in terms of the information they contain by looking at their external references. Our findings show that while only a small number of sources is directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources. These results deserve further in-depth...
Pavlos Vougiouklis, Nikos Papasarantopoulos, Danna Zheng, David Tuckey, Chenxin Diao, Zhili Shen, Jeff Pan. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
We investigate the problem of generating natural language summaries from knowledge base triples. Our approach is based on a pointer-generator network, which, in addition to generating regular words from a fixed target vocabulary, is able to verbalise triples in several ways. We undertake an automatic and a human evaluation on single and open-domain summary generation tasks. Both show that our approach significantly outperforms other data-driven baselines.
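The pointer-generator network mentioned above mixes a distribution over a fixed vocabulary with a copy distribution over the input, so entity labels from the triples can be verbalised even when they are out of vocabulary. A minimal sketch of that mixing step (the dictionaries and the soft-switch value `p_gen` here are made-up toy numbers, not the paper's learned quantities):

```python
def pointer_generator_step(p_gen, vocab_dist, copy_dist):
    """Final distribution: p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w).
    Words that appear only in the input (e.g. an entity label from a
    triple) receive probability mass even if they are outside the
    fixed target vocabulary."""
    words = set(vocab_dist) | set(copy_dist)
    return {
        w: p_gen * vocab_dist.get(w, 0.0) + (1 - p_gen) * copy_dist.get(w, 0.0)
        for w in words
    }

vocab_dist = {"the": 0.6, "city": 0.4}    # softmax over the fixed vocabulary
copy_dist = {"Berlin": 0.9, "city": 0.1}  # attention over the input tokens
final = pointer_generator_step(0.5, vocab_dist, copy_dist)
print(final["Berlin"])  # 0.45 — copied despite being out-of-vocabulary
```

Because both input distributions sum to one, the mixed distribution does too, for any `p_gen` in [0, 1].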
Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for such data. This is non-trivial, especially in an open-domain context. To address this problem, we use neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a summary by conditioning the output on...
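The encoder described in this abstract maps a variable-sized set of triples to one vector of fixed dimensionality before a decoder conditions on it. A non-neural sketch of that shape contract, assuming hashed toy embeddings in place of learned ones (`DIM`, `embed`, and `encode_triples` are illustrative names, not from the paper):

```python
import hashlib

DIM = 8  # the fixed dimensionality of the encoded vector (illustrative)

def embed(token: str) -> list[float]:
    """Deterministic toy embedding: hash the token into DIM floats in [0, 1]."""
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

def encode_triples(triples: list[tuple[str, str, str]]) -> list[float]:
    """Encode any number of (subject, predicate, object) triples into one
    fixed-size vector by averaging the token embeddings."""
    vec = [0.0] * DIM
    count = 0
    for s, p, o in triples:
        for token in (s, p, o):
            vec = [v + x for v, x in zip(vec, embed(token))]
            count += 1
    return [v / count for v in vec] if count else vec

triples = [("Berlin", "capitalOf", "Germany"), ("Berlin", "population", "3.7M")]
encoded = encode_triples(triples)
print(len(encoded))  # always DIM, however many triples were given
```

The point the sketch preserves is that the decoder downstream only ever sees a vector of length `DIM`, regardless of the input set size.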
Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with readers and editors. Our solution builds upon ArticlePlaceholder, a tool used in 14 under-resourced language versions, which displays structured data from the Wikidata knowledge...
We aim to understand how data, rendered visually as charts or infographics, “travels” on social media. To do so, we propose a neural network architecture that is trained to distinguish among different types of charts, for instance line graphs and scatter plots, and to predict how much they will be shared. This poses significant challenges because of the varying format and quality of the charts that are posted, and the limitations in existing training data. To start with, our proposed system outperforms related work on chart type classification...
Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its popularity, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative intents, defined as "how a customer uses a product", and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph, that limit its capacity to reason about and recommend diverse useful products. Following these observations,...
Although Large Language Models (LLMs) are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different types of non-parametric knowledge, including textual passages and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering...
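Supplementing an LLM with non-parametric knowledge, as this abstract describes, amounts in the simplest case to linearising the retrieved facts into the prompt. A minimal sketch of that step for KG triples (the prompt template, entity, and formatting are illustrative assumptions, not the paper's setup):

```python
def triples_to_context(triples):
    """Linearise KG triples into plain text so they can be prepended to an
    LLM prompt as non-parametric knowledge (illustrative formatting)."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in triples)

def build_prompt(question, triples):
    """Assemble a retrieval-augmented prompt: facts first, then the question."""
    return (
        "Answer using the facts below.\n"
        f"Facts:\n{triples_to_context(triples)}\n"
        f"Question: {question}\nAnswer:"
    )

# A hypothetical long-tail entity the model's parameters may not cover well.
prompt = build_prompt(
    "Where is Vakifli located?",
    [("Vakifli", "locatedIn", "Turkey")],
)
print(prompt)
```

Textual passages would slot into the same template in place of the linearised triples, which is what makes the two knowledge types directly comparable.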
We focus on Text-to-SQL semantic parsing from the perspective of Large Language Models. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose an approach that dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which an in-parallel semantic parser can be leveraged for generating $\textit{approximated}$ versions of the expected SQL queries,...
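Selecting few-shot examples by query structure, as the abstract describes, can be approximated without the paper's actual AST machinery by masking identifiers and literals into a skeleton and ranking stored examples by skeleton similarity. A sketch under that assumption (`sql_skeleton` and `select_few_shot` are illustrative stand-ins, not the proposed method):

```python
import re
from difflib import SequenceMatcher

KEYWORDS = {"select", "from", "where", "group", "by", "order", "join",
            "on", "and", "or", "count", "limit", "<lit>"}

def sql_skeleton(query: str) -> str:
    """Crude structural signature: keep SQL keywords and operators, mask
    identifiers and literals. Approximates comparing abstract syntax trees."""
    q = re.sub(r"'[^']*'|\b\d+\b", "<lit>", query)
    tokens = re.findall(r"<lit>|\w+|[(),*=<>]", q.lower())
    return " ".join(t if t in KEYWORDS or not t.isalnum() else "<id>" for t in tokens)

def select_few_shot(question_sql_pairs, draft_sql, k=2):
    """Pick the k stored (question, SQL) pairs whose SQL skeleton is most
    similar to a draft SQL for the new question, for in-context learning."""
    target = sql_skeleton(draft_sql)
    return sorted(
        question_sql_pairs,
        key=lambda p: SequenceMatcher(None, sql_skeleton(p[1]), target).ratio(),
        reverse=True,
    )[:k]

pool = [
    ("How many orders?", "SELECT count(*) FROM orders"),
    ("Users older than 21", "SELECT name FROM users WHERE age > 21"),
    ("Sales per region", "SELECT region, sum(total) FROM sales GROUP BY region"),
]
best = select_few_shot(pool, "SELECT name FROM customers WHERE points > 100", k=1)
print(best[0][0])
```

The draft query here plays the role of the "approximated" SQL the abstract mentions: its structure, not its surface tokens, drives which examples enter the prompt.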
Retrieval-Augmented Generation (RAG) is widely used to inject external non-parametric knowledge into large language models (LLMs). Recent works suggest that Knowledge Graphs (KGs) contain valuable knowledge for LLMs. Retrieving information from KGs differs from extracting it from document sets. Most existing approaches seek to directly retrieve relevant subgraphs, thereby eliminating the need for the extensive SPARQL annotations traditionally required by semantic parsing methods. In this paper, we model subgraph...
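The subgraph-retrieval setting the abstract contrasts with semantic parsing can be sketched very simply: seed on entities mentioned in the question, collect the triples touching them, and rank by overlap with the question. Entity seeding and term-overlap scoring here are illustrative stand-ins for the learned retrievers the abstract discusses:

```python
import re

def retrieve_subgraph(kg, question, k=3):
    """Toy subgraph retrieval over a list of (s, p, o) triples: seed on
    entities named in the question, keep the top-k incident triples
    ranked by term overlap with the question."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    seeds = {s for s, _, _ in kg if s.lower() in q_terms} | \
            {o for _, _, o in kg if o.lower() in q_terms}
    candidates = [t for t in kg if t[0] in seeds or t[2] in seeds]
    score = lambda t: len(q_terms & {w.lower() for w in t})
    return sorted(candidates, key=score, reverse=True)[:k]

kg = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "EU"),
    ("Paris", "capitalOf", "France"),
]
print(retrieve_subgraph(kg, "What is the capital of Germany?"))
```

No SPARQL annotation is involved: the retriever returns raw triples that can be linearised into the LLM prompt, which is exactly the trade-off against semantic parsing that the abstract highlights.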