NFDI4DS | UHH-SEMS - Publication Details

Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction

OPENALEX - Publications

Martin Josifoski, Marija Šakota, Maxime Peyrard, Robert West

Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. Leveraging this asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...

10.18653/v1/2023.emnlp-main.96 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Putting ridesharing to the test: efficient and scalable solutions and the power of dynamic vehicle relocation

OPENALEX - Publications

Panayiotis Danassis, Marija Šakota, Aris Filos-Ratsikas, Boi Faltings

We study the optimization of large-scale, real-time ridesharing systems and propose a modular design methodology, Component Algorithms for Ridesharing (CAR). We evaluate a diverse set of CARs (14 in total), focusing on the key algorithmic components of ridesharing. We take a multi-objective approach, evaluating 12 metrics related to global efficiency, complexity, passenger, driver, and platform incentives, in settings designed to closely resemble reality in every aspect, focusing on vehicles of capacity two. To the best of our knowledge, this is...

10.1007/s10462-022-10145-0 article EN cc-by Artificial Intelligence Review 2022-02-15

Edisum: Summarizing and Explaining Wikipedia Edits at Scale

OPENALEX - Publications

Marija Šakota, Isaac Johnson, Guosheng Feng, Robert West

An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and they help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language...

10.48550/arxiv.2404.03428 preprint EN arXiv (Cornell University) 2024-04-04

Descartes: Generating Short Descriptions of Wikipedia Articles

OPENALEX - Publications

Marija Šakota, Maxime Peyrard, Robert West

Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce...

10.1145/3543507.3583220 article EN Proceedings of the ACM Web Conference 2023 2023-04-26

Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction

OPENALEX - Publications

Martin Josifoski, Marija Šakota, Maxime Peyrard, Robert West

Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. Leveraging this asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...

10.48550/arxiv.2303.04132 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Descartes: Generating Short Descriptions of Wikipedia Articles

OPENALEX - Publications

Marija Šakota, Maxime Peyrard, Robert West

Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce...

10.48550/arxiv.2205.10012 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01