- Natural Language Processing Techniques
- Wikis in Education and Collaboration
- Topic Modeling
- Speech Recognition and Synthesis
- Sharing Economy and Platforms
- Transportation and Mobility Innovations
- Urban and Freight Transport Logistics
- Smart Parking Systems Research
- Cancer-related gene regulation
École Polytechnique Fédérale de Lausanne
2022-2023
Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. Leveraging this asymmetry in task difficulty makes it possible to produce large-scale, high-quality data for complex tasks. We demonstrate the effectiveness of this approach on closed...
We study the optimization of large-scale, real-time ridesharing systems and propose a modular design methodology, Component Algorithms for Ridesharing (CAR). We evaluate a diverse set of CARs (14 in total), focusing on the key algorithmic components of ridesharing. We take a multi-objective approach, evaluating 12 metrics related to global efficiency, complexity, and passenger, driver, and platform incentives, in settings designed to closely resemble reality in every aspect, focusing on vehicles of capacity two. To the best of our knowledge, this is...
An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language...
Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce...