Cheng Li

ORCID: 0000-0003-0678-1357
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Web Data Mining and Analysis
  • Multimodal Machine Learning Applications
  • Information Retrieval and Search Behavior
  • Speech and Dialogue Systems
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Climate Change Communication and Perception
  • Ecocriticism and Environmental Literature
  • Domain Adaptation and Few-Shot Learning
  • Climate Change, Adaptation, Migration

University of Michigan
2024

Google (United States)
2023-2024

Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the model that generates the output is frozen and can only be accessed through APIs. Under this constraint, all one can do is improve the input (i.e., the prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose...

10.1145/3589334.3645408 article EN Proceedings of the ACM Web Conference 2024 2024-05-08
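The setting described above (a frozen, API-only LLM whose only controllable input is the prompt) can be sketched as a black-box search over prompt rewrites. The sketch below is illustrative only and is not the paper's method: `llm`, `rewriter`, and the overlap-F1 scorer are stand-ins I am assuming for the API call, the candidate-prompt proposer, and the quality metric.

```python
from typing import Callable, List

def overlap_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between generated text and a user-written reference."""
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    inter = len(c & r)
    if inter == 0:
        return 0.0
    p, rec = inter / len(c), inter / len(r)
    return 2 * p * rec / (p + rec)

def search_prompt(llm: Callable[[str], str],
                  rewriter: Callable[[str], List[str]],
                  seed_prompt: str,
                  reference: str,
                  rounds: int = 3) -> str:
    """Greedy black-box prompt search: the LLM is only queried, never
    fine-tuned. Each round, candidate rewrites of the current best prompt
    are scored by how well their outputs match the reference text."""
    best_prompt = seed_prompt
    best_score = overlap_f1(llm(seed_prompt), reference)
    for _ in range(rounds):
        for cand in rewriter(best_prompt):
            score = overlap_f1(llm(cand), reference)
            if score > best_score:
                best_prompt, best_score = cand, score
    return best_prompt
```

In practice `llm` would be an API call and `rewriter` would itself be an LLM prompted to propose rewrites; here both are left as injectable callables so the loop stays testable.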

Personalized text generation requires a unique ability of large language models (LLMs): learning from context that they often do not encounter during their standard training. One way to encourage LLMs to better use personalized context for generating outputs that align with the user's expectations is to instruct them to reason over the user's past preferences, background knowledge, or writing style. To achieve this, we propose Reasoning-Enhanced Self-Training for Personalized Text Generation (REST-PG), a framework that trains LLMs to reason over personal data during response...

10.48550/arxiv.2501.04167 preprint EN arXiv (Cornell University) 2025-01-07
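The self-training component of a framework like the one above can be reduced to a filter-then-retrain loop. The sketch below is heavily simplified and assumed, not taken from the paper: `generate` and `reward` are hypothetical stand-ins for the model's sampling step and the scoring function, and the actual gradient update on the kept examples is omitted.

```python
def rest_iteration(generate, reward, dataset, threshold=0.5):
    """One ReST-style self-training round (sketch): sample a
    reasoning-plus-response pair for each (prompt, profile), keep the
    pairs whose responses score above a threshold, and return the
    survivors as the next fine-tuning set."""
    kept = []
    for prompt, profile in dataset:
        reasoning, response = generate(prompt, profile)
        if reward(response, profile) >= threshold:
            kept.append((prompt, profile, reasoning, response))
    return kept
```

Iterating this loop lets the model bootstrap from its own best reasoning traces instead of requiring human-written rationales.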

In dense retrieval, prior work has largely improved retrieval effectiveness using multi-vector representations, exemplified by ColBERT. In sparse retrieval, more recent work, such as SPLADE, demonstrated that one can also learn sparse lexical representations to achieve comparable effectiveness while enjoying better interpretability. In this work, we combine the strengths of both approaches for first-stage retrieval. Specifically, we propose SparseEmbed - a novel retrieval model that learns sparse lexical representations with contextual embeddings. Compared to SPLADE, our model leverages contextual embeddings to improve...

10.1145/3539618.3592065 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18
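The core scoring idea of a hybrid like the one described, sparse lexical weights paired with per-term contextual embeddings, can be sketched in a few lines. This is my own illustrative reading, not the paper's implementation: each side is assumed to be a dict mapping an activated vocabulary term to a (weight, embedding) pair, and only terms activated on both sides contribute.

```python
import numpy as np

def sparse_embed_score(query_rep, doc_rep):
    """Hybrid sparse-plus-contextual scoring (sketch): for every
    vocabulary term activated in both representations, multiply the two
    lexical weights by the dot product of the term's contextual
    embeddings on each side, and sum the contributions."""
    score = 0.0
    for term, (q_weight, q_emb) in query_rep.items():
        if term in doc_rep:
            d_weight, d_emb = doc_rep[term]
            score += q_weight * d_weight * float(np.dot(q_emb, d_emb))
    return score
```

Because only overlapping terms contribute, the score can still be served from an inverted index, while the embedding dot product restores context sensitivity that a purely lexical weight lacks.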

This workshop discusses cutting-edge developments in research and applications of personalizing large language models (LLMs) and adapting them to the demands of diverse user populations and societal needs. The full-day workshop includes several keynotes and invited talks, a poster session, and a panel discussion.

10.1145/3616855.3635726 article EN 2024-03-04

10.1145/3626772.3657985 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

By engaging with and bringing together Chinese environmental humanities and science fiction studies, this article argues that the narratives of weather and climate revealed in late Qing science fiction serve as a metonymic vehicle and medium for addressing China's social and political crises. The author analyzes three works—Bingshan xuehai 冰山雪海 (Iceberg, Snow, Ocean), Dianshijie 電世界 (Electrical World), and Xinshitouji 新石頭記 (New Story of the Stone)—and delves into the intellectual history of modern ideas. In exploring literary...

10.1215/25783491-11206904 article EN Prism 2024-03-01

Google My Business (GMB) is a platform that hosts business profiles, which are displayed when a user issues a relevant query on Search or Maps. GMB businesses provide a wide variety of services, from home cleaning and repair to legal consultation. However, the exact details of the services provided (a.k.a. job types) are often missing from profiles. This places the burden of finding these details on users. To alleviate this burden, we built a pipeline to automatically extract job types from business websites. We share various challenges...

10.1145/3543873.3584636 article EN 2023-04-28

Bag-of-words based lexical retrieval systems are still the most commonly used methods for real-world search applications. Recently, deep learning methods have shown promising results for improving this retrieval performance, but they are expensive to run in an online fashion, non-trivial to integrate into existing production systems, and might not generalize well in out-of-domain scenarios. Instead, we build on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model. TW-BERT learns to predict the weight of individual n-gram...

10.1145/3580305.3599815 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04
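The appeal of learned term weighting is that it slots into an existing lexical scorer rather than replacing it. The sketch below illustrates one way this could look, scaling each query term's contribution in a standard BM25 formula by a learned weight; the function signature and the idea of multiplying the weight into the per-term score are my assumptions, not the paper's exact formulation.

```python
def weighted_bm25(query_terms, term_weights, doc_tf, doc_len,
                  avg_doc_len, idf, k1=1.2, b=0.75):
    """BM25 scoring with learned per-term weights (sketch): each query
    term's standard BM25 contribution is multiplied by a weight
    predicted by a model such as TW-BERT; a weight of 1.0 recovers
    plain BM25 for that term."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        # Standard BM25 term-frequency saturation and length normalization.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += term_weights.get(term, 1.0) * idf.get(term, 0.0) * norm
    return score
```

Because the scorer is unchanged apart from one multiplicative factor per term, the weights can be precomputed at query time and served by an ordinary inverted-index engine.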

A recent line of work in first-stage Neural Information Retrieval has focused on learning sparse lexical representations instead of dense embeddings. One such model is SPLADE, which has been shown to lead to state-of-the-art results in both the in-domain and zero-shot settings, can leverage inverted indices for efficient retrieval, and offers enhanced interpretability. However, existing SPLADE models are fundamentally limited to a representation based on the native BERT WordPiece vocabulary.

10.1145/3583780.3615207 article EN 2023-10-21
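The SPLADE-style representation referenced above can be sketched independently of any model weights: given MLM logits for each input token over the vocabulary, the document vector is a max-pool of log-saturated ReLU activations, which is non-negative and sparse. This is a minimal numpy sketch of that pooling step only, assuming `token_logits` comes from a BERT-style MLM head; the encoder itself is omitted.

```python
import numpy as np

def splade_pool(token_logits):
    """SPLADE-style pooling (sketch): token_logits has shape
    (seq_len, vocab_size), the MLM logits for each input token.
    Apply log(1 + ReLU(logit)) elementwise, then max-pool over the
    sequence, yielding one sparse non-negative weight per
    vocabulary term."""
    activated = np.log1p(np.maximum(token_logits, 0.0))
    return activated.max(axis=0)
```

The log saturation dampens very large logits so a few dominant terms cannot swamp the score, and the ReLU zeroes out most of the vocabulary, which is what makes inverted-index retrieval feasible. The vocabulary limitation the abstract points to is visible here: the output dimension is fixed to the WordPiece vocabulary size.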