Cheng Li

ORCID: 0000-0003-0678-1357
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Web Data Mining and Analysis
  • Multimodal Machine Learning Applications
  • Information Retrieval and Search Behavior
  • Speech and Dialogue Systems
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Climate Change Communication and Perception
  • Ecocriticism and Environmental Literature
  • Domain Adaptation and Few-Shot Learning
  • Climate Change, Adaptation, Migration

University of Michigan
2024

Google (United States)
2023-2024

Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the model that generates the output is frozen and can only be accessed through APIs. Under this constraint, all one can do is improve the input (i.e., the prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose...

10.1145/3589334.3645408 article EN Proceedings of the ACM Web Conference 2024 2024-05-08
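The setting described above (a frozen, API-only LLM whose only controllable input is the prompt) can be sketched as a black-box search over prompt rewrites. The sketch below is illustrative only and is not the paper's method: `llm`, `rewriter`, and the overlap-F1 scorer are stand-ins I am assuming for the API call, the candidate-prompt proposer, and the quality metric.

```python
from typing import Callable, List

def overlap_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between generated text and a user-written reference."""
    c, r = set(candidate.lower().split()), set(reference.lower().split())
    inter = len(c & r)
    if inter == 0:
        return 0.0
    p, rec = inter / len(c), inter / len(r)
    return 2 * p * rec / (p + rec)

def search_prompt(llm: Callable[[str], str],
                  rewriter: Callable[[str], List[str]],
                  seed_prompt: str,
                  reference: str,
                  rounds: int = 3) -> str:
    """Greedy black-box prompt search: the LLM is only queried, never
    fine-tuned. Each round, candidate rewrites of the current best prompt
    are scored by how well their outputs match the reference text."""
    best_prompt = seed_prompt
    best_score = overlap_f1(llm(seed_prompt), reference)
    for _ in range(rounds):
        for cand in rewriter(best_prompt):
            score = overlap_f1(llm(cand), reference)
            if score > best_score:
                best_prompt, best_score = cand, score
    return best_prompt
```

In practice `llm` would be an API call and `rewriter` would itself be an LLM prompted to propose rewrites; here both are left as injectable callables so the loop stays testable.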

Personalized text generation requires a unique ability of large language models (LLMs): learning from context that they often do not encounter during their standard training. One way to encourage LLMs to better use personalized context for generating outputs that align with the user's expectations is to instruct them to reason over the user's past preferences, background knowledge, or writing style. To achieve this, we propose Reasoning-Enhanced Self-Training for Personalized Text Generation (REST-PG), a framework that trains LLMs to reason over personal data during response...

10.48550/arxiv.2501.04167 preprint EN arXiv (Cornell University) 2025-01-07
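The self-training component of a framework like the one above can be reduced to a filter-then-retrain loop. The sketch below is heavily simplified and assumed, not taken from the paper: `generate` and `reward` are hypothetical stand-ins for the model's sampling step and the scoring function, and the actual gradient update on the kept examples is omitted.

```python
def rest_iteration(generate, reward, dataset, threshold=0.5):
    """One ReST-style self-training round (sketch): sample a
    reasoning-plus-response pair for each (prompt, profile), keep the
    pairs whose responses score above a threshold, and return the
    survivors as the next fine-tuning set."""
    kept = []
    for prompt, profile in dataset:
        reasoning, response = generate(prompt, profile)
        if reward(response, profile) >= threshold:
            kept.append((prompt, profile, reasoning, response))
    return kept
```

Iterating this loop lets the model bootstrap from its own best reasoning traces instead of requiring human-written rationales.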

In dense retrieval, prior work has largely improved retrieval effectiveness using multi-vector representations, exemplified by ColBERT. In sparse retrieval, more recent work, such as SPLADE, demonstrated that one can also learn sparse lexical representations to achieve comparable effectiveness while enjoying better interpretability. In this work, we combine the strengths of both approaches for first-stage retrieval. Specifically, we propose SparseEmbed - a novel retrieval model that learns sparse lexical representations with contextual embeddings. Compared to SPLADE, our model leverages contextual embeddings to improve...

10.1145/3539618.3592065 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18
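The core scoring idea of a hybrid like the one described, sparse lexical weights paired with per-term contextual embeddings, can be sketched in a few lines. This is my own illustrative reading, not the paper's implementation: each side is assumed to be a dict mapping an activated vocabulary term to a (weight, embedding) pair, and only terms activated on both sides contribute.

```python
import numpy as np

def sparse_embed_score(query_rep, doc_rep):
    """Hybrid sparse-plus-contextual scoring (sketch): for every
    vocabulary term activated in both representations, multiply the two
    lexical weights by the dot product of the term's contextual
    embeddings on each side, and sum the contributions."""
    score = 0.0
    for term, (q_weight, q_emb) in query_rep.items():
        if term in doc_rep:
            d_weight, d_emb = doc_rep[term]
            score += q_weight * d_weight * float(np.dot(q_emb, d_emb))
    return score
```

Because only overlapping terms contribute, the score can still be served from an inverted index, while the embedding dot product restores context sensitivity that a purely lexical weight lacks.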

This workshop discusses cutting-edge developments in research and applications of personalizing large language models (LLMs) and adapting them to the demands of diverse user populations and societal needs. The full-day workshop includes several keynotes and invited talks, a poster session, and a panel discussion.

10.1145/3616855.3635726 article EN 2024-03-04

10.1145/3626772.3657985 article EN Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval 2024-07-10

By engaging with and bringing together Chinese environmental humanities and science fiction studies, this article argues that the narratives of weather and climate revealed in late Qing science fiction serve as a metonymic vehicle and medium for addressing China's social and political crises. The author analyzes three works—Bingshan xuehai 冰山雪海 (Iceberg, Snow, Ocean), Dianshijie 電世界 (Electrical World), and Xinshitouji 新石頭記 (New Story of the Stone)—and delves into the intellectual history of modern ideas. In exploring literary...

10.1215/25783491-11206904 article EN Prism 2024-03-01

Google My Business (GMB) is a platform that hosts business profiles, which are displayed when a user issues a relevant query on Search or Maps. GMB businesses provide a wide variety of services, from home cleaning and repair to legal consultation. However, the exact details of the services provided (a.k.a. job types) are often missing from profiles. This places the burden of finding these details on users. To alleviate this burden, we built a pipeline to automatically extract job types from business websites. We share various challenges...

10.1145/3543873.3584636 article EN 2023-04-28

Bag-of-words based lexical retrieval systems are still the most commonly used methods for real-world search applications. Recently, deep learning methods have shown promising results for improving this retrieval performance, but they are expensive to run in an online fashion, non-trivial to integrate into existing production systems, and might not generalize well in out-of-domain scenarios. Instead, we build on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model. TW-BERT learns to predict the weight of individual n-gram...

10.1145/3580305.3599815 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04
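The appeal of learned term weighting is that it slots into an existing lexical scorer rather than replacing it. The sketch below illustrates one way this could look, scaling each query term's contribution in a standard BM25 formula by a learned weight; the function signature and the idea of multiplying the weight into the per-term score are my assumptions, not the paper's exact formulation.

```python
def weighted_bm25(query_terms, term_weights, doc_tf, doc_len,
                  avg_doc_len, idf, k1=1.2, b=0.75):
    """BM25 scoring with learned per-term weights (sketch): each query
    term's standard BM25 contribution is multiplied by a weight
    predicted by a model such as TW-BERT; a weight of 1.0 recovers
    plain BM25 for that term."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        # Standard BM25 term-frequency saturation and length normalization.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += term_weights.get(term, 1.0) * idf.get(term, 0.0) * norm
    return score
```

Because the scorer is unchanged apart from one multiplicative factor per term, the weights can be precomputed at query time and served by an ordinary inverted-index engine.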

A recent line of work in first-stage Neural Information Retrieval has focused on learning sparse lexical representations instead of dense embeddings. One such model is SPLADE, which has been shown to lead to state-of-the-art results in both the in-domain and zero-shot settings, can leverage inverted indices for efficient retrieval, and offers enhanced interpretability. However, existing SPLADE models are fundamentally limited to a representation based on the native BERT WordPiece vocabulary.

10.1145/3583780.3615207 article EN 2023-10-21
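The SPLADE-style representation referenced above can be sketched independently of any model weights: given MLM logits for each input token over the vocabulary, the document vector is a max-pool of log-saturated ReLU activations, which is non-negative and sparse. This is a minimal numpy sketch of that pooling step only, assuming `token_logits` comes from a BERT-style MLM head; the encoder itself is omitted.

```python
import numpy as np

def splade_pool(token_logits):
    """SPLADE-style pooling (sketch): token_logits has shape
    (seq_len, vocab_size), the MLM logits for each input token.
    Apply log(1 + ReLU(logit)) elementwise, then max-pool over the
    sequence, yielding one sparse non-negative weight per
    vocabulary term."""
    activated = np.log1p(np.maximum(token_logits, 0.0))
    return activated.max(axis=0)
```

The log saturation dampens very large logits so a few dominant terms cannot swamp the score, and the ReLU zeroes out most of the vocabulary, which is what makes inverted-index retrieval feasible. The vocabulary limitation the abstract points to is visible here: the output dimension is fixed to the WordPiece vocabulary size.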