- Topic Modeling
- Natural Language Processing Techniques
- Web Data Mining and Analysis
- Multimodal Machine Learning Applications
- Information Retrieval and Search Behavior
- Speech and Dialogue Systems
- Text and Document Classification Technologies
- Advanced Text Analysis Techniques
- Climate Change Communication and Perception
- Ecocriticism and Environmental Literature
- Domain Adaptation and Few-Shot Learning
- Climate Change, Adaptation, Migration
University of Michigan
2024
Google (United States)
2023-2024
Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the model that generates the output is frozen and can only be accessed through APIs. Under this constraint, all one can do is improve the input (i.e., the prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose...
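The abstract is cut off before the method is stated, so the following is only a generic illustration of automatic prompt optimization under the frozen-model constraint described above: a hill-climbing loop that edits the prompt and keeps an edit only when the API output scores better. All names here (`call_llm`, `score_output`, `rewrite_prompt`) are hypothetical stand-ins, not the paper's method.

```python
import random

def call_llm(prompt: str) -> str:
    """Stand-in for the frozen, API-only LLM (hypothetical stub)."""
    return "generated text for: " + prompt

def score_output(output: str, reference: str) -> float:
    """Toy quality signal: word overlap with a reference text (hypothetical)."""
    out, ref = set(output.split()), set(reference.split())
    return len(out & ref) / max(len(ref), 1)

def rewrite_prompt(prompt: str) -> str:
    """Propose a small random edit to the prompt (hypothetical mutation step)."""
    hints = ["Be concise.", "Match the user's writing style.",
             "Ground the answer in the user's past documents."]
    return prompt + " " + random.choice(hints)

def optimize_prompt(seed: str, reference: str, steps: int = 10) -> str:
    """Greedy hill climbing: keep a rewritten prompt only if it scores better."""
    best = seed
    best_score = score_output(call_llm(best), reference)
    for _ in range(steps):
        candidate = rewrite_prompt(best)
        score = score_output(call_llm(candidate), reference)
        if score > best_score:
            best, best_score = candidate, score
    return best
```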
Personalized text generation requires a unique ability of large language models (LLMs) to learn from context that they often do not encounter during their standard training. One way to encourage LLMs to make better use of personalized context when generating outputs that align with the user's expectations is to instruct them to reason over the user's past preferences, background knowledge, or writing style. To achieve this, we propose Reasoning-Enhanced Self-Training for Personalized Text Generation (REST-PG), a framework that trains LLMs to reason over personal data during response...
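As a rough sketch of the self-training shape that REST-PG's name suggests (sample reasoning-augmented responses, keep the high-reward ones, retrain on them), here is a schematic round in Python. The stubs `sample_response`, `reward`, and the commented-out `fine_tune` call are hypothetical placeholders; the paper's actual sampling and reward design may differ.

```python
import random

def sample_response(model_state, prompt: str) -> tuple[str, str]:
    """Hypothetical stub: sample a (reasoning, response) pair, where the
    reasoning summarizes the user's preferences/style before answering."""
    reasoning = f"User context for '{prompt}' suggests an informal tone."
    response = f"draft-{random.randint(0, 9)}"
    return reasoning, response

def reward(response: str, expected: str) -> float:
    """Hypothetical stub reward: agreement with the user's expected output."""
    return float(response == expected)

def self_training_round(model_state, dataset, n_samples=8, threshold=0.5):
    """One schematic round: collect reasoning-augmented samples whose
    responses score above a threshold, then retrain on them."""
    kept = []
    for prompt, expected in dataset:
        for _ in range(n_samples):
            reasoning, response = sample_response(model_state, prompt)
            if reward(response, expected) >= threshold:
                kept.append((prompt, reasoning, response))
    # fine_tune(model_state, kept)  # placeholder for the actual training step
    return kept
```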
In dense retrieval, prior work has largely improved retrieval effectiveness using multi-vector representations, exemplified by ColBERT. In sparse retrieval, more recent work, such as SPLADE, has demonstrated that one can also learn sparse lexical representations that achieve comparable effectiveness while enjoying better interpretability. In this work, we combine the strengths of both approaches for first-stage retrieval. Specifically, we propose SparseEmbed - a novel retrieval model that learns sparse lexical representations with contextual embeddings. Compared with SPLADE, our model leverages these contextual embeddings to improve...
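To make the combination concrete, here is a minimal sketch of one plausible way to pair a SPLADE-style sparse lexical vector with per-term contextual embeddings, assuming PyTorch and the standard log-saturation/max-pooling formulation; the exact pooling, top-k selection, and projection in SparseEmbed may differ.

```python
import torch

def sparse_with_contextual(token_logits, token_states, k=8, proj=None):
    """token_logits: [seq_len, vocab_size] MLM logits per input token.
    token_states: [seq_len, hidden] contextual token representations.
    Returns the indices/weights of the top-k activated vocabulary terms,
    plus a contextual embedding for each selected term."""
    # SPLADE-style saturation, then max-pool over the sequence; argmax
    # remembers which token position produced each vocabulary weight.
    weights, argmax = torch.max(torch.log1p(torch.relu(token_logits)), dim=0)
    top = torch.topk(weights, k)
    # Attach the hidden state of the token behind each selected term.
    ctx = token_states[argmax[top.indices]]  # [k, hidden]
    if proj is not None:
        ctx = proj(ctx)  # optional down-projection to a small embedding size
    return top.indices, top.values, ctx
```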
This workshop discusses cutting-edge developments in research and applications of personalizing large language models (LLMs) and adapting them to the demands of diverse user populations and societal needs. The full-day workshop includes several keynotes and invited talks, a poster session, and a panel discussion.
By engaging with and bringing together Chinese environmental humanities and science fiction studies, this article argues that the narratives of weather and climate revealed in late Qing science fiction serve as a metonymic vehicle and medium for addressing China's social and political crises. The author analyzes three works, Bingshan xuehai 冰山雪海 (Iceberg Snow Ocean), Dianshijie 電世界 (Electrical World), and Xinshitouji 新石頭記 (New Story of the Stone), and delves into the intellectual history of modern ideas. In exploring the literary...
Google My Business (GMB) is a platform that hosts business profiles, which are displayed when a user issues a relevant query on Search or Maps. GMB businesses provide a wide variety of services, from home cleaning and repair to legal consultation. However, the exact details of the services provided (a.k.a. job types) are often missing from profiles. This places the burden of finding these details on users. To alleviate this burden, we built a pipeline that automatically extracts job types from business websites. We share the various challenges...
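The abstract does not describe the pipeline's internals, so the following is just a toy illustration of the task's shape: matching a fixed job-type taxonomy against a business website's text. The taxonomy, the matching rule, and the function name are all hypothetical; a production pipeline would involve crawling, boilerplate removal, and learned extraction models.

```python
import re

# Hypothetical toy taxonomy of job types.
JOB_TYPES = ["drain cleaning", "water heater repair", "roof inspection",
             "estate planning", "house cleaning"]

def extract_job_types(page_text: str) -> list[str]:
    """Naive surface matching of taxonomy entries against website text."""
    text = re.sub(r"\s+", " ", page_text.lower())
    return [jt for jt in JOB_TYPES if jt in text]

print(extract_job_types("We offer 24/7 Drain   Cleaning and water heater repair."))
# -> ['drain cleaning', 'water heater repair']
```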
Bag-of-words based lexical retrieval systems are still the most commonly used methods for real-world search applications. Recently, deep learning methods have shown promising results for improving retrieval performance, but they are expensive to run in an online fashion, non-trivial to integrate into existing production systems, and might not generalize well to out-of-domain scenarios. Instead, we build on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model. TW-BERT learns to predict the weight of individual n-gram...
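Since TW-BERT keeps the lexical scorer and only supplies term weights, a compact way to picture the integration is a BM25 scorer whose per-term contributions are scaled by learned weights. In the sketch below the weights arrive as a plain dict; in the actual model they would be predicted by BERT from the query, and the exact scoring function may differ.

```python
import math

def bm25_term(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    """Standard BM25 contribution of a single term."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def weighted_score(query_terms, term_weights, doc, corpus):
    """Lexical score with learned per-term weights (hypothetical interface:
    term_weights maps each query term to its predicted weight)."""
    return sum(
        term_weights.get(t, 1.0) * bm25_term(
            tf=doc["tf"].get(t, 0), df=corpus["df"].get(t, 1),
            doc_len=doc["len"], avg_len=corpus["avg_len"],
            n_docs=corpus["n_docs"])
        for t in query_terms)
```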
A recent line of work in first-stage Neural Information Retrieval has focused on learning sparse lexical representations instead of dense embeddings. One such model is SPLADE, which has been shown to achieve state-of-the-art results in both in-domain and zero-shot settings, can leverage inverted indices for efficient retrieval, and offers enhanced interpretability. However, existing SPLADE models are fundamentally limited to a representation based on the native BERT WordPiece vocabulary.
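For reference, the SPLADE representation the abstract refers to can be written in a few lines: masked-LM logits are saturated and max-pooled over the sequence, giving one weight per WordPiece vocabulary entry, which is exactly why the representation is tied to that fixed vocabulary. This is a minimal sketch of the standard formulation, not this paper's extension.

```python
import torch

def splade_encode(mlm_logits: torch.Tensor) -> torch.Tensor:
    """mlm_logits: [seq_len, vocab_size] masked-LM logits for an input text.
    Returns a sparse weight per WordPiece vocabulary entry: [vocab_size]."""
    # Saturate with log(1 + ReLU(.)), then max-pool over token positions.
    return torch.log1p(torch.relu(mlm_logits)).amax(dim=0)

# Every nonzero weight corresponds to one of BERT's ~30k WordPiece entries,
# which is the vocabulary limitation the abstract highlights.
```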