- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Recommender Systems and Techniques
- Domain Adaptation and Few-Shot Learning
- Privacy-Preserving Technologies in Data
- Semantic Web and Ontologies
- Neural Networks and Applications
- Text and Document Classification Technologies
- Millimeter-Wave Propagation and Modeling
- Advanced Database Systems and Queries
- Distributed and Parallel Computing Systems
- Evacuation and Crowd Dynamics
- Digital Filter Design and Implementation
- Advanced Graph Neural Networks
- Biomedical Text Mining and Ontologies
- Human Mobility and Location-Based Analysis
- Mathematics, Computing, and Information Processing
- Cryptography and Data Security
- Speech and Audio Processing
- AI in Service Interactions
- Advanced Image and Video Retrieval Techniques
- Scientific Computing and Data Management
Université de Montréal
2021-2024
Dalian University of Technology
2021
Modern recommender systems employ various sequential modules, such as self-attention, to learn dynamic user interests. However, these methods are less effective in capturing collaborative and transitional signals within user interaction sequences. First, the self-attention architecture uses the embedding of a single item as the attention query, making it challenging to capture collaborative signals. Second, these methods typically follow an auto-regressive framework, which is unable to learn global item transition patterns. To overcome these limitations, we propose a new...
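To make the first limitation concrete, here is a minimal single-head causal self-attention layer in NumPy (an illustrative toy, not the paper's proposed model): each position's attention query is just that item's own embedding, and the causal mask enforces the auto-regressive, left-to-right view the abstract criticizes.

```python
import numpy as np

def self_attention(seq_emb):
    """Single-head causal self-attention over a sequence of item embeddings.

    Each position's query is the embedding of a single item -- the limitation
    noted above: the query carries no collaborative signal from other users.
    """
    d = seq_emb.shape[-1]
    scores = seq_emb @ seq_emb.T / np.sqrt(d)           # (L, L) dot-product scores
    # Causal (auto-regressive) mask: position i attends only to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ seq_emb                             # contextualized interests

rng = np.random.default_rng(0)
items = rng.normal(size=(5, 8))   # 5 interacted items, dim-8 embeddings
out = self_attention(items)
print(out.shape)  # (5, 8)
```

Note that the first position can only attend to itself, so its output equals its input embedding, which illustrates how early positions see no global transition context.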
Large Language Models (LLMs) excel in various tasks, including personalized recommendations. Existing evaluation methods often focus on rating prediction, relying on regression errors between actual and predicted ratings. However, user rating bias and item quality, two influential factors behind rating scores, can obscure personal preferences in user-item pair data. To address this, we introduce PerRecBench, disassociating the evaluation from these two factors and assessing recommendation techniques on capturing personal preferences in a grouped ranking manner. We...
Humans excel in analogical learning and knowledge transfer and, more importantly, possess a unique understanding of identifying appropriate sources of knowledge. From a model's perspective, this presents an interesting challenge. If models could autonomously retrieve knowledge useful for reasoning or decision-making to solve problems, they would transition from passively acquiring knowledge to actively accessing it. However, filling models with knowledge is relatively straightforward -- it simply requires training on accessible knowledge bases. The complex task...
Future link prediction is a fundamental challenge in various real-world dynamic systems. To address this, numerous temporal graph neural networks (temporal GNNs) and benchmark datasets have been developed. However, these datasets often feature excessive repeated edges and lack complex sequential dynamics, a key characteristic inherent in many real-world applications such as recommender systems and ``Who-To-Follow'' on social networks. This oversight has led existing methods to inadvertently downplay the importance of...
This survey examines the evolution of model architectures in information retrieval (IR), focusing on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation. The review intentionally separates architectural considerations from training methodologies to provide a focused analysis of structural innovations in IR systems. We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent...
In conversational search, the user's real search intent for the current conversation turn is dependent on the previous conversation history. It is challenging to determine a good search query from the whole conversation context. To avoid the expensive re-training of the query encoder, most existing methods try to learn a rewriting model to de-contextualize the current query by mimicking manual query rewriting. However, manually rewritten queries are not always the best search queries. Thus, training on them would lead to sub-optimal queries. Another type of useful information to enhance the search query is the potential answer to the question....
Precisely understanding users' contextual search intent has been an important challenge for conversational search. As conversational search sessions are much more diverse and long-tailed, existing methods trained on limited data still show unsatisfactory effectiveness and robustness to handle real scenarios. Recently, large language models (LLMs) have demonstrated amazing capabilities for text generation and conversation understanding. In this work, we present a simple yet effective prompting framework, called LLM4CS,...
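The prompting idea can be sketched as a template that packs the conversation history and the current question into one rewriting instruction. This is a generic illustration only; the actual LLM4CS prompts and its aggregation of multiple generated rewrites differ, and the LLM call itself is stubbed out.

```python
def build_rewrite_prompt(history, current_query):
    """Assemble a query-rewriting prompt from (question, answer) history turns.

    A hedged sketch of the prompting idea; the real framework's prompt
    wording and candidate-aggregation strategy are not reproduced here.
    """
    turns = "\n".join(
        f"Q{i + 1}: {q}\nA{i + 1}: {a}" for i, (q, a) in enumerate(history)
    )
    return (
        "Given the conversation below, rewrite the last question into a "
        "self-contained search query.\n"
        f"{turns}\n"
        f"Q{len(history) + 1}: {current_query}\n"
        "Rewrite:"
    )

prompt = build_rewrite_prompt(
    [("Who wrote Dune?", "Frank Herbert.")],
    "When was it published?",
)
print(prompt)
```

In practice the returned string would be sent to an LLM, and the generated rewrite (or an aggregate over several samples) would be encoded for retrieval.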
Conversational search supports multi-turn user-system interactions to solve complex information needs. Compared with the traditional single-turn ad-hoc search, conversational search faces a more challenging intent understanding problem because the session context is much longer and contains many noisy tokens. However, existing conversational dense retrieval solutions simply fine-tune the pre-trained query encoder on limited conversational search data, which makes it hard to achieve satisfactory performance in such a complex scenario. Meanwhile, the learned latent representation also...
Word-level information is important in natural language processing (NLP), especially for the Chinese language due to its high linguistic complexity. Chinese word segmentation (CWS) is an essential task for downstream NLP tasks. Existing CWS methods have already achieved competitive performance on large-scale annotated corpora. However, the accuracy of a method will drop dramatically when it handles unsegmented text with lots of out-of-vocabulary (OOV) words. In addition, there are many different segmentation criteria, and addressing...
Graph-based models and contrastive learning have emerged as prominent methods in Collaborative Filtering (CF). While many existing CF models incorporate these methods in their design, there seems to be a limited depth of analysis regarding the foundational principles behind them. This paper bridges graph convolution, a pivotal element of graph-based models, with contrastive learning through a theoretical framework. By examining the learning dynamics and equilibrium of the contrastive loss, we offer a fresh lens to understand contrastive learning via graph theory, emphasizing its capability to capture...
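For context, the graph convolution at issue in graph-based CF can be sketched as parameter-free propagation over the symmetrically normalized user-item bipartite graph (a LightGCN-style toy illustration; the paper's theoretical framework itself is not reproduced here).

```python
import numpy as np

def light_propagate(user_emb, item_emb, interactions, n_layers=2):
    """Parameter-free graph convolution on the user-item bipartite graph:
    each layer averages neighbor embeddings with symmetric degree
    normalization, and the layer outputs are mean-pooled."""
    n_u, n_i = user_emb.shape[0], item_emb.shape[0]
    A = np.zeros((n_u, n_i))
    for u, i in interactions:
        A[u, i] = 1.0
    d_u = np.maximum(A.sum(1, keepdims=True), 1) ** -0.5   # user degrees
    d_i = np.maximum(A.sum(0, keepdims=True), 1) ** -0.5   # item degrees
    A_norm = d_u * A * d_i
    us, its = [user_emb], [item_emb]
    for _ in range(n_layers):
        u_prev, i_prev = us[-1], its[-1]
        us.append(A_norm @ i_prev)      # users aggregate item neighbors
        its.append(A_norm.T @ u_prev)   # items aggregate user neighbors
    return np.mean(us, axis=0), np.mean(its, axis=0)

rng = np.random.default_rng(1)
u_final, i_final = light_propagate(
    rng.normal(size=(3, 2)), rng.normal(size=(4, 2)),
    [(0, 1), (1, 2), (2, 0)],
)
print(u_final.shape, i_final.shape)  # (3, 2) (4, 2)
```

The absence of learnable weights in the propagation is what makes its connection to the contrastive-loss dynamics analytically tractable.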
Conversational search provides users with a natural and convenient new search experience. Recently, conversational dense retrieval has been shown to be a promising technique for realizing conversational search. However, as conversational search systems have not been widely deployed, it is hard to get large-scale real conversational search sessions and relevance labels to support the training of conversational dense retrieval. To tackle this data scarcity problem, previous methods focus on developing better few-shot learning approaches or generating pseudo labels, but the data they use is still heavily...
Conversational search allows a user to interact with the search system in multiple turns. A query is strongly dependent on the conversation context. An effective way to improve retrieval effectiveness is to expand the current query with historical queries. However, not all previous queries are related to, and useful for expanding, the current query. In this paper, we propose a new method to select relevant historical queries. To cope with the lack of labeled training data, we use a pseudo-labeling approach to annotate historical queries based on their impact on retrieval results. The pseudo-labeled data are used...
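The pseudo-labeling idea can be sketched as follows: a historical query is labeled useful if appending it to the current query improves a retrieval-quality score. Here `retrieval_gain` is a hypothetical stand-in for running the retriever and scoring its results, and the keyword-overlap scorer in the example is purely illustrative.

```python
def pseudo_label_history(current_query, history_queries, retrieval_gain):
    """Label each historical query 1 (useful) or 0 (not useful) according to
    whether expanding the current query with it improves retrieval quality.

    `retrieval_gain(query)` is a hypothetical callback standing in for
    running the retriever and scoring the returned results.
    """
    base = retrieval_gain(current_query)
    return [
        1 if retrieval_gain(f"{h} {current_query}") > base else 0
        for h in history_queries
    ]

# Toy scorer: overlap with the terms of a single relevant document.
doc_terms = {"python", "garbage", "collector"}
toy_gain = lambda q: len(set(q.split()) & doc_terms)

labels = pseudo_label_history(
    "how does it work",
    ["python garbage collector", "weather today"],
    toy_gain,
)
print(labels)  # [1, 0]
```

The resulting labels can then supervise a selector that keeps only expansion-worthy history turns at inference time.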
As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, state-of-the-art text sanitization mechanisms based on the relaxed notion of metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good privacy-utility trade-offs. To address these limitations, we propose a novel Customized Text sanitization (CusText) mechanism based on the original...
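For background, DP text sanitization of this flavor is commonly built on the exponential mechanism: each token is replaced by sampling from a candidate set with probability proportional to exp(ε·score / 2Δ). The sketch below assumes a generic similarity scorer and is not the exact CusText scoring function.

```python
import math
import random

def sanitize_token(token, candidates, similarity, epsilon):
    """Replace `token` via the exponential mechanism: each candidate c is
    sampled with probability proportional to exp(eps * sim(token, c) / (2 * sens)),
    where sens bounds the score range. Generic DP sketch, not CusText itself."""
    sims = [similarity(token, c) for c in candidates]
    sens = (max(sims) - min(sims)) or 1.0   # score sensitivity (fallback 1.0)
    weights = [math.exp(epsilon * s / (2 * sens)) for s in sims]
    r = random.random() * sum(weights)
    for c, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

random.seed(0)
# Hypothetical similarity: "income" and "pay" are semantically close to "salary".
sim = lambda a, b: 1.0 if b in ("income", "pay") else 0.0
replacement = sanitize_token("salary", ["income", "pay", "apple"], sim, epsilon=2.0)
print(replacement)
```

Larger ε concentrates probability on the most similar candidates (better utility); smaller ε flattens the distribution (stronger privacy), which is the trade-off the abstract refers to.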
Conversational query rewriting (CQR) realizes conversational search by reformulating the search dialogue into a standalone rewrite. However, existing CQR models either are not learned toward improving downstream retrieval performance or inefficiently generate the rewrite token-by-token from scratch while neglecting the fact that the rewrite often has a large overlap with the dialogue context. In this paper, we propose EdiRCS, a new text editing-based CQR model tailored for conversational search. In EdiRCS, most of the rewrite tokens are selected from the dialogue in a non-autoregressive fashion and only a few new tokens are generated...
Large Language Models (LLMs) are essential tools for collaborating with users on different tasks. Evaluating their performance in serving users' needs in real-world scenarios is important. While many benchmarks have been created, they mainly focus on specific predefined model abilities. Few have covered the intended utilization of LLMs by real users. To address this oversight, we propose benchmarking LLMs from a user perspective in both dataset construction and evaluation designs. We first collect 1846 use cases from 15...
The vanilla Differentially-Private Stochastic Gradient Descent (DP-SGD), including DP-Adam and other variants, ensures the privacy of training data by uniformly distributing privacy costs across training steps. The equivalent privacy costs, controlled by maintaining the same gradient clipping thresholds and noise powers in each step, result in unstable updates and a lower model accuracy when compared to the non-DP counterpart. In this paper, we propose dynamic DP-SGD (along with dynamic DP-Adam, and others) to reduce the performance loss gap while dynamically...
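A vanilla DP-SGD step, with the fixed clipping threshold and noise power the abstract refers to, can be sketched as follows (a minimal NumPy illustration; the paper's dynamic variant would instead vary `clip_norm` and `noise_mult` across steps).

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, lr, clip_norm, noise_mult, rng):
    """One vanilla DP-SGD update: clip each per-sample gradient to L2 norm
    `clip_norm`, average, then add Gaussian noise with standard deviation
    noise_mult * clip_norm / batch_size before applying the step."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_g = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0, noise_mult * clip_norm / len(per_sample_grads), size=mean_g.shape
    )
    return params - lr * (mean_g + noise)

rng = np.random.default_rng(0)
params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]),   # norm 5 -> clipped to norm 1
         np.array([0.0, 0.0, 1.0])]   # norm 1 -> unchanged
new_params = dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0,
                         noise_mult=0.0, rng=rng)
print(new_params)  # [-0.3 -0.4 -0.5]
```

Keeping `clip_norm` and `noise_mult` constant at every step is exactly the uniform privacy-cost allocation the paper argues is suboptimal.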
Word segmentation is an essential and challenging task in natural language processing, especially for the Chinese language due to its high linguistic complexity. Existing methods for Chinese word segmentation, including statistical machine learning and neural network methods, usually have good performance in specific knowledge domains. Given the increasing importance of interdisciplinary and cross-domain studies, one of the remaining challenges is to handle out-of-vocabulary (OOV) words, on which existing methods show unsatisfactory performance that cannot meet the practical standard. To this end, we...
Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include relevant information from previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover,...
Conversational search requires accurate interpretation of user intent from complex multi-turn contexts. This paper presents ChatRetriever, which inherits the strong generalization capability of large language models to robustly represent conversational sessions for dense retrieval. To achieve this, we propose a simple and effective dual-learning approach that adapts the LLM for retrieval via contrastive learning while enhancing session understanding through masked instruction tuning on high-quality...
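The contrastive-learning half of such a dual-learning setup typically uses an InfoNCE-style loss that pulls a session representation toward its relevant passage and away from negatives. Below is a minimal sketch of generic InfoNCE over cosine similarities, not ChatRetriever's exact objective.

```python
import numpy as np

def info_nce(session_emb, pos_emb, neg_embs, tau=0.05):
    """InfoNCE loss for one session: cross-entropy of the softmax over
    cosine similarities, with the relevant passage as the target class."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    logits = np.array(
        [cos(session_emb, pos_emb)] + [cos(session_emb, n) for n in neg_embs]
    ) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -float(np.log(probs[0]))            # target index 0 = positive

s = np.array([1.0, 0.0])                        # toy session embedding
good = info_nce(s, np.array([1.0, 0.0]), [np.array([0.0, 1.0])])
bad = info_nce(s, np.array([0.0, 1.0]), [np.array([1.0, 0.0])])
print(good, bad)  # loss is much lower when the positive is close
```

Minimizing this loss shapes the session embedding so that nearest-neighbor search over passage embeddings retrieves the relevant one.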
Conversational search provides a more convenient interface for users to search by allowing multi-turn interaction with the search engine. However, the effectiveness of conversational dense retrieval methods is limited by the scarcity of training data required for their fine-tuning. Thus, generating conversational search sessions with relevance labels could potentially improve their performance. Based on the promising capabilities of large language models (LLMs) for text generation, we propose ConvSDG, a simple yet effective framework to explore the feasibility of boosting conversational search using...
Document-level biomedical concept extraction is the task of identifying concepts mentioned in a given document. Recent advancements have adapted pre-trained language models for this task. However, the scarcity of domain-specific training data and the deviation of concept mentions from their canonical names often hinder these models' effectiveness. To tackle this issue, we employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC. The pseudo-annotated data are used to augment the limited training data....
Conversational search supports multi-turn user-system interactions to solve complex information needs. Different from the traditional single-turn ad-hoc search, conversational search encounters a more challenging problem of context-dependent query understanding with the lengthy and long-tail conversation history context. While conversational query rewriting (CQR) methods leverage explicit rewritten queries to train a rewriting model to transform the context-dependent query into a stand-alone query, this is usually done without considering the quality of search results. Conversational dense retrieval (CDR) methods use...