NFDI4DS | UHH-SEMS - Publication Details

Ji-Rong Wen

ORCID: 0000-0002-9777-9676

About

Contact & Profiles

A5025631695

Research Areas

Topic Modeling
Recommender Systems and Techniques
Natural Language Processing Techniques
Web Data Mining and Analysis
Advanced Graph Neural Networks
Information Retrieval and Search Behavior
Multimodal Machine Learning Applications
Data Management and Algorithms
Advanced Image and Video Retrieval Techniques
Domain Adaptation and Few-Shot Learning
Text and Document Classification Technologies
Image Retrieval and Classification Techniques
Advanced Database Systems and Queries
Advanced Bandit Algorithms Research
Speech and dialogue systems
Semantic Web and Ontologies
Complex Network Analysis Techniques
Algorithms and Data Compression
Advanced Text Analysis Techniques
Expert finding and Q&A systems
Caching and Content Delivery
Data Quality and Management
Machine Learning in Healthcare
Video Analysis and Summarization
Speech Recognition and Synthesis

Renmin University of China
2016-2025

Beijing Institute of Big Data Research
2015-2023

IT University of Copenhagen
2023

Tokyo Institute of Technology
2023

Administration for Community Living
2023

American Jewish Committee
2023

Baidu (China)
2023

Data Management (Italy)
2022

Beijing Academy of Artificial Intelligence
2021-2022

Beijing University of Posts and Telecommunications
2022

A Survey of Large Language Models

OPENALEX - Publications

Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang and 17 more

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities...

10.48550/arxiv.2303.18223 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Pre-trained models: Past, present and future

Xu Han Zhengyan Zhang Ning Ding Yuxian Gu Xiao Liu and 19 more

Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in the parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental...

10.1016/j.aiopen.2021.08.002 article EN cc-by-nc-nd AI Open 2021-01-01

S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization

Kun Zhou Hui Wang Wayne Xin Zhao Yutao Zhu Sirui Wang and 3 more

Recently, significant progress has been made in sequential recommendation with deep learning. Existing neural sequential recommendation models usually rely on the item prediction loss to learn model parameters or data representations. However, a model trained with this loss is prone to suffer from the data sparsity problem. Since it overemphasizes the final performance, the association or fusion between context data and sequence data has not been well captured and utilized for sequential recommendation. To tackle this problem, we propose S^3-Rec, which stands for Self-Supervised learning for Sequential...

10.1145/3340531.3411954 preprint EN 2020-10-19

A large-scale evaluation and analysis of personalized search strategies

Zhicheng Dou Ruihua Song Ji-Rong Wen

Although personalized search has been proposed for many years and many personalization strategies have been investigated, it is still unclear whether personalization is consistently effective on different queries for different users, and under different search contexts. In this paper, we study this problem and get some preliminary conclusions. We present a large-scale evaluation framework for personalized search based on query logs, and then evaluate five personalized search strategies (including two click-based and three profile-based ones) using 12-day MSN query logs. By analyzing the results, we reveal that personalized search has significant improvement...

10.1145/1242572.1242651 article EN 2007-05-08

Probabilistic query expansion using query logs

Hang Cui Ji-Rong Wen Jian‐Yun Nie Wei‐Ying Ma

Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information retrieval. However, these previous methods do not take into account the specific characteristics of web searching; in particular, the availability of a large amount of user interaction information recorded in web query logs. In this study, we propose a new method for query expansion based on query logs. The central idea is to extract probabilistic correlations between query terms and document terms by analyzing...

10.1145/511446.511489 article EN 2002-05-07

Clustering user queries of a search engine

Ji-Rong Wen Jian‐Yun Nie Hong-Jiang Zhang

Ji-Rong Wen (Microsoft Research, China, 5F, Beijing Sigma Center, No.49, Zhichun Road, Haidian District, Beijing, P.R. China), Jian-Yun Nie (Dept. Informatique et Recherche opérationnelle, University of Montreal, CP 6128, succursale Centre-ville, H3C 3J7, Canada), Hong-Jiang Zhang. WWW '01: Proceedings of the 10th International Conference on World Wide Web, May 2001, Pages...

10.1145/371920.371974 article EN 2001-04-01

Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks

Jin Huang Wayne Xin Zhao Hongjian Dou Ji-Rong Wen Edward Yi Chang

With the revival of neural networks, many studies try to adapt powerful sequential neural models, i.e., Recurrent Neural Networks (RNN), to sequential recommendation. RNN-based networks encode historical interaction records into a hidden state vector. Although the state vector is able to encode sequential dependency, it still has limited representation power in capturing complicated user preference. It is difficult to capture fine-grained user preference from the interaction sequence. Furthermore, the latent vector representation is usually hard to understand and explain. To address these issues, in this...

10.1145/3209978.3210017 article EN 2018-06-27

Query clustering using user logs

Ji-Rong Wen Jian‐Yun Nie Hong-Jiang Zhang

Query clustering is a process used to discover frequently asked questions or most popular topics on a search engine. This process is crucial for search engines based on question-answering. Because of the short lengths of queries, approaches based on keywords are not suitable for query clustering. This paper describes a new query clustering method that makes use of user logs, which allow us to identify the documents the users have selected for a query. The similarity between two queries may be deduced from the common documents the users selected for them. Our experiments show that a combination of both keywords and user logs is better than...

10.1145/503104.503108 article EN ACM Transactions on Information Systems 2002-01-01

Counterfactual VQA: A Cause-Effect Look at Language Bias

Yulei Niu Kaihua Tang Hanwang Zhang Zhiwu Lu Xian‐Sheng Hua and 1 more

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language. Recent debiasing methods proposed to exclude the language prior during inference. However, they fail to disentangle the "good" language context and the "bad" language bias from the whole. In this paper, we investigate how to mitigate language bias in VQA. Motivated by causal effects, we propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers and reduce it by subtracting the direct language effect from the total causal effect. Experiments...

10.1109/cvpr46437.2021.01251 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms

Wayne Xin Zhao Shanlei Mu Yupeng Hou Zihan Lin Yushuo Chen and 14 more

In recent years, a large number of recommendation algorithms have been proposed in the literature, from traditional collaborative filtering to deep learning algorithms. However, concerns about how to standardize open source implementation of recommendation algorithms continually increase in the research community. In light of this challenge, we propose a unified, comprehensive and efficient recommender system library called RecBole (pronounced as [rEk'boUl@r]), which provides a unified framework to develop and reproduce recommendation algorithms for research purpose....

10.1145/3459637.3482016 article EN 2021-10-26

Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

Kun Zhou Wayne Xin Zhao Shuqing Bian Yuanhang Zhou Ji-Rong Wen and 1 more

Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. Although several efforts have been made for CRS, two major issues still remain to be solved. First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preference. Second, there is a semantic gap between natural language expression and item-level user...

10.1145/3394486.3403143 article EN 2020-08-20

Connecting Social Media to E-Commerce: Cold-Start Product Recommendation Using Microblogging Information

Wayne Xin Zhao Sui Li Yulan He Edward Yi Chang Ji-Rong Wen and 1 more

In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce Web sites support the mechanism of social login, where users can sign on to the Web sites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product Web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce Web sites to users at social networking sites in "cold-start" situations, a problem which has rarely been explored...

10.1109/tkde.2015.2508816 article EN IEEE Transactions on Knowledge and Data Engineering 2015-12-17

A survey on large language model based autonomous agents

Lei Wang Chen Ma Xueyang Feng Zeyu Zhang Hao Yang and 8 more

Autonomous agents have long been a research focus in academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and makes it hard for the agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of Web knowledge, large language models (LLMs) have shown potential in human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present...

10.1007/s11704-024-40231-1 article EN cc-by Frontiers of Computer Science 2024-03-22

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals

Gaole He Yunshi Lan Jing Jiang Wayne Xin Zhao Ji-Rong Wen

Multi-hop Knowledge Base Question Answering (KBQA) aims to find the answer entities that are multiple hops away in the Knowledge Base (KB) from the entities in the question. A major challenge is the lack of supervision signals at intermediate steps. Therefore, multi-hop KBQA algorithms can only receive feedback from the final answer, which makes the learning unstable or ineffective. To address this challenge, we propose a novel teacher-student approach for the multi-hop KBQA task. In our approach, the student network aims to find the correct answer to the query, while the teacher network tries...

10.1145/3437963.3441753 preprint EN 2021-03-06

Towards artificial general intelligence via a multimodal foundation model

Nanyi Fei Zhiwu Lu Yizhao Gao Guoxing Yang Yuqi Huo and 7 more

The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only single-cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning with weak semantic...

10.1038/s41467-022-30761-2 article EN cc-by Nature Communications 2022-06-02

Evaluating Object Hallucination in Large Vision-Language Models

Yifan Li Yifan Du Kun Zhou Jinpeng Wang Zhao Xin and 1 more

Inspired by the superior language abilities of large language models (LLM), large vision-language models (LVLM) have been recently proposed by integrating powerful LLMs for improving the performance on complex multimodal tasks. Despite the promising progress on LVLMs, we find that they suffer from object hallucinations, i.e., they tend to generate objects inconsistent with the target images in the descriptions. To investigate it, this work presents the first systematic study on object hallucination of LVLMs. We conduct evaluation experiments on several...

10.18653/v1/2023.emnlp-main.20 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

Ruiyang Ren Yingqi Qu Jing Liu Wayne Xin Zhao Qiaoqiao She and 3 more

In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures in finding and ranking relevant information. Since both procedures contribute to the final performance, it is important to jointly optimize them in order to achieve mutual improvement. In this paper, we propose a novel joint training approach for dense passage retrieval and passage re-ranking. A major contribution is that we introduce dynamic listwise distillation, where we design a unified listwise training approach for both the retriever and the re-ranker. During the dynamic distillation, the retriever and the re-ranker can be adaptively improved...

10.18653/v1/2021.emnlp-main.224 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions

Yunshi Lan Gaole He Jinhao Jiang Jing Jiang Wayne Xin Zhao and 1 more

Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Recently, a large number of studies focus on semantically or syntactically complicated questions. In this paper, we elaborately summarize the typical challenges and solutions for complex KBQA. We begin by introducing the background of the KBQA task. Next, we present the two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. We then review the advanced...

10.24963/ijcai.2021/611 article EN 2021-08-01

Towards Universal Sequence Representation Learning for Recommender Systems

Yupeng Hou Shanlei Mu Wayne Xin Zhao Yaliang Li Bolin Ding and 1 more

In order to develop effective sequential recommenders, a series of sequence representation learning (SRL) methods are proposed to model historical user behaviors. Most existing SRL methods rely on explicit item IDs for developing the sequence models to better capture user preference. Though effective to some extent, these methods are difficult to transfer to new recommendation scenarios, due to the limitation of explicitly modeling item IDs. To tackle this issue, we present a novel universal sequence representation learning approach, named UniSRec. The proposed approach utilizes the associated...

10.1145/3534678.3539381 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

Counterfactual Data-Augmented Sequential Recommendation

Zhenlei Wang Jingsen Zhang Hongteng Xu Xu Chen Yongfeng Zhang and 2 more

Sequential recommendation aims at predicting users' preferences based on their historical behaviors. However, this recommendation strategy may not perform well in practice due to the sparsity of real-world data. In this paper, we propose a novel counterfactual data augmentation framework to mitigate the impact of the imperfect training data and empower sequential recommendation models. Our framework is composed of a sampler model and an anchor model. The sampler model aims to generate new user behavior sequences based on the observed ones, while the anchor model is leveraged to provide the final recommendation list, which is trained based on both...

10.1145/3404835.3462855 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

Junyi Li Xiaoxue Cheng Xin Zhao Jian‐Yun Nie Ji-Rong Wen

Large language models (LLMs), such as ChatGPT, are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by factual knowledge. To understand what types of content and to which extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation for Large Language Models (HaluEval) benchmark, a large collection of generated and human-annotated hallucinated samples for evaluating the performance of LLMs in recognizing hallucination. To generate these samples, we propose a ChatGPT-based two-step framework,...

10.18653/v1/2023.emnlp-main.397 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Towards Unified Conversational Recommender Systems via Knowledge-Enhanced Prompt Learning

Xiaolei Wang Kun Zhou Ji-Rong Wen Wayne Xin Zhao

Conversational recommender systems (CRS) aim to proactively elicit user preference and recommend high-quality items through natural language conversations. Typically, a CRS consists of a recommendation module to predict preferred items for users and a conversation module to generate appropriate responses. To develop an effective CRS, it is essential to seamlessly integrate the two modules. Existing works either design semantic alignment strategies, or share knowledge resources and representations between the two modules. However, these...

10.1145/3534678.3539382 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

Debiased Contrastive Learning of Unsupervised Sentence Representations

Kun Zhou Beichen Zhang Xin Zhao Ji-Rong Wen

Recently, contrastive learning has been shown to be effective in improving pre-trained language models (PLM) to derive high-quality sentence representations. It aims to pull close positive examples to enhance the alignment while pushing apart irrelevant negatives for the uniformity of the whole representation space. However, previous works mostly adopt in-batch negatives or sample from training data at random. Such a way may cause the sampling bias that improper negatives (false negatives and anisotropy representations) are used to learn...

10.18653/v1/2022.acl-long.423 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

A Survey on Large Language Model based Autonomous Agents

Lei Wang Chen Ma Xueyang Feng Zeyu Zhang Hao Yang and 8 more

Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes it hard for the agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an...

10.48550/arxiv.2308.11432 preprint EN cc-by arXiv (Cornell University) 2023-01-01
