NFDI4DS | UHH-SEMS - Publication Details

Self-supervised Learning for Large-scale Item Recommendations

OPENALEX - Publications

Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, and 6 more

Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the corpus, users tend to provide feedback for a very small set of them, causing a power-law distribution. This makes the feedback data for long-tail items extremely sparse.

10.1145/3459637.3481952 article EN 2021-10-26

A Model of Two Tales: Dual Transfer Learning Framework for Improved Long-tail Item Recommendation

Yin Zhang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Lichan Hong, and 1 more

Highly skewed long-tail item distribution is very common in recommendation systems. It significantly hurts model performance on tail items. To improve tail-item recommendation, we conduct research to transfer knowledge from head items to tail items, leveraging the rich user feedback in head items and the semantic connections between head and tail items. Specifically, we propose a novel dual transfer learning framework that jointly learns the knowledge transfer from both model-level and item-level: 1. The model-level knowledge transfer builds a generic meta-mapping of model parameters from few-shot to many-shot model. It captures...

10.1145/3442381.3450086 article EN 2021-04-19

Efficient Subspace Segmentation via Quadratic Programming

Shusen Wang, Xiao‐Tong Yuan, Tiansheng Yao, Shuicheng Yan, Jialie Shen

We explore in this paper efficient algorithmic solutions to robust subspace segmentation. We propose the SSQP, namely Subspace Segmentation via Quadratic Programming, to partition data drawn from multiple subspaces into multiple clusters. The basic idea of SSQP is to express each datum as the linear combination of other data regularized by an overall term targeting zero reconstruction coefficients over vectors from different subspaces. The derived coefficient matrix by solving a quadratic programming problem is taken as an affinity...

10.1609/aaai.v25i1.7892 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2011-08-04

Distributionally-robust Recommendations for Improving Worst-case User Experience

Hongyi Wen, Xinyang Yi, Tiansheng Yao, Jiaxi Tang, Lichan Hong, and 1 more

Modern recommender systems have evolved rapidly along with deep learning models that are well-optimized for overall performance, especially those trained under Empirical Risk Minimization (ERM). However, a recommendation algorithm that focuses solely on the average performance may reinforce the exposure bias and exacerbate the "rich-get-richer" effect, leading to unfair user experience. In a simulation study, we demonstrate that such a performance gap among various user groups is enlarged by an ERM-trained recommender in the long-term. To...

10.1145/3485447.3512255 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

Learning to Embed Categorical Features without Embedding Tables for Recommendation

Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, and 2 more

Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash Embedding (DHE), replacing embedding tables by a deep embedding network to compute embeddings on the fly....

10.1145/3447548.3467304 article EN 2021-08-12

Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

Yin Zhang, Ruoxi Wang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, and 3 more

Industry recommender systems usually suffer from highly-skewed long-tail item distributions where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality especially for the item slices without much user feedback. While there have been many research advances made in academia, deploying these methods in production is very difficult and very few improvements have been made in industry. One challenge is that these methods often hurt overall performance; additionally, they could be complex and expensive to train and serve.

10.1145/3580305.3599814 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2023-08-04

Self-supervised Learning for Large-scale Item Recommendations

Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, and 6 more

Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the corpus, users tend to provide feedback for a very small set of them, causing a power-law distribution. This makes the feedback data for long-tail items extremely sparse. Inspired by the recent success...

10.48550/arxiv.2007.12865 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Improving Multi-Task Generalization via Regularizing Spurious Correlation

Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, and 2 more

Multi-Task Learning (MTL) is a powerful learning paradigm to improve generalization performance via knowledge sharing. However, existing studies find that MTL could sometimes hurt generalization, especially when two tasks are less correlated. One possible reason that hurts generalization is spurious correlation, i.e., some knowledge is spurious and not causally related to task labels, but the model could mistakenly utilize them and thus fail when such correlation changes. In the MTL setup, there exist several unique challenges of spurious correlation. First, the risk of having...

10.48550/arxiv.2205.09797 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Learning Bayesian network parameters under equivalence constraints

Tiansheng Yao, Arthur Choi, Adnan Darwiche

10.1016/j.artint.2015.05.007 article EN publisher-specific-oa Artificial Intelligence 2015-06-05

Self-Auxiliary Distillation for Sample Efficient Learning in Google-Scale Recommenders

Yin Zhang, Ruoxi Wang, Xiang Li, Tiansheng Yao, A. Evdokimov, and 6 more

Industrial recommendation systems process billions of daily user feedback which are complex and noisy. Efficiently uncovering user preference from these signals becomes crucial for high-quality recommendation. We argue that those signals are not inherently equal in terms of their informative value and training ability, which is particularly salient in industrial applications with multi-stage processes (e.g., augmentation, retrieval, ranking). Considering that, in this work, we propose a novel self-auxiliary distillation...

10.1145/3640457.3688041 article EN 2024-10-08

A Model of Two Tales: Dual Transfer Learning Framework for Improved Long-tail Item Recommendation

Yin Zhang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Lichan Hong, and 1 more

Highly skewed long-tail item distribution is very common in recommendation systems. It significantly hurts model performance on tail items. To improve tail-item recommendation, we conduct research to transfer knowledge from head items to tail items, leveraging the rich user feedback in head items and the semantic connections between head and tail items. Specifically, we propose a novel dual transfer learning framework that jointly learns the knowledge transfer from both model-level and item-level: 1. The model-level knowledge transfer builds a generic meta-mapping of model parameters from few-shot to many-shot model. It captures...

10.48550/arxiv.2010.15982 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Learning to Embed Categorical Features without Embedding Tables for Recommendation

Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, and 2 more

Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash...

10.48550/arxiv.2010.10784 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

Yin Zhang, Ruoxi Wang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, and 3 more

Industry recommender systems usually suffer from highly-skewed long-tail item distributions where a small fraction of the items receives most of the user feedback. This skew hurts recommender quality especially for the item slices without much user feedback. While there have been many research advances made in academia, deploying these methods in production is very difficult and very few improvements have been made in industry. One challenge is that these methods often hurt overall performance; additionally, they could be complex and expensive to train and serve. In this work, we aim...

10.48550/arxiv.2210.14309 preprint EN other-oa arXiv (Cornell University) 2022-01-01