Shengyu Zhang

ORCID: 0000-0002-0030-8289
Research Areas
  • Recommender Systems and Techniques
  • Multimodal Machine Learning Applications
  • Advanced Bandit Algorithms Research
  • Topic Modeling
  • Advanced Graph Neural Networks
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • IoT and Edge/Fog Computing
  • Image Retrieval and Classification Techniques
  • Privacy-Preserving Technologies in Data
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Expert finding and Q&A systems
  • Music and Audio Processing
  • Generative Adversarial Networks and Image Synthesis
  • Image and Video Quality Assessment
  • Stochastic Gradient Optimization Techniques
  • Caching and Content Delivery
  • Music Technology and Sound Studies
  • Image Processing and 3D Reconstruction
  • Semantic Web and Ontologies
  • Natural Language Processing Techniques
  • Cryptography and Data Security
  • Speech and Audio Processing
  • Human Mobility and Location-Based Analysis

Zhejiang University
2020-2024

Communication University of China
2024

Alibaba Group (China)
2024

Chinese University of Hong Kong
2013-2018

Wuhan University
2018

California Institute of Technology
2007

Chengdu University of Information Technology
2005

Influenced by the great success of deep learning via cloud computing and the rapid development of edge chips, research in artificial intelligence (AI) has shifted to both computing paradigms, i.e., cloud computing and edge computing. In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models, owing to model innovations (e.g., Transformers, Pretrained families), the explosion of training data, and soaring computing capabilities. However, edge computing, especially edge and cloud collaborative computing, are still in its...

10.1109/tkde.2022.3178211 article EN IEEE Transactions on Knowledge and Data Engineering 2022-01-01

Device Model Generalization (DMG) is a practical yet under-investigated research topic for on-device machine learning applications. It aims to improve the generalization ability of pre-trained models when deployed on resource-constrained devices, such as improving the performance of pre-trained cloud models on smart mobiles. While quite a lot of works have investigated the data distribution shift across clouds and devices, most of them focus on model fine-tuning on personalized data for individual devices to facilitate DMG. Despite their promising, these...

10.1145/3543507.3583451 article EN Proceedings of the ACM Web Conference 2023 2023-04-26

Recent research on video moment retrieval has mostly focused on enhancing accuracy, efficiency, and robustness, all of which largely rely on an abundance of high-quality annotations. While precise frame-level annotations are time-consuming and expensive, little attention has been paid to the labeling process. In this work, we explore a new interactive manner to stimulate the process of human-in-the-loop annotation in the video moment retrieval task. The key challenge is to select “ambiguous” frames and videos for binary annotations to facilitate...

10.1109/cvpr52729.2023.02204 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

A common assumption behind most of the recent research on network rate allocation is that traffic flows are elastic, which means that their utility functions are concave and continuous and that there is no hard limit on the rate allocated to each flow. These critical assumptions lead to the tractability of the analytic models for rate allocation based on network utility maximization, but also limit the applicability of the resulting rate allocation protocols. This paper focuses on inelastic flows and removes these restrictive and often invalid assumptions. First, we consider nonconcave utility functions, which turn utility maximization into...

10.1109/tnet.2007.896507 article EN IEEE/ACM Transactions on Networking 2007-12-01

In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of the downstream data on which the pretrained model will be fine-tuned. Existing methods for this problem are purely likelihood-based, leading to spurious correlations and hurting the generalization ability when transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the conditional probability of one token (object or word) given another one can be high (due to dataset biases) without robust (causal)...

10.1145/3394171.3413518 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Effectively representing users lies at the core of modern recommender systems. Since users' interests naturally exhibit multiple aspects, it is of increasing interest to develop multi-interest frameworks for recommendation, rather than represent each user with an overall embedding. Despite their effectiveness, existing methods solely exploit the encoder (the forward flow) to represent multiple aspects of interests. However, without explicit regularization, the interest embeddings may not be distinct from each other nor semantically reflect...

10.1145/3485447.3512094 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

Large Language Models (LLMs) for Recommendation (LLM4Rec) is a promising research direction that has demonstrated exceptional performance in this field. However, its inability to capture real-time user preferences greatly limits the practical application of LLM4Rec because (i) LLMs are costly to train and infer frequently, and (ii) LLMs struggle to access real-time data (its large number of parameters poses an obstacle to deployment on devices). Fortunately, small recommendation models (SRMs) can effectively supplement...

10.1145/3690624.3709335 preprint EN arXiv (Cornell University) 2025-01-09

Learning user representations based on historical behaviors lies at the core of modern recommender systems. Recent advances in sequential recommenders have convincingly demonstrated high capability in extracting effective user representations from the given behavior sequences. Despite significant progress, we argue that solely modeling the observational behavior sequences may end up with a brittle and unstable system due to the noisy and sparse nature of the user interactions logged. In this paper, we propose to learn accurate and robust user representations, which...

10.1145/3404835.3462908 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

Video Object Grounding (VOG) is the problem of associating spatial object regions in the video to a descriptive natural language query. This is a challenging vision-language task that necessitates constructing the correct cross-modal correspondence and modeling the appropriate spatio-temporal context of the query video and caption, thereby localizing the specific objects accurately. In this paper, we tackle this task by a novel framework called HiErarchical spatio-tempoRal reasOning (HERO)...

10.1145/3503161.3548333 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Large Language Models (LLMs) have demonstrated strong performance across various reasoning tasks, yet building a single model that consistently excels across all domains remains challenging. This paper addresses this problem by exploring strategies to integrate multiple domain-specialized models into an efficient pivot model. We propose two fusion strategies to combine the strengths of multiple LLMs: (1) a pairwise, multi-step fusion approach that sequentially distills each source model into the pivot model, followed by a weight merging step to integrate the distilled models into the final...

10.48550/arxiv.2501.02795 preprint EN arXiv (Cornell University) 2025-01-06

Deep neural networks have become foundational to advancements in multiple domains, including recommendation systems, natural language processing, and so on. Despite their successes, these models often contain incompatible parameters that can be underutilized or detrimental to model performance, particularly when faced with specific, varying data distributions. Existing research excels in removing such parameters or merging the outputs of multiple different pretrained models. However, the former focuses on efficiency rather...

10.48550/arxiv.2501.07596 preprint EN arXiv (Cornell University) 2025-01-09

In recommender systems, modeling user-item behaviors is essential for user representation learning. Existing sequential recommenders consider the sequential correlations between historically interacted items for capturing users' historical preferences. However, since users' preferences are by nature time-evolving and diversified, solely modeling the historical preference (without being aware of the time-evolving trends of preferences) can be inferior for recommending complementary or fresh items and thus hurt the effectiveness of recommender systems. In this paper, we bridge the gap between the past...

10.1145/3442381.3449791 article EN 2021-04-19

In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not amenable to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet firstly represents the videos as product-oriented spatial-temporal graphs. Then, based...

10.1145/3394171.3413880 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

In recommender systems, users' behavior data are driven by the interactions of user-item latent factors. To improve recommendation effectiveness and robustness, recent advances focus on latent factor disentanglement via variational inference. Despite significant progress, uncovering the underlying interactions, i.e., dependencies of latent factors, remains largely neglected by the literature. To bridge the gap, we investigate the joint disentanglement of user-item latent factors and the dependencies between them, namely latent structure learning. We propose to analyze the problem from the causal...

10.1109/tpami.2023.3247563 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-02-24

Text-based image captioning (TextCap) requires simultaneous comprehension of visual content and reading the text of images to generate a natural language description. Although the task can teach machines to understand the complex human environment further, given that text is omnipresent in our daily surroundings, it poses additional challenges in normal captioning. A text-based image intuitively contains abundant and complex multimodal relational content, that is, image details can be described diversely from multiview rather than a single caption....

10.1609/aaai.v36i3.20243 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Modern online platforms are increasingly employing recommendation systems to address information overload and improve user engagement. There is an evolving paradigm in this research field that recommendation network learning occurs both on the cloud and on edges with knowledge transfer in between (i.e., edge-cloud collaboration). Recent works push this field further by enabling edge-specific context-aware adaptivity, where model parameters are updated in real-time based on incoming on-edge data. However, we argue that frequent data exchanges...

10.48550/arxiv.2302.07335 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confounders affecting both video-audio features and labels. We propose a DeVADG framework that conducts uni-modal...

10.1609/aaai.v37i12.26787 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Tackling the pervasive issue of data sparsity in recommender systems, we present an insightful investigation into the burgeoning area of non-overlapping cross-domain recommendation, a technique that facilitates the transfer of interaction knowledge across domains without necessitating inter-domain user/item correspondence. Existing approaches have predominantly depended on auxiliary information, such as user reviews and item tags, to establish inter-domain connectivity, but these resources may become inaccessible due...

10.1145/3643807 article EN ACM Transactions on Information Systems 2024-02-01

In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos seldom accompany appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end...

10.1145/3394486.3403325 preprint EN 2020-08-20

Recommendation performance usually exhibits a long-tail distribution over users: a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). As addressing this problem cannot sacrifice the overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect...

10.1109/tkde.2023.3290545 article EN IEEE Transactions on Knowledge and Data Engineering 2023-06-29

Cloud storage has gained remarkable success in recent years, with an increasing number of consumers and enterprises outsourcing their data to the cloud. To assure the availability and integrity of the outsourced data, several protocols have been proposed to audit cloud storage. Despite the formally guaranteed security, the constructions employed heavy cryptographic operations as well as advanced concepts (e.g., bilinear maps over elliptic curves and digital signatures), and thus are too inefficient to admit wide applicability...

10.1109/infocom.2015.7218627 article EN 2015-04-01

Waterfall Recommender System (RS), a popular form of RS in mobile applications, is a stream of recommended items consisting of successive pages that can be browsed by scrolling. In waterfall RS, when a user finishes browsing a page, the edge (e.g., mobile phones) would send a request to the cloud server to get a new page of recommendations, known as the paging request mechanism. RSs typically put a large number of items into one page to reduce excessive resource consumption from numerous paging requests, which, however, would diminish the RSs' ability to timely renew...

10.1145/3534678.3539123 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12