NFDI4DS | UHH-SEMS - Publication Details

Yanbin Hao

ORCID: 0000-0002-0695-1566

About

Contact & Profiles

A5043525989

Research Areas

Human Pose and Action Recognition
Multimodal Machine Learning Applications
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Video Surveillance and Tracking Methods
Video Analysis and Summarization
Image Retrieval and Classification Techniques
Generative Adversarial Networks and Image Synthesis
Anomaly Detection Techniques and Applications
Topic Modeling
Gait Recognition and Analysis
Advanced Vision and Imaging
Advanced Neural Network Applications
Computer Graphics and Visualization Techniques
Hand Gesture Recognition Systems
Natural Language Processing Techniques
Robotics and Sensor-Based Localization
Advanced Text Analysis Techniques
Neural Networks and Applications
Face recognition and analysis
Human Motion and Animation
Recommender Systems and Techniques
Expert finding and Q&A systems
Cancer-related molecular mechanisms research
Digital Imaging for Blood Diseases

Hefei University of Technology
2014-2025

University of Science and Technology of China
2021-2025

City University of Hong Kong
2019-2021

Central China Normal University
2016-2018

Shanghai Maritime University
2009-2010

Northwestern Polytechnical University
2006

TV Program Recommendation for Multiple Viewers Based on User Profile Merging

OPENALEX - Publications

Zhiwen Yu Xingshe Zhou Yanbin Hao Jianhua Gu

10.1007/s11257-006-9005-6 article EN User Modeling and User-Adapted Interaction 2006-03-01

3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention

Zhenhua Tang Zhaofan Qiu Yanbin Hao Richang Hong Ting Yao

Recent transformer-based solutions have shown great success in 3D human pose estimation. Nevertheless, to calculate the joint-to-joint affinity matrix, the computational cost has a quadratic growth with the increasing number of joints. Such a drawback becomes even worse for pose estimation in a video sequence, which necessitates spatio-temporal correlation spanning over the entire video. In this paper, we facilitate the issue by decomposing correlation learning into space and time, and present a novel Spatio-Temporal...

10.1109/cvpr52729.2023.00464 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

R²GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network

Bin Zhu Chong‐Wah Ngo Jingjing Chen Yanbin Hao

Representing procedure text such as a recipe for cross-modal retrieval is inherently a difficult problem, not to mention generating an image from a recipe for visualization. This paper studies a new version of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedure text for the retrieval problem. The motivation for using GAN is twofold: learning compatible cross-modal features in an adversarial way, and explaining search results by showing the images generated from recipes. The novelty of R2GAN comes...

10.1109/cvpr.2019.01174 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval

Yanbin Hao Tingting Mu Richang Hong Meng Wang Ning An and 1 more

Near-duplicate video retrieval (NDVR) has been a significant research task in multimedia given its high impact in applications such as search, recommendation, and copyright protection. In addition to accurate performance, the exponential growth of online videos has imposed heavy demands on the efficiency and scalability of existing systems. Aiming at improving both accuracy and speed, we propose a novel stochastic multiview hashing algorithm to facilitate the construction of a large-scale NDVR system. Reliable mapping...

10.1109/tmm.2016.2610324 article EN IEEE Transactions on Multimedia 2016-09-15

Token Shift Transformer for Video Classification

Hao Zhang Yanbin Hao Chong‐Wah Ngo

The Transformer achieves remarkable success in understanding 1- and 2-dimensional signals (e.g., NLP and image content understanding). As a potential alternative to convolutional neural networks, it shares the merits of strong interpretability, high discriminative power on hyper-scale data, and flexibility in processing varying-length inputs. However, its encoders naturally contain computationally intensive operations such as pair-wise self-attention, incurring a heavy burden when being applied to the complex...

10.1145/3474085.3475272 preprint EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Yanbin Hao Shuo Wang P.P. Cao Xinjian Gao Tong Xu and 2 more

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts. However, current research on attention generally focuses on adopting a specific aspect of contexts (e.g., channel, spatial/temporal, or global context) to refine the features and neglects their underlying correlation when computing attentions. This leads to incomplete context utilization and hence bears the weakness of limited improvement. To tackle the problem, this paper proposes an...

10.1109/tcsvt.2022.3169842 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-04-22

Group Contextualization for Video Recognition

Yanbin Hao Hao Zhang Chong‐Wah Ngo Xiangnan He

Learning discriminative representation from the complex spatio-temporal dynamic space is essential for video recognition. On top of those stylized computational units, further refining the learnt feature with axial contexts is demonstrated to be promising in achieving this goal. However, previous works generally focus on utilizing a single kind of context to calibrate entire feature channels and could hardly apply to deal with diverse video activities. The problem can be tackled by using pair-wise attentions to recompute the feature response...

10.1109/cvpr52688.2022.00100 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Aggregated Multi-GANs for Controlled 3D Human Motion Prediction

Zhenguang Liu Kedi Lyu Shuang Wu Haipeng Chen Yanbin Hao and 1 more

Human motion prediction from a historical pose sequence is at the core of many applications in machine intelligence. However, in current state-of-the-art methods, the predicted future motion is confined within the same activity. One can neither generate predictions that differ from the current activity, nor manipulate the body parts to explore various future possibilities. Undoubtedly, this greatly limits the usefulness and applicability of motion prediction. In this paper, we propose a generalization of the human motion prediction task in which control parameters can be readily incorporated...

10.1609/aaai.v35i3.16321 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Motion Prediction using Trajectory Cues

Zhenguang Liu Pengxiang Su Shuang Wu Xuanjing Shen Haipeng Chen and 2 more

Predicting human motion from a historical pose sequence is at the core of many applications in computer vision. Current state-of-the-art methods concentrate on learning motion contexts in the pose space; however, the high dimensionality and complex nature of human pose invoke inherent difficulties in extracting such contexts. In this paper, we instead advocate modeling motion contexts in the joint trajectory space, as the trajectory of a joint is smooth, vectorial, and gives sufficient information to the model. Moreover, most existing methods consider only the dependencies between skeletal connected joints,...

10.1109/iccv48922.2021.01305 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning

Zhicai Wang Yanbin Hao Tingting Mu Ouxiang Li Shuo Wang and 1 more

Zero-shot learning (ZSL) suffers intensely from the domain shift issue, i.e., the mismatch (or misalignment) between the true and learned data distributions for classes without training data (unseen classes). By additionally using unlabelled data collected from the unseen classes, transductive ZSL (TZSL) could reduce the shift, but only to a certain extent. To improve TZSL, we propose a novel approach, Bi-VAEGAN, which strengthens the distribution alignment between the visual space and an auxiliary space. As a result, it can largely reduce the domain shift. The proposed key...

10.1109/cvpr52729.2023.01905 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Unsupervised t-Distributed Video Hashing and Its Deep Hashing Extension

Yanbin Hao Tingting Mu John Y. Goulermas Jianguo Jiang Richang Hong and 1 more

In this paper, a novel unsupervised hashing algorithm, referred to as t-USMVH, and its extension to deep hashing, t-UDH, are proposed to support large-scale video-to-video retrieval. To improve the robustness of the unsupervised learning, t-USMVH combines multiple types of feature representations and effectively fuses them by examining a continuous relevance score based on a Gaussian estimation over pairwise distances, and also a discrete neighbor score based on the cardinality of reciprocal neighbors. To reduce sensitivity to scale changes for mapping...

10.1109/tip.2017.2737329 article EN IEEE Transactions on Image Processing 2017-08-07

Cross-Domain Sentiment Encoding through Stochastic Word Embedding

Yanbin Hao Tingting Mu Richang Hong Meng Wang Xueliang Liu and 1 more

Sentiment analysis is an important topic concerning the identification of feelings, attitudes, emotions and opinions from text. To automate such analysis, a large amount of example text needs to be manually annotated for model training. This is laborious and expensive, but the cross-domain technique is a key solution for reducing the cost by reusing annotated reviews across domains. However, its success largely relies on learning a robust common representation space across domains. In recent years, significant effort has been invested to improve...

10.1109/tkde.2019.2913379 article EN IEEE Transactions on Knowledge and Data Engineering 2019-04-27

Boosting Few-Shot Learning via Attentive Feature Regularization

Xingyu Zhu Shuo Wang Jinda Lu Yanbin Hao Haifeng Liu and 1 more

Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor. However, this mixing operation weakens the feature representation due to the linear interpolation and the overlooking of the importance of specific channels. To solve these issues, this paper proposes attentive feature regularization (AFR), which aims to improve the feature representativeness and discriminability. In our approach, we first calculate the relations between semantic labels...

10.1609/aaai.v38i7.28614 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

Pengyuan Zhou Lin Wang Zhi Liu Yanbin Hao Pan Hui and 2 more

This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in bridging the gap between real-world dynamics and digital creation. The study also delves into the advanced capabilities of LLMs...

10.36227/techrxiv.171172801.19993069/v1 preprint EN cc-by-sa 2024-03-29

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Zhicai Wang Longhui Wei Tan Wang Heyu Chen Yanbin Hao and 3 more

10.1109/cvpr52733.2024.01630 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Using Touchscreen Tablets to Help Young Children Learn to Tell Time

Fuxing Wang Heping Xie Yuxin Wang Yanbin Hao Jing An

Young children are devoting increasing time to playing on handheld touchscreen devices (e.g., iPads). Though thousands of apps are claimed to be "educational," there is a lack of sufficient evidence examining the impact of touchscreens on children's learning outcomes. In the present study, the two questions we focused on were (a) whether using a touchscreen was helpful in teaching children to tell time, and (b) to what extent young children could transfer what they had learned on the touchscreen to other media. A pre- and posttest design was adopted. After learning to read the time on the iPad for 10 minutes, three...

10.3389/fpsyg.2016.01800 article EN cc-by Frontiers in Psychology 2016-11-17

MLP-JCG: Multi-Layer Perceptron With Joint-Coordinate Gating for Efficient 3D Human Pose Estimation

Zhenhua Tang Jia Li Yanbin Hao Richang Hong

Various structural relations/dependencies exist among human body joints, which makes it possible to estimate 3D poses from 2D sources. The current research on 3D human pose estimation (3D-HPE for short) mainly focuses on structural information from a specific perspective. However, this cannot facilitate 2D-to-3D lifting. This paper presents a novel and efficient multi-layer perceptron with joint-coordinate gating (MLP-JCG) model, exploring and utilizing both the local and global structural information to perform 3D pose estimations. Specifically, MLP-JCG...

10.1109/tmm.2023.3240455 article EN IEEE Transactions on Multimedia 2023-01-01

FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video

Zhenhua Tang Yanbin Hao Jia Li Richang Hong

Capturing cross-pose correlation from a sequence of frame-level 2D poses is essential for 3D human pose estimation (3D-HPE) in video. Recent studies have shown the promising potential of modeling the cross-pose relation with feature-mixing operations on the temporal domain. However, they seldom consider the interaction across the frequency domain. This paper proposes a Frequency-Temporal Collaborative Module (FTCM) to explore the feasibility of encoding cross-pose correlations in both the frequency and temporal domains. FTCM aims to jointly capture global and local correlations with a more lightweight network...

10.1109/tcsvt.2023.3286402 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-06-23

CgT-GAN: CLIP-guided Text GAN for Image Captioning

Jiarui Yu Haoran Li Yanbin Hao Bin Zhu Tong Xu and 1 more

The large-scale visual-language pre-trained model, Contrastive Language-Image Pre-training (CLIP), has significantly improved image captioning for scenarios without human-annotated image-caption pairs. Recent advanced CLIP-based image captioning without human annotations follows a text-only training paradigm, i.e., reconstructing text from the shared embedding space. Nevertheless, these approaches are limited by the training/inference gap or the huge storage requirements for text embeddings. Given that it is trivial to obtain images...

10.1145/3581783.3611891 preprint EN 2023-10-26

Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation

Fangwen Wu Jingxuan He Yufei Yin Yanbin Hao Gang Huang and 1 more

This study introduces an efficacious approach, Masked Collaborative Contrast (MCC), to highlight semantic regions in weakly supervised semantic segmentation. MCC adroitly draws inspiration from masked image modeling and contrastive learning to devise a novel framework that induces keys to contract toward semantic regions. Unlike prevalent techniques that directly eradicate patch regions in the input image when generating masks, we scrutinize the neighborhood relations of patch tokens by exploring masks considering keys on the affinity matrix. Moreover,...

10.1109/wacv57701.2024.00091 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

Feature Mixture on Pre-trained Model for Few-shot Learning

Shuo Wang Jinda Lu Haiyang Xu Yanbin Hao Xiangnan He

Few-shot learning (FSL) aims at recognizing a novel object under limited training samples. A robust feature extractor (backbone) can significantly improve the recognition performance of the FSL model. However, training an effective backbone is a challenging issue since 1) designing and validating structures of backbones are time-consuming and expensive processes, and 2) a backbone trained on the known (base) categories is more inclined to focus on the textures of the objects it learns, which makes it hard to describe novel samples. To solve these problems, we propose a feature mixture...

10.1109/tip.2024.3411452 article EN IEEE Transactions on Image Processing 2024-01-01

Mixed Attention and Channel Shift Transformer for Efficient Action Recognition

Xiusheng Lu Yanbin Hao Lechao Cheng Sicheng Zhao Yutao Liu and 1 more

The practical use of Transformer-based methods for processing videos is constrained by the high computing complexity. Although previous approaches adopt the spatiotemporal decomposition of 3D attention to mitigate the issue, they suffer from the drawback of neglecting the majority of visual tokens. This paper presents a novel mixed attention operation that subtly fuses the random, spatial, and temporal attention mechanisms. The proposed random attention stochastically samples video tokens in a simple yet effective way, complementing other attention methods....

10.1145/3712594 article EN ACM Transactions on Multimedia Computing Communications and Applications 2025-01-17

CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion

Yuan Wang Bin Zhu Yanbin Hao Chong‐Wah Ngo Yi Tan and 1 more

Recent advancements in text-to-image generation models have excelled in creating diverse and realistic images. This success extends to food imagery, where various conditional inputs like cooking styles, ingredients, and recipes are utilized. However, a yet-unexplored challenge is generating a sequence of procedural images based on cooking steps from a recipe. This could enhance the cooking experience with visual guidance and possibly lead to an intelligent cooking simulation system. To fill this gap, we introduce a novel task called...

10.48550/arxiv.2501.09042 preprint EN arXiv (Cornell University) 2025-01-15

Improving Open-vocabulary Video Visual Relation Detection with Decomposed Prompt Learning and Relation Adjustment

Ming Pei Yi Tan Yanbin Hao Hao Zhang Jinmeng Wu and 2 more

10.1109/icassp49660.2025.10890443 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12