NFDI4DS | UHH-SEMS - Publication Details

Jun Yu

ORCID: 0000-0002-3197-8103

About

Contact & Profiles

A5048818071

Research Areas

Face recognition and analysis
Face and Expression Recognition
Advanced Image and Video Retrieval Techniques
Advanced Neural Network Applications
Speech and Audio Processing
Advanced Image Processing Techniques
Emotion and Mood Recognition
Generative Adversarial Networks and Image Synthesis
Human Motion and Animation
Multimodal Machine Learning Applications
Human Pose and Action Recognition
Domain Adaptation and Few-Shot Learning
Video Surveillance and Tracking Methods
Advanced Vision and Imaging
Video Analysis and Summarization
Machine Learning and Data Classification
Image Retrieval and Classification Techniques
Image Processing Techniques and Applications
Image and Signal Denoising Methods
Image Enhancement Techniques
Anomaly Detection Techniques and Applications
Hand Gesture Recognition Systems
Geotechnical Engineering and Underground Structures
3D Shape Modeling and Analysis
Music and Audio Processing

University of Science and Technology of China
2016-2025

Jilin University of Chemical Technology
2025

Xi'an Technological University
2012-2024

Xinjiang University
2024

Guangxi University
2024

Central South University
2010-2023

Chongqing University of Science and Technology
2023

Liaoning Shihua University
2023

Tongji University
2009-2022

State Grid Corporation of China (China)
2021

Deep Modular Co-Attention Networks for Visual Question Answering

OPENALEX - Publications

Yu Zhou Jun Yu Yuhao Cui Dacheng Tao Qi Tian

Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective 'co-attention' model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN)...

10.1109/cvpr.2019.00644 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Combating Noisy Labels with Sample Selection by Mining High-Discrepancy Examples

OPENALEX - Publications

Xiaobo Xia Bo Han Yibing Zhan Jun Yu Mingming Gong and 2 more

The sample selection approach is popular in learning with noisy labels. The state-of-the-art methods train two deep networks simultaneously for sample selection, which aims to employ their different learning abilities. To prevent the two networks from converging to a consensus, their divergence should be maintained. Prior work presents that the divergence can be kept by locating the disagreement data on which the prediction labels of the two networks are different. However, this procedure is sample-inefficient for generalization, which means that only a few clean examples can be utilized in training. In this paper,...

10.1109/iccv51070.2023.00176 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

OPENALEX - Publications

Yijun Song Jingwen Wang Lin Ma Yu Zhou Jun Yu

The task of temporally grounding textual queries in videos is to localize one video segment that semantically corresponds to the given query. Most existing approaches rely on segment-sentence pairs (temporal annotations) for training, which are usually unavailable in real-world scenarios. In this work we present an effective weakly-supervised model, named Multi-Level Attentional Reconstruction Network (MARN), which only relies on video-sentence pairs during the training stage. The proposed method leverages the idea...

10.48550/arxiv.2003.07048 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Deep Modular Co-Attention Networks for Visual Question Answering

OPENALEX - Publications

Yu Zhou Jun Yu Yuhao Cui Dacheng Tao Qi Tian

10.48550/arxiv.1906.10770 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Multimodal Inputs Driven Talking Face Generation With Spatial–Temporal Dependency

OPENALEX - Publications

Lingyun Yu Jun Yu Mengyan Li Qiang Ling

Given an arbitrary speech clip or text information as input, the proposed work aims to generate a talking face video with accurate lip synchronization. Existing works mainly have three limitations. (1) Single-modal learning is adopted, with either audio or text as input; hence it lacks the complementarity of multimodal inputs. (2) Each frame is generated independently, which ignores...

10.1109/tcsvt.2020.2973374 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-02-12

Identity–Expression Dual Branch Network for Facial Expression Recognition

OPENALEX - Publications

Haifeng Zhang Wen Su Jun Yu Zengfu Wang

Accurate facial expression recognition is challenging because identity biases introduce large intraclass variations and high interclass similarities. Most existing approaches are devoted to alleviating the effects of identity. However, based on theories of cognitive science, psychology, and physiology, this article argues that identity information is important and can promote expression recognition. Motivated by our investigation of the influences of identity on expression recognition, this article proposes an identity–expression dual branch network (IE-DBN) for facial expression recognition. First,...

10.1109/tcds.2020.3034807 article EN IEEE Transactions on Cognitive and Developmental Systems 2020-10-29

Multi-Object Tracking: Decoupling Features to Solve the Contradictory Dilemma of Feature Requirements

OPENALEX - Publications

Yan Jin Fang Gao Jun Yu Jiabao Wang Feng Shuang

Multi-object tracking achieves the acquisition of target location information and identity information through two subtasks, detection and re-identification (ReID). The existing commonly used one-shot framework has speed advantages, but the two subtasks have different feature requirements, which leads to competitive learning in training and thus weakens quality. We propose a feature-decoupling-based multi-object tracking framework, FDTrack, for the contradictory feature requirements. Through mutual inhibition, the features of the backbone network are decoupled. Then...

10.1109/tcsvt.2023.3249162 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-02-27

Data and knowledge-driven deep multiview fusion network based on diffusion model for hyperspectral image classification

OPENALEX - Publications

Junjie Zhang Feng Zhao Hanqiang Liu Jun Yu

10.1016/j.eswa.2024.123796 article EN Expert Systems with Applications 2024-03-27

Exploring Facial Expression Recognition through Semi-Supervised Pre-training and Temporal Modeling

OPENALEX - Publications

Jun Yu Zhihong Wei Zhongpeng Cai Gongpeng Zhao Zerui Zhang and 6 more

10.1109/cvprw63382.2024.00492 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

Faster and Stronger: Unleashing Data Processing Potential through Hardware Heterogeneity

OPENALEX - Publications

Cong Wang Yang Luo Wenzhuo Du Ke Wang Naijie Gu and 1 more

With the rapid advancement of AI technology, there has been a substantial surge in the need for computational resources. Particularly in deep learning, machine learning, and large-scale data analysis, processing extensive datasets necessitates exceptionally high levels of efficacy and speed. Conventional homogeneous computing platforms, predominantly reliant on Central Processing Units (CPU), have encountered challenges in meeting the escalating demands of high-performance computing. Consequently, this study advocates...

10.1109/jiot.2025.3526662 article EN IEEE Internet of Things Journal 2025-01-01

Overview of deep learning YOLO algorithm

OPENALEX - Publications

Yaohui Pan Gang Wang Jun Yu

At present, the YOLO algorithm has become an indispensable core real-time object detection technology in aspects such as unmanned driving, face detection, and robot applications, and its versions are constantly being updated and upgraded. Herein, we deeply analyze the evolution process of the YOLO algorithm and carefully investigate the innovations and contributions arising from the iterations from YOLOv1 to YOLOv5. We make vivid and inspiring prospects for the future development direction and point out the feasibility and necessity of research on the YOLO algorithm.

10.1117/12.3055712 article EN 2025-01-09

Domain-Separated Bottleneck Attention Fusion Framework for Multimodal Emotion Recognition

OPENALEX - Publications

Peng He Jun Yu Chengjie Ge Wei Jia W. L. Xu and 3 more

As a focal point of research in various fields, human body language understanding has long been a subject of intense interest. Within this realm, the exploration of emotion recognition through the analysis of facial expressions, voice patterns, and physiological signals holds significant practical value. Compared with unimodal approaches, multimodal models leverage complementary information from vision, acoustic, and language modalities to robustly perceive sentiment attitudes. However, the heterogeneity among modality...

10.1145/3711865 article EN ACM Transactions on Multimedia Computing Communications and Applications 2025-01-10

MCLL-Diff: Multiconditional Low-Light Image Enhancement based on Diffusion Probabilistic Models

OPENALEX - Publications

Fengxin Chen Ye Yu Jun Yi Ting Zhang Zhao Ji and 2 more

10.1109/jsen.2025.3534566 article EN IEEE Sensors Journal 2025-01-01

Contrastive Learning with Multiple Prototypes for Unsupervised Domain Adaptive Semantic Segmentation

OPENALEX - Publications

Jun Yu Guochen Xie Quansheng Liu Zhen Kan Lei Wang and 4 more

10.1109/tmm.2025.3543115 article EN IEEE Transactions on Multimedia 2025-01-01

Towards Text-Image Interleaved Retrieval

OPENALEX - Publications

Xin Zhang Ziqi Dai Yongqi Li Yanzhao Zhang Dingkun Long and 5 more

Current multimodal information retrieval studies mainly focus on single-image inputs, which limits real-world applications involving multiple images and text-image interleaved content. In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval. We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries. To explore the task, we adapt several off-the-shelf...

10.48550/arxiv.2502.12799 preprint EN arXiv (Cornell University) 2025-02-18

Breaking barriers in 3D point cloud data processing: A unified system for efficient storage and high-throughput loading

OPENALEX - Publications

Cong Wang Yang Luo Ke Wang Yanfei Cao Xiangzhi Tao and 6 more

10.1016/j.eswa.2025.126983 article EN Expert Systems with Applications 2025-03-01

The Raman spectroscopy combined with selective state-space algorithm for constructing a rapid disease diagnosis model

OPENALEX - Publications

Andong Chen Jun Yu Chenjie Chang Xiaoyi Lv Xuguang Zhou and 8 more

10.1016/j.chemolab.2025.105375 article EN Chemometrics and Intelligent Laboratory Systems 2025-03-01

Adversarial temporal sentence grounding by learning from external data

OPENALEX - Publications

Tingting Han Kai Wang Jun Yu Sicheng Zhao Jianping Fan

10.1016/j.patcog.2025.111621 article EN Pattern Recognition 2025-03-01

TVTracker: Target-Adaptive Text-Guided Visual Fusion for Multi-Modal RGB-T Tracking

OPENALEX - Publications

Fang Gao Wenjie Wu Yan Jin Jingfeng Tang Hanbo Zheng and 2 more

10.1109/jiot.2025.3557564 article EN IEEE Internet of Things Journal 2025-01-01

Scaling Laws of the Structural Responses for RC Frame Structures Under External Explosions

OPENALEX - Publications

Ruiran Li Jun Yu Xingde Zhou

10.2139/ssrn.5208776 preprint EN 2025-01-01

On the convergence and mode collapse of GAN

OPENALEX - Publications

Zhaoyu Zhang Mengyan Li Jun Yu

Generative adversarial network (GAN) is a powerful generative model. However, it suffers from several problems, such as convergence instability and mode collapse. To overcome these drawbacks, this paper presents a novel architecture of GAN, which consists of one generator and two different discriminators. With the fact that GAN is the analogy of a minimax game, the proposed architecture is as follows. The generator (G) aims to produce realistic-looking samples to fool both discriminators. The first discriminator (D1) rewards high scores for samples from the data distribution, while...

10.1145/3283254.3283282 article EN 2018-11-30

Real-Time Head Pose Estimation and Face Modeling From a Depth Image

OPENALEX - Publications

Changwei Luo Juyong Zhang Jun Yu Chang Wen Chen Shengjin Wang

We address the issues of 3-D head pose estimation and face modeling from a depth image. Given a depth image, random forests are effective for estimating the location and orientation of a person's head. However, the accuracy is not high enough. We propose using corrected regression votes. The corrected votes are obtained by considering the cooperation of all trees, leading to significant improvement in accuracy. Based on the head pose estimator, we present a face modeling system. In our system, a face model is generated by aligning a deformable face model to the depth image using an iterative closest point (ICP)...

10.1109/tmm.2019.2903724 article EN IEEE Transactions on Multimedia 2019-03-07

Characterization of Pore Throat Size Distribution in Tight Sandstones with Nuclear Magnetic Resonance and High-Pressure Mercury Intrusion

OPENALEX - Publications

Hongjun Xu Yiren Fan Hu Falong Changxi Li Jun Yu and 2 more

Characterization of pore throat size distribution (PTSD) in tight sandstones is of substantial significance for tight sandstone reservoir evaluation. High-pressure mercury intrusion (HPMI) and nuclear magnetic resonance (NMR) are the effective methods for characterizing the PTSD of reservoirs. NMR T2 spectra are usually converted to capillary pressure for PTSD characterization. However, the conversion is challenging in tight sandstones due to tiny pore throat sizes. In this paper, the linear conversion method and the nonlinear conversion method are investigated, and error minimization and least square methods are proposed...

10.3390/en12081528 article EN cc-by Energies 2019-04-23

Emotional Deep Learning Programming Controller for Automatic Voltage Control of Power Systems

OPENALEX - Publications

Linfei Yin Chenwei Zhang Yaoxiong Wang Fang Gao Jun Yu and 1 more

In recent years, the rapid development of artificial intelligence, especially deep learning technology, has given machine learning application scenarios in the fields of power system stability analysis, coordination along with scheduling, and load forecasting. This paper designs an emotional deep learning programming controller (EDLPC) for automatic voltage control of power systems. The designed EDLPC contains an emotional deep neural network (EDNN) structure and a Q-learning algorithm. Besides, a specially defined proportional-integral-derivative (PID)...

10.1109/access.2021.3060620 article EN cc-by IEEE Access 2021-01-01

Cross-modal Target Retrieval for Tracking by Natural Language

OPENALEX - Publications

Yihao Li Jun Yu Zhongpeng Cai Yuwen Pan

Tracking by natural language specification in a video is a challenging task in computer vision. Distinct from initializing the target state only by the bounding box in the first frame, language specification has strong potential to assist visual object trackers to capture appearance variation and eliminate semantic ambiguity of the tracked object. In this paper, we carefully design a unified local-global-search framework from the perspective of cross-modal retrieval, including a local tracker, an adaptive retrieval switch module, and a target-specific retrieval module....

10.1109/cvprw56347.2022.00540 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01
