Jin Yuan

ORCID: 0000-0002-9600-7789
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Image Enhancement Techniques
  • Industrial Vision Systems and Defect Detection
  • Face recognition and analysis
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Infrared Target Detection Methodologies
  • Advanced Chemical Sensor Technologies
  • Generative Adversarial Networks and Image Synthesis
  • Vehicle License Plate Recognition
  • Remote-Sensing Image Classification
  • Video Analysis and Summarization
  • Gait Recognition and Analysis
  • Image Processing Techniques and Applications
  • Color Science and Applications
  • Cryptographic Implementations and Security
  • Text and Document Classification Technologies
  • Asthma and respiratory diseases
  • AI in cancer detection
  • Advanced Computing and Algorithms

Hunan University
2007-2024

University of Science and Technology of China
2021

Zhongshan Ophthalmic Center, Sun Yat-sen University
2019

Xidian University
2018

Chengdu Institute of Biology
2005

Lanzhou Army General Hospital
2001

Visible-infrared object detection has attracted increasing attention recently due to its superior performance and cost-efficiency. Most existing methods focus on the fusion of strictly aligned data, significantly limiting practical applications. Although several researchers have attempted to explore weakly aligned visible-infrared detection, their methods are limited to small translational deviations and suffer from low speed. This paper first explores non-aligned visible-infrared detection with complex deviations in translation,...

10.1109/tiv.2024.3393015 article EN IEEE Transactions on Intelligent Vehicles 2024-01-01

10.1016/j.cviu.2021.103172 article EN Computer Vision and Image Understanding 2021-02-05

The attention mechanism has been established as an effective method for generating caption words in image captioning: it attends to one noticed subregion to predict a related word. However, even though the mechanism can offer accurate subregions to train the model, the learned captioner may still go wrong, especially on visual concept words, which are the most important parts for understanding an image. To tackle the preceding problem, in this article we propose the Visual Concept Enhanced Captioner, which employs joint training with samples to strengthen the prediction...
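The attentive word-prediction step described above can be illustrated generically. This is a minimal soft-attention sketch over toy region features, not the paper's model; the function name and vectors are assumptions for illustration.

```python
import math

def attend(region_features, region_scores):
    """Generic soft attention over image subregions: softmax the
    relevance scores, then return the score-weighted sum of region
    features, which a captioner would use to predict the next word."""
    m = max(region_scores)                       # stabilize the softmax
    w = [math.exp(s - m) for s in region_scores]
    z = sum(w)
    w = [x / z for x in w]
    dim = len(region_features[0])
    return [sum(w[i] * region_features[i][d] for i in range(len(region_features)))
            for d in range(dim)]
```

With equal scores the context is the mean of the region features; a much higher score on one region pulls the context toward that region's features.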

10.1145/3394955 article EN ACM Transactions on Multimedia Computing Communications and Applications 2020-07-05

Transformer-based methods have recently demonstrated superior performance for monocular 3D object detection, which aims at predicting 3D attributes from a single 2D image. Most existing transformer-based methods leverage both visual and depth representations to explore valuable query points on objects, and the quality of the learned queries has a great impact on detection accuracy. Unfortunately, the unsupervised attention mechanisms in transformers are prone to generating low-quality features due to inaccurate receptive fields, especially for hard...

10.1109/tiv.2023.3311949 article EN IEEE Transactions on Intelligent Vehicles 2023-09-05

Cross-domain image captioning, which is trained on a source domain and generalized to other domains, usually faces a large domain-shift problem. Although prior work has attempted to leverage both paired and unpaired target data to minimize this shift, the performance is still unsatisfactory. One main reason lies in the discrepancy of language expression between the two domains, where diverse styles are adopted to describe an image from different views, resulting in different semantic descriptions for the same image. To tackle this problem, this paper proposes...

10.1109/tip.2022.3145158 article EN IEEE Transactions on Image Processing 2022-01-01

In the last few years, enormous strides have been made in object detection and data association, which are vital subtasks of one-stage online multi-object tracking (MOT). However, the two separated submodules involved in the whole MOT pipeline are processed or optimized separately, resulting in a complex method design requiring manual settings. In addition, few works integrate them into a single end-to-end network to optimize the overall task. In this study, we propose an approach called joint detection and association network (JDAN) that is trained and inferred...
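Data association, one of the two MOT subtasks named above, is commonly done by matching current detections to existing tracks. The greedy IoU matcher below is an assumed classical baseline for that subtask, not JDAN itself.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_associate(tracks, detections, iou_thresh=0.3):
    """Match each track to at most one detection, highest IoU first.
    Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break                     # remaining pairs overlap too little
        if ti in used_t or di in used_d:
            continue                  # track or detection already matched
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d
```

An end-to-end design like the one the abstract describes replaces this hand-tuned matching (and its threshold) with jointly learned components.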

10.1145/3533253 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-05-02

Video captioning aims to generate descriptions of videos. Most existing approaches adopt the encoder-decoder architecture, which usually uses different kinds of visual features, such as temporal features and motion features, but they neglect the abundant semantic information in the video. To address this issue, we propose a framework that jointly explores semantic attributes, named Semantic Guiding Long Short-Term Memory (SG-LSTM). The proposed SG-LSTM has two semantic guiding layers, both of them exploiting three types of semantics - global semantic, object...

10.1109/bigmm.2018.8499357 article EN 2018-09-01

Single-label facial expression recognition (FER), which aims to classify a single expression for each image, usually suffers from the noisy and incomplete label problem, where the manual annotations of partial training images contain wrong or missing labels, resulting in performance decline. Although prior work has attempted to leverage external sources to handle this problem, it requires extra costs. This article explores a simple yet effective three-phase paradigm (“warm-up,” “selection,” “relabeling”) for the FER task. First, the warm-up phase...
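The “selection” and “relabeling” phases of such a paradigm can be sketched with the common small-loss criterion: after warm-up, low-loss samples are trusted as clean, and high-loss samples are relabeled by the model when it is confident. The keep-ratio rule and confidence threshold below are assumed heuristics for illustration, not the article's exact procedure.

```python
def select_and_relabel(losses, preds, confidences, labels,
                       keep_ratio=0.7, conf_thresh=0.9):
    """Toy selection + relabeling step for noisy-label training.

    losses      : per-sample training loss after the warm-up phase
    preds       : model's predicted label per sample
    confidences : model's confidence for each prediction
    labels      : possibly-noisy annotated labels
    Returns (cleaned_labels, clean_index_set).
    """
    n_keep = max(1, int(len(losses) * keep_ratio))
    # Selection: small-loss samples are assumed to be correctly labeled.
    clean_idx = set(sorted(range(len(losses)), key=lambda i: losses[i])[:n_keep])
    new_labels = list(labels)
    for i in range(len(labels)):
        # Relabeling: replace suspect labels only when the model is confident.
        if i not in clean_idx and confidences[i] >= conf_thresh:
            new_labels[i] = preds[i]
    return new_labels, clean_idx
```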

10.1145/3570329 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-11-17

10.1016/j.jvcir.2023.103828 article EN Journal of Visual Communication and Image Representation 2023-04-23

The recently rising markup-to-image generation poses greater challenges compared to natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between the markup and the rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost the performance of generation. Technically, we design a fine-grained...

10.1145/3581783.3613781 article EN 2023-10-26

10.1016/j.jvcir.2021.103107 article EN Journal of Visual Communication and Image Representation 2021-04-09

Inverse halftoning and image expanding refer to the problems of restoring the pixel values of images from a compressed, smaller bit depth. Since these two problems are ill-posed, there are few perfect solutions. Recently, deep convolutional neural networks (DCNNs) have shown their powerful ability in inverse halftoning and image expanding. However, the restored images still suffer from visual artifacts or loss of fine details due to improper design of the network structure. To this end, this paper proposes a residual learning model. The whole model consists of progressive...
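For the image-expanding half of the problem, a coarse baseline simply replicates the high bits into the missing low bits; a residual model then learns a correction added to such an estimate. The function below is an illustrative baseline only, not the paper's network.

```python
def expand_bit_depth(pixels, src_bits=4, dst_bits=8):
    """Expand quantized pixel values to a larger bit depth by bit
    replication (e.g. 4-bit 0b1011 -> 8-bit 0b10111011). A residual
    learning model would refine this coarse estimate, conceptually:
    restored = expand_bit_depth(x) + residual_network(x)."""
    shift = dst_bits - src_bits
    assert 0 < shift <= src_bits, "sketch covers cases like 4 -> 8 bits"
    return [(p << shift) | (p >> (src_bits - shift)) for p in pixels]
```

Bit replication maps the extreme codes 0 and 15 exactly to 0 and 255, which plain zero-padding (multiplying by 16) would not.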

10.1109/access.2019.2955025 article EN cc-by IEEE Access 2019-11-21

10.1016/j.jvcir.2018.11.028 article EN Journal of Visual Communication and Image Representation 2018-11-19

Multi-modal deep learning methods have achieved great improvements in visual grounding; their objective is to localize text-specified objects in images. Most of the existing methods can localize and classify objects with significant appearance differences, but they suffer from a misclassification problem for extremely similar objects, due to inadequate exploration of multi-modal features. To address this problem, we propose a novel semantic-aligned cross-modal visual grounding network with transformers (SAC-VGNet). SAC-VGNet integrates textual...

10.3390/app13095649 article EN cc-by Applied Sciences 2023-05-04