- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Digital Media Forensic Detection
- Emotion and Mood Recognition
- Image Retrieval and Classification Techniques
- Speech Recognition and Synthesis
- Medical Image Segmentation Techniques
- Human Pose and Action Recognition
- Advanced Steganography and Watermarking Techniques
- Smart Agriculture and AI
- Speech and Audio Processing
- Video Surveillance and Tracking Methods
- Robotics and Sensor-Based Localization
- Visual Attention and Saliency Detection
- 3D Surveying and Cultural Heritage
- Multimodal Machine Learning Applications
- Cell Image Analysis Techniques
- Gait Recognition and Analysis
- IoT and Edge/Fog Computing
- Face Recognition and Analysis
- Handwritten Text Recognition Techniques
- Gaze Tracking and Assistive Technology
- Technology and Security Systems
- Image Processing Techniques and Applications
Chinese Academy of Sciences
2014-2025
Institute of Automation
2010-2025
Dalian University
2020-2024
Dalian University of Technology
2019-2022
Beijing Jiaotong University
2013-2021
Beijing Automation Control Equipment Institute
2021
National Engineering Research Center for Information Technology in Agriculture
2014-2020
Ministry of Agriculture and Rural Affairs
2020
Tianjin University of Finance and Economics
2008
Brain tumor segmentation technology plays a pivotal role in the diagnosis and treatment of MRI brain tumors. It helps doctors to locate and measure tumors, as well as develop rehabilitation strategies. Recently, methods based on the U-Net architecture have become popular; they largely improve accuracy by applying skip connections to combine high-level feature information with low-level feature information. Meanwhile, researchers have demonstrated that introducing an attention mechanism into U-Net can enhance local feature expression...
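The skip connection mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network: the function names, the nearest-neighbor upsampling, and the channel-first `(C, H, W)` layout are assumptions chosen only to show how coarse high-level features are brought to the resolution of low-level features and concatenated along the channel axis.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map (assumed choice)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(low_level, high_level):
    """Concatenate encoder (low-level) and upsampled decoder (high-level) features."""
    up = upsample2x(high_level)
    if up.shape[1:] != low_level.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    # Channel-wise concatenation keeps both kinds of information side by side.
    return np.concatenate([low_level, up], axis=0)

# Hypothetical shapes: 4 fine-resolution channels, 8 coarse-resolution channels.
low = np.random.rand(4, 8, 8)
high = np.random.rand(8, 4, 4)
fused = skip_connect(low, high)
```

In a full U-Net the fused tensor would then pass through further convolutions; that step is omitted here.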
In this paper, we focus on local image tampering detection. For a JPEG image, the probability distributions of its DCT coefficients will be disturbed by the tampering operation. The tampered region and the unchanged region have different distributions, which is an important clue for locating tampering. Based on the assumption of a Laplacian distribution of unquantized AC DCT coefficients, these two distributions as well as the size of the tampered region can be estimated, so that the probability of each block being tampered is obtained. More accurate localization results can be obtained when we consider prior knowledge...
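The Laplacian assumption on AC coefficients can be made concrete with a small sketch. This is not the paper's estimation procedure (which works with quantized coefficients and two regions); it only shows, under the assumption of a zero-mean Laplacian, how block DCT coefficients are computed and how the maximum-likelihood scale for one AC frequency is the mean absolute value. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct(image, n=8):
    """2D DCT of every non-overlapping n x n block; returns (rows, cols, n, n)."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    h, w = h - h % n, w - w % n  # drop incomplete border blocks
    blocks = img[:h, :w].reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    d = dct_matrix(n)
    return d @ blocks @ d.T

def laplacian_scale(coeffs):
    """MLE of the scale b of a zero-mean Laplacian (assumed): mean of |x|."""
    return float(np.mean(np.abs(coeffs)))

def laplacian_pdf(x, b):
    """Density of a zero-mean Laplacian with scale b."""
    return np.exp(-np.abs(x) / b) / (2.0 * b)

# Example: fit the (0, 1) AC frequency across all blocks of a random image.
rng = np.random.default_rng(0)
coeffs = block_dct(rng.normal(size=(64, 64)))
b_hat = laplacian_scale(coeffs[:, :, 0, 1])
```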
Composed Image Retrieval (CIR) aims to retrieve target images from a candidate set using a hybrid-modality query consisting of a reference image and a relative caption that describes the user intent. Recent studies attempt to utilize Vision-Language Pre-training Models (VLPMs) with various fusion strategies for addressing the task. However, these methods typically fail to simultaneously meet two key requirements of CIR: comprehensively extracting visual information and faithfully following the user intent. In this work, we propose...
The 3D lighting environment is an important clue in an image that can be used for image forgery detection. Existing forensic methods exploring lighting consistency are based on many assumptions, among which convexity and constant reflectance of the surface are two critical ones. In this paper, we propose an improved lighting estimation method based on a more general reflection model. We relax these assumptions by incorporating local geometry and texture information into our position-dependent reflection model. The proposed model is more realistic for objects like human faces...
The Local Binary Pattern (LBP) based framework uses only a scalar threshold to binarize all magnitude vectors in P different directions around each center pixel of a texture image. Hence, the original LBP-based framework, in fact, cannot precisely extract the features of each pixel. Furthermore, the values of the magnitude vectors can change dramatically from coarse areas to flat areas in the same image. Therefore, a scalar threshold calculated from the whole image cannot suit coarse and flat areas simultaneously. To overcome these two drawbacks, we propose a novel adaptively binarizing magnitude vector (ABMV) method....
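The scalar-threshold baseline that this abstract criticizes can be sketched as follows. This is an illustration of binarizing neighbor magnitudes with one global threshold, not the proposed ABMV method; the choice of P = 8 neighbors at radius 1, the use of the global mean magnitude as the threshold, and the function name are assumptions.

```python
import numpy as np

# Assumed neighborhood: P = 8 directions at radius 1, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def magnitude_pattern(image):
    """Binarize |neighbor - center| with one scalar threshold for the whole image."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # One magnitude map per direction, shape (P, h - 2, w - 2).
    mags = np.stack([
        np.abs(img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - center)
        for dy, dx in OFFSETS
    ])
    threshold = mags.mean()  # single global threshold (assumed: mean magnitude)
    bits = (mags >= threshold).astype(np.int64)
    weights = (2 ** np.arange(len(OFFSETS))).reshape(-1, 1, 1)
    codes = (bits * weights).sum(axis=0).astype(np.uint8)
    return codes, threshold

codes, threshold = magnitude_pattern(np.random.rand(16, 16))
```

Because `threshold` is shared by every pixel, a flat region and a coarse region in the same image are binarized against the same value, which is the limitation the abstract describes.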
Existing light field based works utilize either views or focal stacks for saliency detection. However, since depth information exists implicitly in adjacent views or different focal slices, it is difficult to exploit scene depth information from both. By comparison, Epipolar Plane Images (EPIs) provide explicit and accurate scene depth and occlusion information by projected pixel lines. Due to the fact that the depth of an object is often continuous, the distribution of depth edges concentrates more on object boundaries compared with traditional color edges, which is beneficial for improving...
With their breakthrough performance in a variety of computer vision and medical image analysis problems, convolutional neural networks (CNNs) have been successfully introduced for the classification task of breast cancer histopathological images in recent years. Nevertheless, existing methods mainly utilize first-order statistic information of deep features to represent histopathological images, failing to characterize the complex global feature distribution of the images. To address this problem, this work makes the first attempt to explore second-order...
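The contrast between first-order and second-order statistics of deep features can be illustrated with a short sketch. It is a generic example, not the paper's model: first-order pooling is taken here to be the per-channel mean, second-order pooling the channel covariance matrix, and the function names and `(C, H, W)` layout are assumptions.

```python
import numpy as np

def first_order_pool(features):
    """Per-channel mean of a (C, H, W) feature map; returns shape (C,)."""
    c = features.shape[0]
    return features.reshape(c, -1).mean(axis=1)

def second_order_pool(features):
    """Channel covariance of a (C, H, W) feature map; returns shape (C, C)."""
    c = features.shape[0]
    x = features.reshape(c, -1)
    x = x - x.mean(axis=1, keepdims=True)
    return x @ x.T / x.shape[1]

# Two toy feature maps with identical channel means but opposite correlations.
a = np.array([[1, -1, 1, -1], [1, -1, 1, -1]], dtype=float).reshape(2, 2, 2)
b = np.array([[1, -1, 1, -1], [-1, 1, -1, 1]], dtype=float).reshape(2, 2, 2)
same_means = np.allclose(first_order_pool(a), first_order_pool(b))
different_cov = not np.allclose(second_order_pool(a), second_order_pool(b))
```

The toy inputs are indistinguishable to mean pooling, while the covariance captures how the channels vary together.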
Real-time speech emotion recognition has always been a problem. To this end, we propose an end-to-end model based on a one-dimensional convolutional neural network, which contains only three convolution layers, two pooling layers and one fully connected layer. Through the Adam optimization algorithm and the back propagation mechanism, more discriminative features can be extracted continuously. Our model is quite simple in structure and can quickly complete the emotional classification task. Compared with...
Previous work has shown that feature maps of deep convolutional neural networks (CNNs) can be interpreted as a feature representation of an image. Image features aggregated from these feature maps have achieved steady progress in terms of performance on visual instance retrieval tasks in recent years. The key to the success of such methods is the feature representation. In this paper, we study how to represent an image using discriminative features. We demonstrate first that image size is an important factor which affects the performance but has not been thoroughly...
In view of the remarkable achievements of convolutional neural networks in the field of computer vision, we propose a speech emotion recognition algorithm based on convolution and feature fusion, which extracts features from the original speech signal and its spectrogram for emotion recognition. From the point of view of feature enhancement, the features extracted by the 1D-CNN and 2D-CNN models are fused by dimension splicing in this algorithm, and then sent to the model again for training. This way of feature fusion makes better use of the emotional information in the time domain and frequency domain, and gives...
Effective perception of the surrounding environment and a balance between accuracy and processing speed are crucial for the successful application of real-time semantic segmentation algorithms in fields such as autonomous driving, drones, and smart security. In this paper, a lightweight feature reuse network, MHANet, is proposed. The main novelties of our method are an improved ResNet and an attention-based feature fusion mechanism. The effectiveness of the method is verified by a large number of experiments. Without any pre-training process, the performance using deep...
In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from a performance drop when faced with out-of-domain generators and image scenes. To relieve this problem, we propose the Artifact Purification Network (APN) to facilitate artifact extraction from generated images through explicit and implicit purification processes. For one, a suspicious frequency-band proposal method and a spatial...