- Domain Adaptation and Few-Shot Learning
- Generative Adversarial Networks and Image Synthesis
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Chaos control and synchronization
- Video Analysis and Summarization
- Multimodal Machine Learning Applications
- Digital Media Forensic Detection
- Advanced Steganography and Watermarking Techniques
- Chaos-based Image/Signal Encryption
- Face recognition and analysis
- Advanced Image Processing Techniques
- Video Coding and Compression Technologies
- Adversarial Robustness in Machine Learning
- Image Retrieval and Classification Techniques
- Human Motion and Animation
- Advanced Memory and Neural Computing
- Stochastic dynamics and bifurcation
- Image Processing Techniques and Applications
- COVID-19 diagnosis using AI
- Music and Audio Processing
- Genetics and Plant Breeding
Dalian Polytechnic University
2021-2024
Hoya (Japan)
2020-2023
Hebei Agricultural University
2021
University of Maryland, College Park
2015-2021
Hebei University of Technology
2020
MSIGHT Technologies (China)
2018-2020
Google (United States)
2017
Shanghai Jiao Tong University
2012-2015
To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering statistics of an individual layer or two consecutive layers (e.g., prune one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that for a pruned network to retain its predictive power, it is essential to prune neurons in the entire neuron network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL),...
A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing...
Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned. We propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose is to extract features from the RGB image input to find tampering artifacts like strong contrast difference, unnatural tampered boundaries, and so on. The other is a noise stream that leverages the noise features extracted...
We propose a two-stream network for face tampering detection. We train GoogLeNet to detect tampering artifacts in a face classification stream, and train a patch-based triplet network to leverage features capturing local noise residuals and camera characteristics as a second stream. In addition, we use two different online face swapping applications to create a new dataset that consists of 2010 tampered images, each of which contains a tampered face. We evaluate the proposed two-stream network on our newly collected dataset. Experimental results demonstrate the effectiveness of our method.
The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross...
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogleNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector representation with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further...
Fine-grained image categorization is challenging due to the subtle inter-class differences. We posit that exploiting the rich relationships between channels can help capture such differences, since different channels correspond to different semantics. In this paper, we propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. For a single image, a self-channel interaction (SCI) module is proposed to explore channel-wise correlation within the image. This allows the model to learn the complementary features from...
Recently, person re-identification (Re-ID) has achieved great progress. However, current methods largely depend on color appearance, which is not reliable when a person changes clothes. Cloth-changing Re-ID is challenging since pedestrian images with clothes change exhibit large intra-class variation and small inter-class variation. Some significant features for identification are embedded in unobvious body shape differences across pedestrians. To explore such body shape cues for cloth-changing Re-ID, we propose...
Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model the object-level...
Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns about the spread of false news and misinformation are growing. Current state-of-the-art methods for detecting these manipulated images suffer from a lack of training data due to the laborious labeling process. We address this problem in this paper, in which we...
In this paper, we propose a partition-masked Convolution Neural Network (CNN) to achieve compressed-video enhancement for the state-of-the-art coding standard, High Efficiency Video Coding (HEVC). More precisely, our method utilizes the partition information produced by the encoder to guide the quality enhancement process. In contrast to existing CNN-based approaches, which only take the decoded frame as the input to the CNN, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame such that the degradation introduced by HEVC is...
Training an object detector on a data-rich domain and applying it to a data-poor one with limited performance drop is highly attractive in industry, because it saves huge annotation cost. Recent research on unsupervised domain adaptive object detection has verified that aligning data distributions between source and target images through adversarial learning is very useful. The key is when, where and how to use it to achieve best practice. We propose Image-Instance Full Alignment Networks (iFAN) to tackle this problem by precisely aligning feature...
This paper addresses neural network based post-processing for the state-of-the-art video coding standard, High Efficiency Video Coding (HEVC). We first propose a partition-aware Convolution Neural Network (CNN) that utilizes the partition information produced by the encoder to assist in the post-processing. In contrast to existing CNN-based approaches, which only take the decoded frame as input, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame such that the artifacts introduced by HEVC are...
In this paper, a discrete model of the memristor is adopted and analyzed. The new maps are built by introducing the discrete memristor into a two-dimensional map. Interestingly, introducing the memristor from different locations can lead to two different chaotic map models. The dynamical behaviors of the maps are studied by means of bifurcation diagrams, phase diagrams and Lyapunov exponential spectra (LEs). The simulation results show that both systems have rich dynamical behaviors. In addition, they are experimentally found to have multi-stable properties, where M-XM has infinite attractors coexistence....
A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be...
Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned. We propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose is to extract features from the RGB image input to find tampering artifacts like strong contrast difference, unnatural tampered boundaries, and so on. The other is a noise stream that leverages the noise features extracted...
Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web...
Clothed human reconstruction is the cornerstone for creating the virtual world. To a great extent, the quality of recovered avatars decides whether the Metaverse is a passing fad. In this work, we introduce CLOTH4D, a clothed human dataset containing 1,000 subjects with varied appearances, 3D outfits, and over 100,000 clothed meshes with paired unclothed humans, to fill the gap in large-scale and high-quality 4D clothing data. It enjoys appealing characteristics: 1) Accurate and detailed clothing textured meshes: all clothing items are manually created and then...
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. We first fine-tune GoogleNet by jointly modeling clothing images and their corresponding descriptions in a visual-semantic embedding space. Then, for each attribute (word), we generate its spatially-aware representation by combining its semantic word vector representation with its spatial representation derived from the convolutional maps of the fine-tuned network. The resulting spatially-aware representations are further used to cluster...
On-the-fly video retrieval using Web images and fast Fisher Vector products (VRFP) is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the Web after the query is known. The retrieved Web images representing the query and each database video are treated as unordered collections of images, and each collection is represented using a single Fisher Vector built on CNN features. Our experiments show that a Fisher Vector is robust to noise present in Web images and compares favorably in terms of accuracy to other standard representations. While a Fisher Vector can be constructed...