- Advanced Steganography and Watermarking Techniques
- Digital Media Forensic Detection
- Chaos-based Image/Signal Encryption
- Advanced Image and Video Retrieval Techniques
- Adversarial Robustness in Machine Learning
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Face Recognition and Analysis
- Advanced Neural Network Applications
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- Domain Adaptation and Few-Shot Learning
- Cryptography and Data Security
- Internet Traffic Analysis and Secure E-voting
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Quantum Information and Cryptography
- Video Analysis and Summarization
- Privacy-Preserving Technologies in Data
- Advanced Image Processing Techniques
- Quantum Computing Algorithms and Architecture
- Advanced Vision and Imaging
- Advanced Malware Detection Techniques
- Image Enhancement Techniques
- Face and Expression Recognition
University of Science and Technology of China
2016-2025
Chinese Academy of Sciences
2015-2024
Hefei University
2023-2024
Hefei Institutes of Physical Science
2016-2023
Fordham University
2021-2023
National Engineering Research Center of Electromagnetic Radiation Control Materials
2023
China Southern Power Grid (China)
2023
University of Alabama in Huntsville
2021
City University of Hong Kong
2020
King University
2016
We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute, whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism, which computes self-attention in horizontal and vertical stripes in parallel; together the stripes form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width. We provide...
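The stripe partition described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it works on token coordinates rather than feature tensors, computes no attention, and the function names and the `sw` (stripe width) parameter are hypothetical.

```python
def horizontal_stripes(height, width, sw):
    """Split an H x W token grid into horizontal stripes, each sw rows tall."""
    assert height % sw == 0, "sketch assumes the height divides evenly"
    return [[(r, c) for r in range(top, top + sw) for c in range(width)]
            for top in range(0, height, sw)]


def vertical_stripes(height, width, sw):
    """Split an H x W token grid into vertical stripes, each sw columns wide."""
    assert width % sw == 0, "sketch assumes the width divides evenly"
    return [[(r, c) for c in range(left, left + sw) for r in range(height)]
            for left in range(0, width, sw)]


def cross_shaped_window(height, width, sw, token):
    """Union of the horizontal and vertical stripes that contain `token`."""
    r, c = token
    h_stripe = horizontal_stripes(height, width, sw)[r // sw]
    v_stripe = vertical_stripes(height, width, sw)[c // sw]
    return set(h_stripe) | set(v_stripe)


window = cross_shaped_window(4, 4, 1, (1, 2))
print(sorted(window))
```

With stripe width 1 on a 4 x 4 grid, the window for token (1, 2) is its full row plus its full column, which gives the cross shape the name refers to. A larger `sw` widens both arms of the cross.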
Recently, more and more attention has been paid to reversible data hiding (RDH) in encrypted images, since it maintains the excellent property that the original cover can be losslessly recovered after the embedded data is extracted, while protecting the image content's confidentiality. All previous methods embed data by reversibly vacating room from the encrypted images, which may be subject to some errors on data extraction and/or image restoration. In this paper, we propose a novel method by reserving room before encryption with a traditional RDH algorithm, and thus it is easy for...
While neural machine translation (NMT) has made good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual);...
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns. Recently, how to detect such forgery contents has become a hot research topic, and many deepfake detection methods have been proposed. Most of them model deepfake detection as a vanilla binary classification problem, i.e., they first use a backbone network to extract a global feature and then feed it into a binary classifier (real/fake). But since the difference between the real and fake images in this task is often subtle and local, we argue that this vanilla solution is not optimal. In this paper, we instead...
We propose StyleBank, which is composed of multiple convolution filter banks, where each filter bank explicitly represents one style, for neural image style transfer. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate feature embedding produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learnt, where the learning is conducted in such a way that the auto-encoder does not encode any style information, thanks to the flexibility introduced by the explicit filter bank representation. It also enables us to conduct...
In this paper, we propose a CNN-based framework for online multi-object tracking (MOT). This framework utilizes the merits of single object trackers in adapting appearance models and searching for the target in the next frame. Simply applying a single object tracker for MOT will encounter the problems of computational efficiency and drifted results caused by occlusion. Our framework achieves computational efficiency by sharing features and using ROI-Pooling to obtain individual features for each target. Some learned target-specific CNN layers are used to adapt the appearance model of each target. In the framework, we introduce a spatial-temporal attention mechanism (STAM)...
Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we propose a unified deep neural network that exploits both semantic and spatial relations between labels with only image-level supervision. Given a multi-label image, our proposed Spatial Regularization...
With the dramatically increasing deployment of the Internet of Things (IoT), remote monitoring of health data to achieve intelligent healthcare has received great attention recently. However, due to the limited computing power and storage capacity of IoT devices, users' health data are generally stored in a centralized third party, such as a hospital database or the cloud, which makes users lose control of their health data and can easily result in privacy leakage and a single-point bottleneck. In this paper, we propose Healthchain, a large-scale...
The remarkable success of face forgery techniques has received considerable attention in computer vision due to security concerns. We observe that up-sampling is a necessary step of most face forgery techniques, and cumulative up-sampling will result in obvious changes in the frequency domain, especially in the phase spectrum. According to the property of natural images, the phase spectrum preserves abundant frequency components that provide extra information and complement the loss of the amplitude spectrum. To this end, we present a novel Spatial-Phase Shallow Learning (SPSL) method, which...
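The amplitude and phase spectra that this abstract refers to can be illustrated with a small sketch. It is not the SPSL method: it uses a 1-D signal and a naive discrete Fourier transform instead of a 2-D image transform, and every function name, including the nearest-neighbour `upsample_nearest` helper, is a hypothetical choice made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n)]


def amplitude_and_phase(spectrum):
    """Split complex coefficients into an amplitude and a phase spectrum."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude, phase):
    """Rebuild the complex coefficients from amplitude and phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


def upsample_nearest(signal, factor=2):
    """Nearest-neighbour up-sampling: repeat each sample `factor` times."""
    return [x for x in signal for _ in range(factor)]


signal = [1.0, 2.0, 0.0, -1.0]
amplitude, phase = amplitude_and_phase(dft(signal))
up_amplitude, up_phase = amplitude_and_phase(dft(upsample_nearest(signal)))
print(phase)
print(up_phase)
```

Amplitude and phase together carry the full spectrum, so `recombine(amplitude, phase)` returns the original coefficients up to floating-point error. Printing the phase spectrum before and after up-sampling gives a rough feel for the kind of frequency-domain change the abstract describes.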
Cross-modality person re-identification (cm-ReID) is a challenging but key technology for intelligent video analysis. Existing works mainly focus on learning a modality-shared representation by embedding different modalities into the same feature space, which lowers the upper bound of feature distinctiveness. In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics...
Remote data integrity checking is a crucial technology in cloud computing. Recently, many works have focused on providing data dynamics and/or public verifiability to this type of protocol. Existing protocols can support both features with the help of a third-party auditor. In a previous work, Sebé et al. propose a remote data integrity checking protocol that supports data dynamics. In this paper, we adapt Sebé et al.'s protocol to support public verifiability. The proposed protocol supports public verifiability without the help of a third-party auditor. In addition, the proposed protocol does not leak any private information to third-party verifiers. Through a formal analysis, we show the correctness...
Training a feed-forward network for the fast neural style transfer of images has proven successful, but the naive extension of processing videos frame by frame is prone to producing flickering results. We propose the first end-to-end network for online video style transfer, which generates temporally coherent stylized video sequences in near real time. Two key ideas include an efficient network that incorporates short-term coherence, and propagating short-term coherence to long-term, which ensures consistency over a longer period of time. Our network can incorporate different...
State-of-the-art schemes for reversible data hiding (RDH) usually consist of two steps: first construct a host sequence with a sharp histogram via prediction errors, and then embed messages by modifying the histogram with methods such as difference expansion and histogram shift. In this paper, we focus on the second stage, and propose a histogram modification method for RDH, which embeds the message by recursively utilizing the decompression and compression processes of an entropy coder. We prove that, for independent and identically distributed (i.i.d.) gray-scale...
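For context, the histogram shift that this abstract names as an existing modification method can be sketched as below. This is the classic single-peak variant, not the recursive entropy-coder construction the paper proposes. The function names are hypothetical, and the sketch assumes 8-bit values and an empty histogram bin somewhere above the peak.

```python
from collections import Counter


def choose_bins(pixels):
    """Pick the most frequent value (peak) and the first empty bin above it."""
    hist = Counter(pixels)
    peak = max(hist, key=hist.get)
    # Assumption: such an empty bin exists; otherwise next() raises StopIteration.
    zero = next(v for v in range(peak + 1, 256) if hist[v] == 0)
    return peak, zero


def embed(pixels, bits, peak, zero):
    """Shift values strictly between peak and zero up by 1, then hide one bit per peak pixel."""
    remaining = iter(bits)
    marked = []
    for v in pixels:
        if peak < v < zero:
            marked.append(v + 1)                   # make room next to the peak
        elif v == peak:
            marked.append(v + next(remaining, 0))  # bit 0 stays, bit 1 moves to peak + 1
        else:
            marked.append(v)
    return marked


def extract(marked, n_bits, peak, zero):
    """Read the hidden bits back and restore the original values exactly."""
    bits = [v - peak for v in marked if v in (peak, peak + 1)][:n_bits]
    restored = [v - 1 if peak < v <= zero else v for v in marked]
    return bits, restored


pixels = [5, 5, 6, 7, 5, 3, 5]
peak, zero = choose_bins(pixels)
marked = embed(pixels, [1, 0, 1], peak, zero)
bits, restored = extract(marked, 3, peak, zero)
print(marked, bits, restored == pixels)
```

The capacity of this variant equals the number of pixels at the peak value, which is why a sharp histogram from the first stage matters. The receiver needs `peak`, `zero`, and the payload length as side information.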
Synthesizing photo-realistic images from text descriptions is a challenging problem. Previous studies have shown remarkable progress on the visual quality of the generated images. In this paper, we consider semantics from the input text descriptions in helping render photo-realistic images. However, diverse linguistic expressions pose challenges in extracting consistent semantics even when they depict the same thing. To this end, we propose a novel photo-realistic text-to-image generation model that implicitly disentangles semantics to both fulfill the high-level semantic consistency and low-level...
In this work, we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency between the inner and outer face regions. The Identity Consistency Transformer incorporates a consistency loss for identity consistency determination. We show that Identity Consistency Transformer exhibits superior generalization ability not only across different datasets but also across various types of image degradation forms found in real-world applications, including deepfake videos....
This paper explores a better prediction target for BERT pre-training of vision transformers. We observe that current prediction targets disagree with human perception judgment. This contradiction motivates us to learn a perceptual prediction target. We argue that perceptually similar images should stay close to each other in the prediction target space. We surprisingly find one simple yet effective idea: enforcing perceptual similarity during the dVAE training. Moreover, we adopt a self-supervised transformer model for deep feature extraction and show that it works well...
This paper presents a simple yet effective framework, MaskCLIP, which incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. The core idea of masked self-distillation is to distill the representation from a full image to the representation predicted from a masked image. Such incorporation enjoys two vital benefits. First, masked self-distillation targets local patch representation learning, which is complementary to vision-language contrastive learning focusing on text-related representation. Second, masked self-distillation is also consistent with vision-language contrastive learning from the perspective of the training objective, as both utilize the visual encoder...
In recent years, face forgery detectors have aroused great interest and achieved impressive performance, but they are still struggling with generalization and robustness. In this work, we explore taking full advantage of the fine-grained forgery traces in both the spatial and frequency domains to alleviate this issue. Specifically, we propose a novel High-Frequency Fine-Grained Transformer (F2Trans) network which contains two important components, namely Central Difference Attention (CDA) and High-frequency Wavelet Sampler...
Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for...