- Face recognition and analysis
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Face and Expression Recognition
- Biometric Identification and Security
- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Adversarial Robustness in Machine Learning
- Digital Media Forensic Detection
- Natural Language Processing Techniques
- Advanced Vision and Imaging
- Video Analysis and Summarization
- Visual Attention and Saliency Detection
- Autonomous Vehicle Technology and Safety
- 3D Shape Modeling and Analysis
- Topic Modeling
- Emotion and Mood Recognition
- Optical measurement and interference techniques
- Reconstructive Facial Surgery Techniques
- Artificial Intelligence in Games
- COVID-19 diagnosis using AI
China Automotive Technology and Research Center
2024
Beijing Jiaotong University
2024
State Nuclear Power Technology Company (China)
2023
National University of Singapore
2015-2021
Beijing University of Technology
2021
Systems, Applications & Products in Data Processing (United Kingdom)
2017
Nanyang Technological University
2014-2016
Beijing University of Civil Engineering and Architecture
2004
PLA Army Service Academy
2002
Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access these features efficiently. The block is designed with a double attention mechanism in two steps, where the first step gathers...
Pose variation is one key challenge in face recognition. As opposed to current techniques for pose-invariant face recognition, which either directly extract pose-invariant features or first normalize profile face images to frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly to allow them to benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture, containing a Face...
Despite the noticeable progress in perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily on visually understanding humans in crowded scenes, such as group behavior analysis, person re-identification and autonomous driving. To this end, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, which was recently defined as the multi-human parsing task. In this paper, we present a new large-scale database...
Synthesizing realistic profile faces is beneficial for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by augmenting the number of samples with extreme poses and avoiding costly annotation work. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between the distributions of synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's...
Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages still remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intraclass variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features or first synthesize a face that matches the target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage...
In this paper, we present a Self-Supervised Neural Aggregation Network (SS-NAN) for human parsing. SS-NAN adaptively learns to aggregate the multi-scale features at each pixel "address". In order to further improve the feature discriminative capacity, a self-supervised joint loss is adopted as an auxiliary learning strategy, which imposes structures into the parsing results without resorting to extra supervision. The proposed SS-NAN is end-to-end trainable. SS-NAN can be integrated into any advanced neural networks to help aggregate features regarding...
In this paper, we propose a novel weakly supervised model, Multi-scale Anchored Transformer Network (MATN), to accurately localize free-form textual phrases with only image-level supervision. The proposed MATN takes region proposals as localization anchors, and learns a multi-scale correspondence network to continuously search for phrase regions referring to the anchors. In this way, MATN can exploit useful cues from these anchors to reliably reason about the locations of the regions described by the given phrases. Through differentiable...
Learning from synthetic faces, though perhaps appealing for high data efficiency, may not bring satisfactory performance due to the distribution discrepancy between synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3D MM) to obtain shape and appearance priors for accelerating...
Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviate from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-human parsing problem, we introduce a new multi-human parsing (MHP) dataset and a novel multi-human parsing model named MH-Parser. The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations...
Most existing detection pipelines treat object proposals independently and predict bounding box locations and classification scores over them separately. However, the important semantic and spatial layout correlations among proposals are often ignored, although they are actually useful for more accurate object detection. In this paper, we propose a new EM-like group recursive learning approach to iteratively refine object proposals by incorporating such context of surrounding proposals and provide an optimal spatial configuration of object detections. In addition, we incorporate...
This paper describes our proposed method targeting the MSR Image Recognition Challenge MS-Celeb-1M. The challenge is to recognize one million celebrities from their face images captured in the real world. The challenge provides a large-scale dataset crawled from the Web, which contains a large number of celebrities with many images for each subject. Given a new testing image, the challenge requires an identity for the image and the corresponding confidence score. To complete the challenge, we propose a two-stage approach consisting of data cleaning and multi-view deep...
This paper presents our solution submitted to the Emotion Recognition in the Wild (EmotiW 2016) group-level happiness intensity prediction sub-challenge. The objective of this sub-challenge is to predict the overall happiness level given an image of a group of people in a natural setting. We note that both the global setting and the faces of the individuals influence the happiness intensity of the image. Hence, the challenge lies in building a solution that incorporates both of these factors and also considers their right combination. Our proposed solution does so as a combination of global and local information. We use convolutional...
In this paper, we address the challenging unconstrained set-based face recognition problem where each subject face is instantiated by a set of media (images and videos) instead of a single image. Naively aggregating information from all the media within a set would suffer from the large intra-set variance caused by heterogeneous factors (e.g., varying media modalities, poses and illumination) and fail to learn discriminative face representations. A novel Multi-Prototype Network (MPNet) model is thus proposed to learn multiple prototype face representations...
Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically,...
In this paper, we present the Global Multimedia Deepfake Detection challenge held concurrently with Inclusion 2024. Our Multimedia Deepfake Detection challenge aims to detect automatic image and audio-video manipulations including but not limited to editing, synthesis, generation, Photoshop, etc. Our challenge has attracted 1500 teams from all over the world, with about 5000 valid result submissions. We invite the top 20 teams to present their solutions to the challenge, of which the top 3 are awarded prizes in the grand finale. In this paper, we present the solutions from the top 3 teams of the two tracks, to boost research work in the field of forgery detection....
The beautification of human photos usually requires professional editing software, which is difficult for most users. In this technical demonstration, we propose a deep face beautification framework, which is able to automatically modify the geometrical structure of a face so as to boost its attractiveness. A learning-based approach is adopted to capture the underlying relations between the facial shape and the attractiveness via training the Deep Beauty Predictor (DBP). Relying on the pre-trained DBP, we construct the BeAuty SHaper (BASH) to infer the "flows"...
In this paper, we aim to build a comprehensive face detection system which provides a one-stop solution to various practical challenges for face detection in realistic scenarios, e.g., detecting faces from multiple views, faces with occlusions, exaggerated expressions or blurred faces. Moreover, we introduce an automatic data harvest algorithm to effectively improve the generalization performance of the face detection system even when collecting training data containing challenging patterns is difficult. In particular, three critical components of the system,...
Human parsing is an important task in human-centric image understanding in computer vision and multimedia systems. However, most existing works on human parsing mainly tackle the single-person scenario, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address such a challenging multi-human parsing problem, we introduce a novel multi-human parsing model named MH-Parser, which uses a graph-based generative adversarial model to address the challenges of close-person interaction and occlusion in multi-human parsing....