- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Image Processing and 3D Reconstruction
- AI in cancer detection
- Machine Learning and Data Classification
- Computer Graphics and Visualization Techniques
- COVID-19 diagnosis using AI
- Industrial Vision Systems and Defect Detection
- Tribology and Wear Analysis
- CCD and CMOS Imaging Sensors
- Hand Gesture Recognition Systems
- 3D Shape Modeling and Analysis
- Complex Network Analysis Techniques
- Advanced Vision and Imaging
- Natural Language Processing Techniques
- Advanced Memory and Neural Computing
- Advanced Graph Neural Networks
- Fiber-reinforced polymer composites
University of Electronic Science and Technology of China, 2025
Mianyang Central Hospital, 2025
Google (United States), 2024
Wuhan Institute of Technology, 2024
Massachusetts Institute of Technology, 2018-2022
Moscow Institute of Thermal Technology, 2020-2021
IIT@MIT, 2020
Donghua University, 2018-2020
Chinese University of Hong Kong, 2014-2016
Shenzhen Institutes of Advanced Technology, 2015
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional losses such as the triplet, max-margin and N-pairs losses. In this work, we extend the batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding...
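The supervised contrastive idea above can be sketched as a loss in which every same-label sample in the batch acts as a positive for an anchor. This is a minimal pure-Python sketch, assuming L2-normalized embeddings, a temperature of 0.1, and the variant that averages log-probabilities over the positives; the function names are hypothetical and this is not the authors' reference implementation.

```python
import math


def l2_normalize(v):
    # Assumes a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average SupCon-style loss over anchors that have at least one positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled cosine similarities from anchor i to every other sample (self excluded).
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        m = max(sims.values())  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        positives = [p for p in sims if labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        # Mean negative log-probability assigned to the same-class samples.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Pulling same-class points together lowers the loss: with embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]`, labels `[0, 0, 1, 1]` give a much smaller value than labels `[0, 1, 0, 1]`.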
Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models and propose a strategy to overcome those. In particular, the range of "neighboring" nodes that a node's representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture, jumping knowledge (JK) networks, that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware...
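A jumping-knowledge layer combines the representations a node receives after each propagation step, so each node can effectively pick its own neighborhood range. The sketch below is a simplified, hypothetical illustration: it uses parameter-free mean aggregation (no learned weights or nonlinearities, unlike a real GNN layer) and element-wise max-pooling across layers, which is one of several possible JK aggregation schemes.

```python
def aggregate_layer(features, neighbors):
    """One hypothetical propagation step: average each node with its neighbors."""
    out = {}
    for v, h in features.items():
        group = [h] + [features[u] for u in neighbors[v]]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


def jumping_knowledge_max(features, neighbors, num_layers=3):
    """Run num_layers propagation steps, then max-pool each node's vectors across layers."""
    layers = []
    h = features
    for _ in range(num_layers):
        h = aggregate_layer(h, neighbors)
        layers.append(h)
    # Element-wise max over the per-layer representations of each node.
    return {v: [max(vals) for vals in zip(*(layer[v] for layer in layers))]
            for v in features}
```

On the path graph 0-1-2 with a signal only at node 0, nodes 0 and 1 take their maximum from the first layer, while node 2 takes it from the third layer, after the signal has had time to reach it.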
Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn...
This paper demonstrates accurate human pose estimation through walls and occlusions. We leverage the fact that wireless signals in the WiFi frequencies traverse walls and reflect off the human body. We introduce a deep neural network approach that parses such radio signals to estimate 2D poses. Since humans cannot annotate radio signals, we use a state-of-the-art vision model to provide cross-modal supervision. Specifically, during training the system uses synchronized wireless and visual inputs, extracts pose information from the visual stream, and uses it to guide the training process. Once...
Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and a student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture...
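For reference, the standard knowledge-distillation objective that the abstract argues against can be written in a few lines. This sketch assumes temperature-softened softmax outputs and the common T^2 scaling of the loss; it shows only the baseline KL term, not the alternative objective the paper proposes, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge; it only compares per-class outputs, which is the limitation the abstract points to.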
Recent advances in pedestrian detection are attained by transferring the learned features of a Convolutional Neural Network (ConvNet) to pedestrians. This ConvNet is typically pre-trained on massive general object categories (e.g., ImageNet). Although these features are able to handle variations such as poses, viewpoints, and lighting, they may fail when pedestrian images with complex occlusions are present. Occlusion handling is one of the most important problems in pedestrian detection. Unlike previous deep models that directly learn a single detector for...
Contrastive learning between multiple views of the data has recently achieved state of the art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by...
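The quantity the abstract proposes to reduce, the mutual information between two views, has a closed form when both views are discrete. The hypothetical helper below computes I(X;Y) in nats from a joint probability table; the toy tables in the usage note are invented for illustration, and practical contrastive methods estimate MI with bounds such as InfoNCE rather than from explicit tables.

```python
import math


def mutual_information(joint):
    """I(X;Y) in nats for a discrete joint distribution given as a 2-D list of probabilities."""
    px = [sum(row) for row in joint]          # marginal of the first view
    py = [sum(col) for col in zip(*joint)]    # marginal of the second view
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi
```

Independent views (`[[0.25, 0.25], [0.25, 0.25]]`) give 0, identical binary views (`[[0.5, 0], [0, 0.5]]`) give ln 2 ≈ 0.693, and a noisy copy (`[[0.4, 0.1], [0.1, 0.4]]`) falls in between (about 0.193).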
Deep learning methods have achieved great successes in pedestrian detection, owing to their ability to learn discriminative features from raw pixels. However, they treat pedestrian detection as a single binary classification task, which may confuse positive with hard negative samples (Fig. 1 (a)). To address this ambiguity, this work jointly optimizes pedestrian detection with semantic tasks, including pedestrian attributes (e.g., 'carrying backpack') and scene attributes (e.g., 'vehicle', 'tree', and 'horizontal'). Rather than expensively annotating scene attributes, we...
In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed architecture, a deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A pre-training strategy is proposed to learn feature representations that are more suitable for the object detection task and have good generalization capability. By changing the net structures, training strategies, adding and removing some key...
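Def-pooling can be read as follows: for a part-detection score map, take the best score near the part's anchor position after subtracting a penalty for how far the part moved. The sketch below assumes a DPM-style quadratic penalty with fixed weights and a square search window; a real def-pooling layer operates on feature maps inside a trained network, and the names `def_pool`, `penalty`, and `radius` are hypothetical.

```python
def def_pool(score_map, anchor, penalty=(0.0, 0.0, 0.1, 0.1), radius=2):
    """Hypothetical def-pooling for one part.

    Returns the best value of score(y, x) - cost(dx, dy) within `radius` of
    `anchor` (row, col), where cost = c1*dx + c2*dy + c3*dx^2 + c4*dy^2.
    """
    ay, ax = anchor
    c1, c2, c3, c4 = penalty
    best, best_pos = float("-inf"), None
    for y in range(max(0, ay - radius), min(len(score_map), ay + radius + 1)):
        for x in range(max(0, ax - radius), min(len(score_map[0]), ax + radius + 1)):
            dy, dx = y - ay, x - ax
            # Part score at this placement minus the deformation penalty.
            value = score_map[y][x] - (c1 * dx + c2 * dy + c3 * dx * dx + c4 * dy * dy)
            if value > best:
                best, best_pos = value, (y, x)
    return best, best_pos
```

With a single peak of 1.0 one step up and to the right of the anchor, the small default penalty lets the part move there (pooled score 0.8), while a heavy penalty of 2.0 per squared unit keeps it at the anchor.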
This paper introduces RF-Pose3D, the first system that infers 3D human skeletons from RF signals. It requires no sensors on the body, and works with multiple people and across walls and occlusions. Further, it generates dynamic skeletons that follow people as they move, walk or sit. As such, RF-Pose3D provides a significant leap in RF-based sensing and enables new applications in gaming, healthcare, and smart homes.
In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection. The SDN automatically learns hierarchical features, salience maps, and mixture representations of different body parts. Pedestrian detection faces the challenges of background clutter and large variations in pedestrian appearance due to pose and viewpoint changes and other factors. One of our key contributions is a Switchable Restricted Boltzmann Machine (SRBM) that explicitly models complex visual variations at multiple levels. At the feature levels, it estimates saliency...
Falls are the top reason for fatal and non-fatal injuries among seniors. Existing solutions are based on wearable fall-alert sensors, but medical research has shown that they are ineffective, mostly because seniors do not wear them. These revelations have led to new passive sensors that infer falls by analyzing Radio Frequency (RF) signals in homes. Seniors can go about their lives as usual without the need to wear any device. While passive monitoring has made major advances, current approaches still cannot deal with...
In this paper, we propose multi-stage and deformable deep convolutional neural networks for object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed architecture, a deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. With a multi-stage training strategy, multiple classifiers are jointly optimized to process samples at different difficulty levels. A pre-training strategy is proposed to learn feature representations more suitable...
The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline, learning a supervised or self-supervised representation on the meta-training set followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved...
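The baseline described above reduces, at test time, to fitting a linear classifier on frozen embeddings of the few labeled support examples. A minimal sketch, assuming the embeddings are already computed (toy 2-D vectors stand in for them here) and using multinomial logistic regression trained by plain full-batch gradient descent; the learning rate, epoch count, and function names are illustrative choices, not the paper's settings.

```python
import math


def train_linear_classifier(support_x, support_y, num_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression on frozen support-set embeddings."""
    dim = len(support_x[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(support_x)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(support_x, support_y):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(num_classes)]
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]
            s = sum(exps)
            for c in range(num_classes):
                # Cross-entropy gradient w.r.t. logit c is p_c - 1[c == y].
                err = exps[c] / s - (1.0 if c == y else 0.0)
                gb[c] += err / n
                for j in range(dim):
                    gW[c][j] += err * x[j] / n
        # Only the linear head is updated; the embeddings stay fixed.
        for c in range(num_classes):
            b[c] -= lr * gb[c]
            for j in range(dim):
                W[c][j] -= lr * gW[c][j]
    return W, b


def predict(W, b, x):
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For a 2-way 2-shot episode with support embeddings `[[1, 0], [0.9, 0.2]]` (class 0) and `[[0, 1], [0.1, 0.8]]` (class 1), the trained head assigns the query `[0.8, 0.1]` to class 0 and `[0.2, 0.9]` to class 1.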
The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper we delve into the influence of the models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that teacher accuracy is not the dominant reason for student accuracy, but the teacher's inductive bias is more important. We demonstrate that lightweight teachers with different...
Contrastive Language-Image Pre-training (CLIP) stands as one of the most effective and scalable methods for training transferable vision models using paired image and text data. CLIP models are trained using a contrastive loss, which typically relies on data augmentations to prevent overfitting and shortcuts. However, in the CLIP training paradigm, data augmentations are exclusively applied to image inputs, while language inputs remain unchanged throughout the entire training process, limiting the exposure of diverse texts to the same image. In this paper, we introduce Language augmented...
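The CLIP objective the abstract refers to is a symmetric contrastive loss over a batch of matched image-text pairs: each image should score highest against its own caption, and each caption against its own image. This sketch assumes precomputed embeddings and a fixed temperature of 0.07 (CLIP itself learns its temperature); it does not implement the language rewriting that Language augmented CLIP adds, and the helper names are hypothetical.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross_entropy_rows(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_loss(image_emb, text_emb, temperature=0.07):
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]
    transposed = [list(col) for col in zip(*logits)]
    # Symmetric objective: image-to-text and text-to-image cross-entropy.
    return 0.5 * (_cross_entropy_rows(logits) + _cross_entropy_rows(transposed))
```

When every image embedding lines up with its own caption the loss is near zero; swapping the captions between pairs makes it large.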