- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Gait Recognition and Analysis
- Network Security and Intrusion Detection
- Video Surveillance and Tracking Methods
- Hand Gesture Recognition Systems
- Domain Adaptation and Few-Shot Learning
- Non-Invasive Vital Sign Monitoring
- ECG Monitoring and Analysis
- Video Analysis and Summarization
- Artificial Immune Systems Applications
- Electrical and Bioimpedance Tomography
- Advanced Image and Video Retrieval Techniques
- Tactile and Sensory Interactions
- Advanced Neural Network Applications
- COVID-19 diagnosis using AI
- Hearing Impairment and Communication
- Data-Driven Disease Surveillance
- Photoacoustic and Ultrasonic Imaging
- Context-Aware Activity Recognition Systems
- Complex Network Analysis Techniques
- Healthcare Technology and Patient Monitoring
- Generative Adversarial Networks and Image Synthesis
- Visual Attention and Saliency Detection
Northwestern Polytechnical University
2018-2024
Northwestern Polytechnic University
2023
University of Chinese Academy of Sciences
2017-2018
Institute of Automation
2016-2018
Chinese Academy of Sciences
2015-2017
Gesture is a natural interface in human-computer interaction, especially for interacting with wearable devices such as VR/AR helmets and glasses. However, the gesture recognition community lacks suitable datasets for developing egocentric (first-person view) methods, particularly in the deep learning era. In this paper, we introduce a new benchmark dataset named EgoGesture, with sufficient size, variation, and reality to be able to train deep neural networks. This dataset contains more than 24 000 gesture samples and 3 000 000 frames for both color...
For skeleton-based action recognition, most of the existing works used recurrent neural networks. Using convolutional neural networks (CNNs) is another attractive solution considering their advantages in parallelization, effectiveness in feature learning, and model base sufficiency. Besides these, skeleton data are low-dimensional features. It is natural to arrange a sequence of skeleton features chronologically into an image, which retains the original information. Therefore, we solve the learning problem as an image...
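As an illustration of the idea of arranging a skeleton sequence chronologically into an image, here is a minimal sketch in plain Python. It is not the authors' implementation: the layout (rows are joints, columns are frames, channels are the x, y, z coordinates), the per-channel min-max scaling to 0..255, and the function name `skeleton_sequence_to_image` are all assumptions made for this example.

```python
def skeleton_sequence_to_image(sequence):
    """Arrange a skeleton sequence into an image-like nested list.

    sequence: list of frames; each frame is a list of joints; each joint
    is an (x, y, z) tuple. Assumed layout (hypothetical, for illustration):
    rows = joints, columns = frames in chronological order, channels = x, y, z.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])

    # Per-channel minimum and maximum over the whole sequence.
    lows = [min(joint[c] for frame in sequence for joint in frame) for c in range(3)]
    highs = [max(joint[c] for frame in sequence for joint in frame) for c in range(3)]

    image = []
    for j in range(num_joints):
        row = []
        for t in range(num_frames):
            pixel = []
            for c in range(3):
                span = highs[c] - lows[c]
                # Scale each coordinate to 0..255; a constant channel maps to 0.
                value = 0 if span == 0 else int(255 * (sequence[t][j][c] - lows[c]) / span)
                pixel.append(value)
            row.append(tuple(pixel))
        image.append(row)
    return image


# Two frames, two joints each.
frames = [
    [(0, 0, 0), (1, 2, 4)],
    [(2, 4, 8), (1, 2, 4)],
]
image = skeleton_sequence_to_image(frames)
```

Because every coordinate keeps its own pixel, the time order and joint identity are preserved in the image, which is what lets an ordinary image CNN be applied afterwards.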
Gesture is a natural interface for interacting with wearable devices such as VR/AR helmets and glasses. The main challenge of gesture recognition in egocentric vision arises from the global camera motion caused by the spontaneous head movement of the device wearer. In this paper, we address this problem with a novel recurrent 3D convolutional neural network for end-to-end learning. We specially design a spatiotemporal transformer module with connections between neighboring time slices, which can actively transform the feature map...
Video anomaly detection aims to find the events in a video that do not conform to expected behavior. The prevalent methods mainly detect anomalies by snippet reconstruction or future frame prediction error. However, the error is highly dependent on the local context of the current snippet and lacks an understanding of normality. To address this issue, we propose to detect anomalous events not only by the local context, but also according to the consistency between the testing event and the knowledge about normality from the training data. Concretely, we propose a novel two-stream...
3-D convolutional neural networks (3-D CNNs) have been established as a powerful tool to simultaneously learn features from both spatial and temporal dimensions, which makes them suitable for video-based action recognition. In this paper, we propose not to directly use the activations of the fully connected layers of a 3-D CNN as the video feature, but to use selective convolutional layer activations to form a discriminative descriptor for video. It pools the feature on the convolutional layers under the guidance of body joint positions. Two schemes of mapping body joints into convolutional feature maps for pooling are...
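The joint-guided pooling described above can be sketched roughly as follows. The abstract is truncated before it names its two mapping schemes, so this example assumes a simple proportional scaling from frame coordinates to feature-map cells; that choice, the single-time-slice feature map, and the function names `map_joint_to_feature_map` and `joint_pooled_descriptor` are hypothetical and not taken from the paper.

```python
def map_joint_to_feature_map(joint_xy, frame_size, map_size):
    """Map a joint position in frame pixels to a (row, col) feature-map cell.

    Assumption for illustration: plain proportional scaling, clamped to the
    last valid cell.
    """
    x, y = joint_xy
    frame_w, frame_h = frame_size
    map_w, map_h = map_size
    col = min(int(x * map_w / frame_w), map_w - 1)
    row = min(int(y * map_h / frame_h), map_h - 1)
    return row, col


def joint_pooled_descriptor(feature_maps, joints, frame_size):
    """Pool convolutional activations at the body joint positions.

    feature_maps: nested list indexed as feature_maps[channel][row][col],
    standing in for one time slice of a convolutional layer.
    joints: list of (x, y) joint positions in frame pixels.
    Returns the per-joint channel activations concatenated into one list.
    """
    map_h = len(feature_maps[0])
    map_w = len(feature_maps[0][0])
    descriptor = []
    for joint in joints:
        row, col = map_joint_to_feature_map(joint, frame_size, (map_w, map_h))
        descriptor.extend(channel[row][col] for channel in feature_maps)
    return descriptor


# Two channels of 2x2 activations, and two joints in a 100x100 frame.
feature_maps = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
descriptor = joint_pooled_descriptor(feature_maps, [(10, 10), (90, 90)], (100, 100))
```

The resulting descriptor has one block of channel activations per joint, so its length is the number of joints times the number of channels.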
Few-shot learning is a fundamental and challenging problem since it requires recognizing novel categories from only a few examples. The objects for recognition have multiple variants and can be located anywhere in images. Directly comparing query images with example images cannot handle content misalignment. The representation and metric for comparison are critical but challenging to learn due to the scarcity and wide variation of the samples in few-shot learning. In this paper, we present a semantic alignment model to compare relations, which is robust to content misalignment. We...
For weakly supervised anomaly detection, most existing work is limited by inadequate video representation due to the inability to model long-term contextual information. To solve this, we propose a novel weakly supervised adaptive graph convolutional network (WAGCN) to model the complex contextual relationship among video segments. With it, we fully consider the influence of other segments on the current one when generating the anomaly probability score for each segment. Firstly, we combine temporal consistency as well as feature similarity to construct...
Video anomaly detection (VAD) mainly refers to identifying anomalous events that have not occurred in the training set, where only normal samples are available. Existing works usually formulate VAD as a reconstruction or prediction problem. However, the adaptability and scalability of these methods are limited. In this paper, we propose a novel distance-based VAD method to take advantage of all the available normal data efficiently and flexibly. In our method, the smaller the distance between a testing sample and the normal samples, the higher the probability that the testing sample is...
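The distance-based principle stated above can be illustrated with a short sketch. The excerpt does not say which features, distance, or score mapping the paper uses, so the Euclidean nearest-neighbor distance, the `exp(-d / scale)` mapping, and the names `nearest_normal_distance` and `normality_score` are assumptions chosen only to show the monotonic relationship between distance and normality.

```python
import math


def nearest_normal_distance(test_feature, normal_features):
    """Euclidean distance from a test feature to its closest normal sample."""
    return min(math.dist(test_feature, feature) for feature in normal_features)


def normality_score(test_feature, normal_features, scale=1.0):
    """Map the nearest-normal distance to a score in (0, 1].

    Assumption for illustration: exp(-d / scale), so a smaller distance
    gives a higher normality score. `scale` is a hypothetical parameter.
    """
    distance = nearest_normal_distance(test_feature, normal_features)
    return math.exp(-distance / scale)


# Feature vectors of normal training samples (toy values).
normal_features = [(0.0, 0.0), (1.0, 0.0)]

near_score = normality_score((0.1, 0.0), normal_features)
far_score = normality_score((5.0, 5.0), normal_features)
```

In this sketch a sample close to the stored normal data receives a score near 1, and the score decays toward 0 as the sample moves away, so thresholding the score flags distant samples as anomalous.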
Semi-supervised video anomaly detection (VAD) is a critical task in intelligent surveillance systems. However, an essential type of anomaly in VAD, the scene-dependent anomaly, has not received the attention of researchers. Moreover, there is no research investigating anomaly anticipation, a more significant task for preventing the occurrence of anomalous events. To this end, we propose a new comprehensive dataset, NWPU Campus, containing 43 scenes, 28 classes of abnormal events, and 16 hours of videos. At present, it is the largest semi-supervised...
Zero-shot action recognition (ZSAR) requires collaborative multi-modal spatiotemporal understanding. However, finetuning CLIP directly for ZSAR yields suboptimal performance, given its inherent constraints in capturing essential temporal dynamics from both vision and text perspectives, especially when encountering novel actions with fine-grained discrepancies. In this work, we propose Spatiotemporal Dynamic Duo (STDD), a CLIP-based framework to comprehend spatiotemporal dynamics synergistically. For the vision side, an...
With the development of sensing equipment, data from different modalities are available for gesture recognition. In this paper, we propose a novel multi-modal learning framework. A coupled hidden Markov model (CHMM) is employed to discover the correlation and complementary information across different modalities. In this framework, we use two configurations: one is multi-modal learning with multi-modal testing, where all the modalities used during learning are still available during testing; the other is multi-modal learning with single-modal testing, where only one modality is available during testing. Experiments on real-world gesture recognition data sets have demonstrated...
Weakly supervised video anomaly detection (WSVAD) is a challenging task since only video-level labels are available for training. In previous studies, the discriminative power of the learned features is not strong enough, and the data imbalance resulting from the mini-batch training strategy is ignored. To address these two issues, we propose a novel WSVAD method based on cross-batch clustering guidance. To enhance the discriminative power of features, we propose a batch clustering loss to encourage a clustering branch to generate distinct normal and abnormal clusters from a batch of data. Meanwhile,...
Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly, the scene-dependent anomaly, is overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first for VAA. Meanwhile, we introduce a novel forward-backward framework for VAD and VAA, in which the forward network individually solves VAD and jointly solves VAA with the backward network. Particularly, we propose a generative...
Temporal action localization (TAL) is a prevailing task due to its great application potential. Existing works in this field mainly suffer from two weaknesses: (1) They often neglect the multi-label case and only focus on temporal modeling. (2) They ignore the semantic information in class labels and only use the visual information. To solve these problems, we propose a novel Co-Occurrence Relation Module (CORM) that explicitly models the co-occurrence relationship between actions. Besides the visual information, it further utilizes...
In the past decades, continuous Doppler radar sensor-based bio-signal detection has attracted considerable research interest. A typical example is heartbeat detection. While significant progress has been achieved, reliable, time-domain accurate demodulation of bio-signals in the presence of unavoidable DC offsets remains a technical challenge. Aiming to overcome this difficulty, we propose in this paper a novel demodulation algorithm that does not need to trace and eliminate dynamic DC offsets, based on approximating segmented arcs in a quadrature...
Understanding human activities in video is a fundamental problem in computer vision. In real life, human activities are composed of temporal and spatial arrangements of actions. Understanding such complex activities requires recognizing not only each individual action, but more importantly, capturing their spatio-temporal relationships. This paper addresses the problem of complex activity recognition with a unified hierarchical model. We expand triangular-chain CRFs (TriCRFs) to the spatial dimension. The proposed architecture can be perceived as a spatio-temporal version of TriCRFs, which...
Gesture recognition has attracted considerable attention and made encouraging progress in recent years due to its great potential in applications. However, the spatial and temporal modeling in gesture recognition is still a problem to be solved. Specifically, existing works lack efficient temporal modeling and effective spatial attention capacity. To efficiently model temporal information, we first propose a long- and short-term temporal shift module (LS-TSM) that models the long-term and short-term temporal information simultaneously. Then, we propose a spatial attention module (SAM) that focuses on where the change primarily occurs to obtain effective spatial attention capacity. In...
Three-dimensional convolutional neural networks (3D CNNs) have been established as a powerful tool to simultaneously learn features from both spatial and temporal dimensions, which makes them suitable for video-based action recognition. In this work, we propose not to directly use the activations of the fully-connected layers of a 3D CNN as the video feature, but to use selective convolutional layer activations to form a discriminative descriptor for video. It pools the feature on the convolutional layers under the guidance of body joint positions. Two schemes of mapping body joints into...