- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Hand Gesture Recognition Systems
- Gait Recognition and Analysis
- Speech and dialogue systems
- Impact of Light on Environment and Health
- Advanced Neural Network Applications
- Image Enhancement Techniques
- Advanced Image and Video Retrieval Techniques
- Visual Attention and Saliency Detection
- Face recognition and analysis
- Remote Sensing and LiDAR Applications
- IoT-based Smart Home Systems
- Natural Language Processing Techniques
- Diabetic Foot Ulcer Assessment and Management
- Retinal Imaging and Analysis
- Gaze Tracking and Assistive Technology
- Multimodal Machine Learning Applications
- Hearing Impairment and Communication
- Semantic Web and Ontologies
- Fire Detection and Safety Systems
- 3D Surveying and Cultural Heritage
- Human-Animal Interaction Studies
Shenzhen Technology University
2025
Shenzhen Institutes of Advanced Technology
2024-2025
Shenzhen University
2025
Tianjin University
2019-2024
City University of Hong Kong
2024
Chinese Academy of Sciences
2024
City University of Hong Kong, Shenzhen Research Institute
2024
State Administration of Cultural Heritage
2017-2022
University of California, Berkeley
2022
Berkeley College
2022
With a good balance between tracking accuracy and speed, correlation filter (CF) has become one of the best object tracking frameworks, based on which many successful trackers have been developed. Recently, spatially regularized CF (SRDCF) has been developed to remedy the annoying boundary effects of CF tracking, thus further boosting the tracking performance. However, SRDCF uses a fixed spatial regularization map constructed from a loose bounding box, and its performance inevitably degrades when the target or background show significant...
Spatial regularization (SR) is known as an effective tool to alleviate the boundary effect of correlation filter (CF), a successful visual object tracking scheme from which a number of state-of-the-art trackers stem. Nevertheless, SR greatly increases the optimization complexity of CF, and its target-driven nature means that spatially regularized CF trackers may easily lose targets that are occluded or surrounded by other similar objects. In this paper, we propose selective spatial regularization (SSR) for the CF-tracking scheme. It...
Gait recognition, a long-distance biometric technology, has aroused intense interest recently. Currently, the two dominant kinds of gait recognition work are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively. However, appearance-based methods are greatly affected by clothes-changing and carrying conditions, while model-based methods are limited by the accuracy of pose estimation. To tackle this challenge, a simple yet effective two-branch network is proposed in this paper, which contains a CNN-based branch...
With a good balance between accuracy and speed, correlation filter (CF) has become a popular and dominant visual object tracking scheme. It implicitly extends the training samples by circular shifts of a given target patch, which serve as negative samples for fast online learning of the filters. Since these shifted patches are not real negative samples of the target, the CF tracking scheme suffers from annoying boundary effects that can greatly harm the tracking performance, especially under challenging situations, like occlusion and temporal target variation. Spatial...
Multi-view Multi-human association and tracking (MvMHAT) aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time. This is a relatively new problem but a very important one for multi-person scene video surveillance. Different from previous multiple object tracking (MOT) and multi-target multi-camera tracking (MTMCT) tasks, which only consider over-time human association, MvMHAT requires jointly achieving both cross-view and over-time data association. In this paper, we model...
Spatial regularization (SR), being an effective tool to alleviate the boundary effects, can significantly improve the accuracy and robustness of correlation filter (CF) based visual object tracking. The core of SR is a spatially variant weight map that is used to regularize the online learned CF by selecting more meaningful samples. However, most existing trackers apply a data-independent weight map. In this paper, we show that content-related spatial regularization (CRSR) can help further boost both tracking accuracy and robustness. Specifically, we present...
The global trajectories of targets on the ground can be well captured from a top view at high altitude, e.g., by a drone-mounted camera, while their local detailed appearances can be better recorded from horizontal views, e.g., by a helmet camera worn by a person. This paper studies a new problem of multiple human tracking from a pair of top- and horizontal-view videos taken at the same time. Our goal is to track the humans in both views and identify the same person across the two complementary views frame by frame, which is very challenging due to the very large difference in field of view. In...
Compared to a single fixed camera, multiple moving cameras, e.g., those worn by people, can better capture the human interactive and group activities in a scene, providing multiple, flexible and possibly complementary views of the involved people. In this setting, the actual improvement in activity detection is highly dependent on the effective correlation and collaborative analysis of the videos taken by different wearable cameras, which is challenging given the time-varying view differences across cameras and the mutual occlusion of people in each video....
Sign Language Recognition (SLR) translates sign language video into natural language. In practice, sign language video contains a large number of redundant frames, so the essential ones need to be selected. However, unlike common video that describes actions, sign language video is characterized as a continuous and dense action sequence, in which it is difficult to capture the key actions corresponding to a meaningful sentence. In this paper, we propose to hierarchically search for key actions with a pyramid BiLSTM. Specifically, we first construct three BiLSTMs to produce temporal...
Crowded scene surveillance can significantly benefit from combining egocentric-view and complementary top-view cameras. A typical setting is an egocentric-view camera, e.g., a wearable camera on the ground capturing rich local details, and a top-view camera, e.g., a drone-mounted one at high altitude providing a global picture of the scene. To collaboratively analyze such complementary-view videos, an important task is to associate and track multiple people across views and over time, which is challenging and differs from classical human tracking, since we need to not...
Identifying the same persons across different views plays an important role in many vision applications. In this paper, we study this problem, denoted as Multi-view Multi-Human Association (MvMHA), on multi-view images that are taken by different cameras at the same time. Different from previous works on human association across two views, this paper is focused on more general and challenging scenarios of more than two views, none of which are fixed or known in advance. In addition, each involved person may be present in all the views or only a subset of the views, which is also not known in advance. We develop...
Fast and accurate identification of the co-interest persons, who draw the joint interest of the surrounding people, plays an important role in social scene understanding and surveillance. Previous studies mainly focus on detecting co-interest persons from a single-view video. In this paper, we study a much more realistic and challenging problem, namely co-interest person (CIP) detection from multiple temporally synchronized videos taken by complementary and time-varying views. Specifically, we use a top-view camera, mounted on a flying drone at a high altitude...
Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for multi-person scene video surveillance. It aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which differs from previous MOT and multi-camera MOT tasks that only consider over-time human tracking. As a result, the videos for MvMHAT require more complex annotations while containing more information for self-learning. In this work, we tackle this problem with an end-to-end neural...
We attempt to connect the data from complementary views, i.e., the top view from drone-mounted cameras in the air, and the side view from wearable cameras on the ground. Collaborative analysis of such complementary-view data can help build an air-ground cooperative visual system for various kinds of applications. This is a very challenging problem due to the large difference between the top and side views. In this paper, we develop a new approach that can simultaneously handle three tasks: i) localizing the side-view camera in the top view; ii) estimating the view direction of the side-view camera; iii)...
Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of such different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views. This is a very challenging problem due to large...
Gait recognition is an important AI task, which has progressed rapidly with the development of deep learning. However, existing deep learning based gait recognition methods mainly focus on a single domain, especially the constrained laboratory environment. In this paper, we study a new problem of unsupervised domain adaptive gait recognition (UDA-GR), which learns a gait identifier with supervised labels from indoor scenes (source domain) and applies it to outdoor wild scenes (target domain). For this purpose, we develop an uncertainty estimation and regularization...