Ruize Han

ORCID: 0000-0002-6587-8936
Research Areas
  • Video Surveillance and Tracking Methods
  • Human Pose and Action Recognition
  • Anomaly Detection Techniques and Applications
  • Advanced Vision and Imaging
  • Hand Gesture Recognition Systems
  • Gait Recognition and Analysis
  • Speech and dialogue systems
  • Impact of Light on Environment and Health
  • Advanced Neural Network Applications
  • Image Enhancement Techniques
  • Advanced Image and Video Retrieval Techniques
  • Visual Attention and Saliency Detection
  • Face recognition and analysis
  • Remote Sensing and LiDAR Applications
  • IoT-based Smart Home Systems
  • Natural Language Processing Techniques
  • Diabetic Foot Ulcer Assessment and Management
  • Retinal Imaging and Analysis
  • Gaze Tracking and Assistive Technology
  • Multimodal Machine Learning Applications
  • Hearing Impairment and Communication
  • Semantic Web and Ontologies
  • Fire Detection and Safety Systems
  • 3D Surveying and Cultural Heritage
  • Human-Animal Interaction Studies

Shenzhen Technology University
2025

Shenzhen Institutes of Advanced Technology
2024-2025

Shenzhen University
2025

Tianjin University
2019-2024

City University of Hong Kong
2024

Chinese Academy of Sciences
2024

City University of Hong Kong, Shenzhen Research Institute
2024

State Administration of Cultural Heritage
2017-2022

University of California, Berkeley
2022

Berkeley College
2022

With a good balance between tracking accuracy and speed, the correlation filter (CF) has become one of the best object tracking frameworks, on which many successful trackers have been built. Recently, the spatially regularized CF (SRDCF) was developed to remedy the annoying boundary effects of CF tracking, further boosting performance. However, SRDCF uses a fixed spatial regularization map constructed from a loose bounding box, and its performance inevitably degrades when the target or background show significant...

10.1109/tip.2019.2895411 article EN IEEE Transactions on Image Processing 2019-01-25

Spatial regularization (SR) is known as an effective tool to alleviate the boundary effect of the correlation filter (CF), a successful visual object tracking scheme from which a number of state-of-the-art trackers stem. Nevertheless, SR greatly increases the optimization complexity of CF, and its target-driven nature means that a spatially regularized CF may easily lose targets that are occluded or surrounded by other similar objects. In this paper, we propose selective spatial regularization (SSR) for the CF-tracking scheme. It...

10.1109/tip.2019.2955292 article EN IEEE Transactions on Image Processing 2019-11-28

Gait recognition, a long-distance biometric technology, has aroused intense interest recently. Currently, the two dominant families of gait recognition methods are appearance-based and model-based, which extract features from silhouettes and skeletons, respectively. However, appearance-based methods are greatly affected by clothes-changing and carrying conditions, while model-based methods are limited by the accuracy of pose estimation. To tackle this challenge, a simple yet effective two-branch network is proposed in this paper, which contains a CNN-based branch...

10.1109/icassp49357.2023.10096986 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

With a good balance between accuracy and speed, the correlation filter (CF) has become a popular and dominant visual object tracking scheme. It implicitly extends the training samples by circular shifts of a given target patch, which serve as negative samples for fast online learning of the filters. Since these shifted patches are not real negative samples of the target, the CF scheme suffers from annoying boundary effects that can greatly harm tracking performance, especially under challenging situations like occlusion and temporal variation. Spatial...

10.1109/tip.2020.2998978 article EN IEEE Transactions on Image Processing 2020-01-01

Multi-view multi-human association and tracking (MvMHAT) aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time. This is a relatively new problem but a very important one for multi-person scene video surveillance. Different from previous multiple object tracking (MOT) and multi-target multi-camera tracking (MTMCT) tasks, which only consider over-time human association, MvMHAT requires jointly achieving both over-time and cross-view data association. In this paper, we model...

10.1145/3474085.3475177 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

10.26599/cvm.2025.9450390 article EN cc-by Computational Visual Media 2025-01-01

Spatial regularization (SR), an effective tool for alleviating boundary effects, can significantly improve the accuracy and robustness of correlation filter (CF) based visual object tracking. The core of SR is a spatially variant weight map that is used to regularize the online learned CF by selecting more meaningful samples. However, most existing trackers apply a data-independent weight map. In this paper, we show that a content-related spatial regularization (CRSR) can help further boost both tracking accuracy and robustness. Specifically, we present...

10.1109/icme.2018.8486487 article EN 2018 IEEE International Conference on Multimedia and Expo (ICME) 2018-07-01

The global trajectories of targets on the ground can be well captured from a top view at high altitude, e.g., by a drone-mounted camera, while their local detailed appearances can be better recorded from horizontal views, e.g., by a helmet camera worn by a person. This paper studies a new problem of multiple human tracking from a pair of top- and horizontal-view videos taken at the same time. Our goal is to track the humans in both views and identify the same person across the two complementary views frame by frame, which is very challenging due to the large field-of-view difference. In...

10.1609/aaai.v34i07.6724 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Compared to a single fixed camera, multiple moving cameras, e.g., those worn by people, can better capture the human interactive and group activities in a scene by providing multiple, flexible and possibly complementary views of the involved people. In this setting, the actual improvement in activity detection depends heavily on the effective correlation and collaborative analysis of the videos taken by different wearable cameras, which is challenging given the time-varying view differences across cameras and the mutual occlusion of people in each video....

10.1145/3394171.3413903 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Sign Language Recognition (SLR) translates sign language video into natural language. In practice, a sign language video contains a large number of redundant frames, so the essential ones need to be selected. However, unlike common video that describes actions, sign language video is characterized as a continuous and dense action sequence, in which it is difficult to capture the key actions corresponding to a meaningful sentence. In this paper, we propose to hierarchically search for key actions with a pyramid BiLSTM. Specifically, we first construct three BiLSTMs to produce temporal...

10.1109/icassp40776.2020.9054316 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Crowded scene surveillance can significantly benefit from combining an egocentric-view camera and its complementary top-view camera. A typical setting is an egocentric-view camera, e.g., a wearable camera on the ground capturing rich local details, and a top-view camera, e.g., a drone-mounted one at high altitude providing a global picture of the scene. To collaboratively analyze such complementary-view videos, an important task is to associate and track multiple people across views and over time, which is challenging and differs from classical human tracking, since we need to not...

10.1109/tpami.2021.3070562 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

Identifying the same persons across different views plays an important role in many vision applications. In this paper, we study this problem, denoted as Multi-view Multi-Human Association (MvMHA), on multi-view images that are taken by different cameras at the same time. Different from previous works on human association across two views, this paper focuses on more general and challenging scenarios with more than two views, none of which are fixed or priorly known. In addition, each involved person may be present in all the views or in only a subset of them, which is also not priorly known. We develop...

10.1109/tip.2021.3139178 article EN IEEE Transactions on Image Processing 2022-01-01

Fast and accurate identification of the co-interest persons, who draw the joint interest of the surrounding people, plays an important role in social scene understanding and surveillance. Previous studies mainly focus on detecting co-interest persons from a single-view video. In this paper, we study a much more realistic and challenging problem, namely co-interest person (CIP) detection from multiple temporally-synchronized videos taken by complementary and time-varying views. Specifically, we use a top-view camera, mounted on a flying drone at high altitude...

10.1145/3394171.3413659 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Multi-view multi-human association and tracking (MvMHAT) is an emerging yet important problem for multi-person scene video surveillance. It aims to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks that only consider over-time human tracking. As a result, the videos for MvMHAT require more complex annotations while containing more information for self-learning. In this work, we tackle this problem with an end-to-end neural...

10.1109/tpami.2024.3463966 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-01-01

10.1109/cvpr52733.2024.00088 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

We attempt to connect the data from complementary views, i.e., the top view from drone-mounted cameras in the air and the side view from wearable cameras on the ground. Collaborative analysis of such complementary-view data can facilitate building an air-ground cooperative visual system for various kinds of applications. This is a very challenging problem due to the large view difference between the top and side views. In this paper, we develop a new approach that can simultaneously handle three tasks: i) localizing the side-view camera in the top view; ii) estimating the view direction of the side-view camera; iii)...

10.1109/cvpr52688.2022.00245 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of such different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views. This is a very challenging problem due to the large...

10.48550/arxiv.1907.11458 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Gait recognition is an important AI task, which has progressed rapidly with the development of deep learning. However, existing deep learning based gait recognition methods mainly focus on a single domain, especially the constrained laboratory environment. In this paper, we study a new problem of unsupervised domain adaptive gait recognition (UDA-GR), which learns a gait identifier with supervised labels from indoor scenes (source domain) and applies it to outdoor wild scenes (target domain). For this purpose, we develop an uncertainty estimation and regularization...

10.48550/arxiv.2211.11155 preprint EN cc-by arXiv (Cornell University) 2022-01-01