Shuai Yi

ORCID: 0000-0002-1253-6633
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Domain Adaptation and Few-Shot Learning
  • Face Recognition and Analysis
  • Anomaly Detection Techniques and Applications
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • 3D Shape Modeling and Analysis
  • Gait Recognition and Analysis
  • Visual Attention and Saliency Detection
  • Computer Graphics and Visualization Techniques
  • 3D Surveying and Cultural Heritage
  • Privacy-Preserving Technologies in Data
  • Image Enhancement Techniques
  • Image Processing Techniques and Applications
  • Robotics and Sensor-Based Localization
  • Image Retrieval and Classification Techniques
  • Evacuation and Crowd Dynamics
  • Image Processing and 3D Reconstruction
  • Hand Gesture Recognition Systems
  • Video Analysis and Summarization
  • Handwritten Text Recognition Techniques
  • Neural Networks Stability and Synchronization

Group Sense (China)
2017-2024

Shandong Academy of Sciences
2023-2024

Qilu University of Technology
2023-2024

Shanghai Artificial Intelligence Laboratory
2021-2023

The Sense Innovation and Research Center
2018-2022

South China Normal University
2022

ShangHai JiAi Genetics & IVF Institute
2021

Chinese Academy of Sciences
2020

Chinese University of Hong Kong
2014-2020

PLA Information Engineering University
2015

Person re-identification (ReID) is an important task in video surveillance and has various applications. It is non-trivial due to complex background clutter, varying illumination conditions, and uncontrollable camera settings. Moreover, the body misalignment caused by detectors or pose variations is sometimes too severe for feature matching across images. In this study, we propose a novel Convolutional Neural Network (CNN), called Spindle Net, based on human region guided multi-stage...

10.1109/cvpr.2017.103 article EN 2017-07-01

Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component of security-centric computer vision systems. Although convolutional neural networks are remarkable at learning discriminative features from images, learning comprehensive features of pedestrians for fine-grained tasks remains an open problem. In this study, we propose a new attention-based deep network, named HydraPlus-Net (HPnet), that multi-directionally feeds multi-level attention maps to different feature layers....

10.1109/iccv.2017.46 preprint EN 2017-10-01

In this paper, we tackle the vehicle re-identification (ReID) problem, which is of great importance in urban surveillance and can be used for multiple applications. In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed. With the orientation invariant feature embedding, local region features of different orientations can be extracted based on 20 key point locations and can be well aligned and combined. With the spatial-temporal regularization, a log-normal distribution is adopted to model the spatial-temporal constraints and the retrieval results...

10.1109/iccv.2017.49 article EN 2017-10-01

Vehicle re-identification is an important problem and has many applications in video surveillance and intelligent transportation. It is gaining increasing attention because of recent advances in person re-identification techniques. However, unlike person re-identification, the visual differences between pairs of vehicle images are usually subtle and even challenging for humans to distinguish. Incorporating additional spatio-temporal information is vital for solving the task. Existing methods either ignored or used oversimplified models for the spatio-temporal relations...

10.1109/iccv.2017.210 article EN 2017-10-01

Person re-identification (reID) is an important task that requires retrieving a person's images from an image dataset, given one image of the person of interest. For learning robust person features, pose variation is one of the key challenges. Existing works targeting the problem either perform human alignment or learn human-region-based representations. Extra pose information and computational cost are generally required for inference. To solve this issue, a Feature Distilling Generative Adversarial Network (FD-GAN) is proposed...

10.48550/arxiv.1810.02936 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance. Stationary groups are a key factor influencing pedestrian walking patterns but have been largely ignored in the literature. In this paper, a novel model is proposed that includes stationary groups as a key component. Through inference on the interactions between stationary groups and pedestrians, our model can be used to investigate pedestrian behaviors. The effectiveness of the model is demonstrated through multiple applications, including path...

10.1109/cvpr.2015.7298971 article EN 2015-06-01

In this paper, we address video-based person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. Our approach divides long person sequences into multiple short video snippets and aggregates the top-ranked snippet similarities for sequence-similarity estimation. With this strategy, the intra-person visual variation of each sample can be minimized for similarity estimation, while the diverse appearance and temporal information are maintained. The snippet similarities are estimated by a deep neural network...

10.1109/cvpr.2018.00128 article EN 2018-06-01
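
The aggregation step described in the abstract above (keep only the top-ranked snippet similarities when scoring two sequences) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `aggregate_top_similarities`, the `top_k` parameter, and the use of a plain mean over the selected scores are assumptions, and the similarity values are made up for the example.

```python
def aggregate_top_similarities(sim_matrix, top_k):
    """Average the top_k largest snippet-pair similarities.

    sim_matrix[i][j] is the similarity between snippet i of one sequence
    and snippet j of the other. Averaging the selected scores is an
    assumption made for this sketch.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Flatten all snippet-pair scores and sort from most to least similar.
    scores = sorted((s for row in sim_matrix for s in row), reverse=True)
    if not scores:
        raise ValueError("sim_matrix must contain at least one score")
    selected = scores[:top_k]
    return sum(selected) / len(selected)


# Hypothetical similarities between 2 snippets of one sequence and 2 of another.
example = [[0.1, 0.9],
           [0.5, 0.7]]
sequence_similarity = aggregate_top_similarities(example, top_k=2)
```

Keeping only the highest-scoring pairs means that snippets with mismatched poses or occlusions do not drag down the sequence-level score.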

Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to a mismatch between training and testing distributions. In this paper, we show that the Softmax function, though used in most classification tasks, gives a biased gradient estimation under the long-tailed setup. This paper presents Balanced Softmax, an elegant unbiased extension of Softmax, to accommodate the label distribution shift between training and testing. Theoretically, we derive the generalization bound for multiclass...

10.48550/arxiv.2007.10740 preprint EN other-oa arXiv (Cornell University) 2020-01-01
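
A minimal sketch of the idea behind the Balanced Softmax abstract above, assuming the commonly cited form in which each logit is shifted by the log of its class's training-set count before the usual Softmax. The truncated abstract does not spell out the formula, so the function names and the example counts here are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_softmax(logits, class_counts):
    """Softmax with each logit shifted by log(training count of its class).

    Assumed form: p_j is proportional to n_j * exp(z_j), where n_j is the
    number of training samples of class j.
    """
    shifted = [z + math.log(n) for z, n in zip(logits, class_counts)]
    return softmax(shifted)


# With equal logits, the training-time probabilities follow the class prior.
probs = balanced_softmax([0.0, 0.0], class_counts=[90, 10])
```

In this form the shift is applied only when computing the training loss; at test time the unshifted logits are used, which is what compensates for the label distribution shift.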

Dot-product attention has wide applications in computer vision and natural language processing. However, its memory and computational costs grow quadratically with the input size. Such growth prohibits its application on high-resolution inputs. To remedy this drawback, this paper proposes a novel efficient attention mechanism equivalent to dot-product attention but with substantially lower memory and computational costs. Its resource efficiency allows more widespread and flexible integration of attention modules into a network, which leads to better accuracies. Empirical...

10.1109/wacv48630.2021.00357 article EN 2021-01-01
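
The cost argument in the abstract above can be illustrated with matrix associativity. The sketch below assumes the efficiency comes from reordering the product (Q Kᵀ) V into Q (Kᵀ V); it deliberately omits the normalization step, which the truncated abstract does not describe, so it shows only why the reordering avoids the n × n attention map. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def quadratic_attention(q, k, v):
    """(Q K^T) V: builds an n x n map, so cost grows with n squared."""
    return matmul(matmul(q, transpose(k)), v)


def reordered_attention(q, k, v):
    """Q (K^T V): builds a d x d summary, so cost grows linearly with n."""
    return matmul(q, matmul(transpose(k), v))


# n = 3 positions, d = 2 channels; integer values keep the comparison exact.
q = [[1, 2], [3, 4], [5, 6]]
k = [[1, 0], [0, 1], [1, 1]]
v = [[1, 2], [3, 4], [5, 6]]
same = quadratic_attention(q, k, v) == reordered_attention(q, k, v)
```

For n input positions and d channels, the first ordering stores n² intermediate values while the second stores d², which is the saving that matters on high-resolution inputs where n is far larger than d.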

Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging. In this work, through intensive diagnosis experiments, we quantify the impact introduced by each sub-task and find that the 'localization error' is the vital factor restricting monocular 3D detection. Besides, we also investigate the underlying reasons behind localization errors, analyze the issues they might bring, and propose three strategies. First, we revisit the misalignment between...

10.1109/cvpr46437.2021.00469 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Real-scanned point clouds are often incomplete due to viewpoint, occlusion, and noise. Existing point cloud completion methods tend to generate global shape skeletons and hence lack fine local details. Furthermore, they mostly learn a deterministic partial-to-complete mapping, but overlook structural relations in man-made objects. To tackle these challenges, this paper proposes a variational framework, Variational Relational Completion network (VRC-Net), with two appealing properties: 1) Probabilistic...

10.1109/cvpr46437.2021.00842 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Person re-identification is an important topic in intelligent surveillance and computer vision. It aims to accurately measure visual similarities between person images for determining whether two images correspond to the same person. State-of-the-art methods mainly utilize deep learning based approaches for learning visual features describing person appearances. However, we observe that existing deep learning models are biased to capture too much relevance between background appearances of person images. We design a series of experiments with a newly created...

10.1109/cvpr.2018.00607 article EN 2018-06-01

Dense depth cues are important and have wide applications in various computer vision tasks. In autonomous driving, LIDAR sensors are adopted to acquire depth measurements around the vehicle to perceive the surrounding environments. However, depth maps obtained by LIDAR are generally sparse because of its hardware limitation. The task of depth completion, which aims at generating a dense depth map from an input sparse depth map, attracts increasing attention. To effectively utilize multi-scale features, we propose three novel sparsity-invariant...

10.1109/tip.2019.2960589 article EN IEEE Transactions on Image Processing 2019-12-31

Group activity recognition is a crucial yet challenging problem, whose core lies in fully exploring spatial-temporal interactions among individuals and generating reasonable group representations. However, previous methods either model spatial and temporal information separately or directly aggregate individual features to form group features. To address these issues, we propose a novel network termed GroupFormer. It captures spatial-temporal contextual information jointly to augment the individual and group representations effectively with a clustered...

10.1109/iccv48922.2021.01341 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Person re-identification aims to robustly measure similarities between person images. The significant variation of person poses and viewing angles poses challenges for accurate person re-identification. The spatial layout and correspondences between query person images are vital information for tackling this problem but are ignored by most state-of-the-art methods. In this paper, we propose a novel Kronecker Product Matching module to match feature maps of different persons in an end-to-end trainable deep neural network. A soft warping scheme is...

10.1109/cvpr.2018.00720 preprint EN 2018-06-01

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images. It is generally treated as a retrieval problem, where the affinities between the probe image and gallery images (P2G affinities) are used to rank the retrieved gallery images. However, most existing methods only consider P2G affinities but ignore the affinities between the gallery images (G2G affinities). Some frameworks incorporated G2G affinities into the testing process, which is not end-to-end trainable for deep neural networks. In this paper, we propose a novel group-shuffling random walk...

10.1109/cvpr.2018.00241 preprint EN 2018-06-01

Connectionist Temporal Classification (CTC) and the attention mechanism are the two main approaches used in recent scene text recognition works. Compared with attention-based methods, the CTC decoder has a much shorter inference time, yet a lower accuracy. To design an efficient and effective model, we propose the guided training of CTC (GTC), where the CTC model learns a better alignment and feature representations from a more powerful attentional guidance. With the benefit of guided training, the CTC model achieves robust and accurate prediction for both...

10.1609/aaai.v34i07.6735 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
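
For readers unfamiliar with why a CTC decoder is fast at inference, as the abstract above notes, here is a minimal sketch of standard greedy CTC decoding: take the best label per frame, collapse consecutive repeats, then drop the blank symbol. This is the generic CTC procedure, not the GTC training method from the paper; the function name and the choice of `0` as the blank index are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame best labels into an output label sequence.

    frame_labels holds the highest-scoring label index for each frame.
    Consecutive repeats are merged first, then blanks are removed, so a
    blank between two identical labels keeps both of them.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Frames: blank, 1, 1, blank, 1, 2, 2, blank
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
```

Decoding is a single pass over the frames with no dependence on previously emitted characters, whereas an attention decoder produces one character per step.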

Neural Architecture Search (NAS) achieves significant progress in many computer vision tasks. While many methods are proposed to improve the efficiency of NAS, the search is still laborious because training and evaluating plausible architectures over a large search space is time-consuming. Assessing network candidates under a proxy (i.e., a computationally reduced setting) thus becomes inevitable. In this paper, we observe that most existing proxies exhibit different behaviors in maintaining the rank consistency among...

10.1109/cvpr42600.2020.01141 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Most 3D shape completion approaches rely heavily on partial-complete shape pairs and learn in a fully supervised manner. Despite their impressive performances on in-domain data, when generalizing to partial shapes in other forms or real-world scans, they often obtain unsatisfactory results due to domain gaps. In contrast to previous supervised approaches, in this paper we present ShapeInversion, which introduces Generative Adversarial Network (GAN) inversion to shape completion for the first time. ShapeInversion uses a GAN...

10.1109/cvpr46437.2021.00181 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Federated learning is a privacy-preserving machine learning technique that learns a shared model across decentralized clients. It can alleviate the privacy concerns of person re-identification, an important computer vision task. In this work, we implement federated learning for person re-identification (FedReID) and optimize its performance affected by statistical heterogeneity in the real-world scenario. We first construct a new benchmark to investigate the performance of FedReID. This benchmark consists of (1) nine datasets with different volumes...

10.1145/3394171.3413814 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12
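
The abstract above describes learning a shared model across decentralized clients without naming the aggregation rule. As a rough illustration, the sketch below assumes the widely used federated averaging rule, in which the server averages client parameters weighted by local dataset size. The function name and the toy parameter vectors are hypothetical, and this is not presented as the FedReID implementation.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights[c] is the flat parameter vector trained on client c and
    client_sizes[c] is the number of local training samples it used. Only
    parameters reach the server; the raw images stay on the clients.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 samples respectively.
shared = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by dataset size lets clients with more data pull the shared model further, which is one place where the differing dataset volumes mentioned in the abstract come into play.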

Deep learning-based 3D object detection has achieved unprecedented success with the advent of large-scale autonomous driving datasets. However, drastic performance degradation remains a critical challenge for cross-domain deployment. In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world. To address this challenge, we study a more realistic setting, unsupervised 3D domain adaptive detection, which only utilizes source domain annotations. 1) We...

10.1109/iccv48922.2021.00874 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01