- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
- Video Surveillance and Tracking Methods
- Autonomous Vehicle Technology and Safety
- Remote Sensing and LiDAR Applications
- 3D Surveying and Cultural Heritage
- Human Motion and Animation
- Computer Graphics and Visualization Techniques
- Hand Gesture Recognition Systems
- Gait Recognition and Analysis
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Medical Imaging and Analysis
- Dental Radiography and Imaging
- Anomaly Detection Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Robot Manipulation and Learning
- Robotic Path Planning Algorithms
- Image Enhancement Techniques
- Video Analysis and Summarization
- Generative Adversarial Networks and Image Synthesis
- Image and Object Detection Techniques
ShanghaiTech University
2021-2025
Shandong Agricultural University
2024
Sun Yat-sen University
2023-2024
Intelligent Health (United Kingdom)
2021-2023
Shandong University
2022-2023
Hebei University of Technology
2022
Xiamen University
2022
Hong Kong Baptist University
2020-2021
Chinese University of Hong Kong
2021
University of Hong Kong
2017-2019
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution. Although this cooperation shows competitiveness in the point cloud, it inevitably alters and abandons the 3D topology and geometric relations. A natural remedy is to utilize the 3D voxelization and 3D convolution network. However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited. An important reason is the property of the outdoor point cloud, namely sparsity and varying density. Motivated by...
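To make the cylindrical-partition idea above concrete, here is a minimal NumPy sketch of assigning LiDAR points to cylindrical voxels. This is an illustration only, not the paper's implementation; the bin counts and range limits (`rho_bins`, `rho_max`, `z_min`, `z_max`) are arbitrary illustrative values.

```python
import numpy as np

def cylindrical_voxelize(points, rho_bins=480, theta_bins=360, z_bins=32,
                         rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Assign each (x, y, z) point to a cylindrical voxel (rho, theta, z).

    Compared with a uniform Cartesian grid, the angular partition keeps
    cell occupancy more balanced as point density falls off with range,
    which is the sparsity/varying-density issue the abstract describes.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)          # radial distance
    theta = np.arctan2(y, x)                # azimuth in [-pi, pi)
    rho_idx = np.clip((rho / rho_max * rho_bins).astype(int), 0, rho_bins - 1)
    theta_idx = np.clip(((theta + np.pi) / (2 * np.pi) * theta_bins).astype(int),
                        0, theta_bins - 1)
    z_idx = np.clip(((z - z_min) / (z_max - z_min) * z_bins).astype(int),
                    0, z_bins - 1)
    return np.stack([rho_idx, theta_idx, z_idx], axis=1)
```

Sparse 3D convolutions would then operate on features scattered into this (rho, theta, z) grid rather than a Cartesian one.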
To safely and efficiently navigate in complex urban traffic, autonomous vehicles must make responsible predictions in relation to surrounding traffic-agents (vehicles, bicycles, pedestrians, etc.). A challenging and critical task is to explore the movement patterns of different traffic-agents and predict their future trajectories accurately to help the autonomous vehicle make reasonable navigation decisions. To solve this problem, we propose a long short-term memory-based (LSTM-based) real-time traffic prediction algorithm, TrafficPredict. Our...
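A minimal sketch of LSTM-based trajectory prediction of the kind the abstract describes: encode an observed (x, y) track with an LSTM cell, then roll the cell forward autoregressively, emitting per-step displacements. This is a generic illustration with untrained random weights, not TrafficPredict's architecture; all names (`lstm_step`, `rollout`, `W_out`) are hypothetical.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/cell/output
    gate parameters row-wise (4H rows total)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:H]))           # input gate
    f = 1 / (1 + np.exp(-z[H:2 * H]))      # forget gate
    g = np.tanh(z[2 * H:3 * H])            # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * H:]))       # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def rollout(track, W, U, b, W_out, horizon=5):
    """Encode the observed track, then predict `horizon` future
    positions; the decoder head W_out maps hidden state to a
    displacement added to the current position."""
    H = W_out.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for p in track:                        # encode history
        h, c = lstm_step(p, h, c, W, U, b)
    pos, future = track[-1], []
    for _ in range(horizon):               # decode future
        h, c = lstm_step(pos, h, c, W, U, b)
        pos = pos + W_out @ h              # displacement head
        future.append(pos)
    return np.array(future)
```

In a real system the weights would be trained on observed agent tracks, and agent-to-agent interaction terms would be added to the cell input.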
Simulation systems have become an essential component in the development and validation of autonomous driving technologies. The prevailing state-of-the-art approach for simulation is to use game engines or high-fidelity computer graphics (CG) models to create driving scenarios. However, creating CG models and vehicle movements (e.g., the assets for simulation) remains a manual task that can be costly and time-consuming. In addition, the fidelity of CG images still lacks the richness and authenticity of real-world images, and using these images for training leads...
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation. Directly employing previous distillation approaches yields inferior results due to the intrinsic challenges of the point cloud, i.e., sparsity, randomness and varying density. To tackle the aforementioned problems, we propose Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both the point level and the voxel level. Specifically, we first leverage both the pointwise and voxelwise...
Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D zero-shot and few-shot learning. Despite the impressive performance in 2D, applying CLIP to help the learning in 3D scene understanding has yet to be explored. In this paper, we make the first attempt to investigate how CLIP knowledge benefits 3D scene understanding. We propose CLIP2Scene, a simple yet effective framework that transfers CLIP knowledge from 2D image-text pre-trained models to a 3D point cloud network. We show that the pre-trained 3D network yields impressive performance on various downstream tasks, i.e.,...
LiDAR segmentation is crucial for autonomous driving perception. Recent trends favor point- or voxel-based methods as they often yield better performance than the traditional range view representation. In this work, we unveil several key factors in building powerful range view models. We observe that the "many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections. We present RangeFormer – a full-cycle framework comprising novel...
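The range view representation discussed above is a spherical projection of the point cloud onto an H x W image, and the "many-to-one" problem falls out directly: several 3D points can land on the same pixel. A minimal sketch, assuming a SemanticKITTI-style 64-beam vertical field of view (the `fov_up`/`fov_down` values are illustrative, not from the paper):

```python
import numpy as np

def range_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project (x, y, z) LiDAR points onto an H x W range image.

    Returns the image plus per-point rows, columns, and ranges. Where
    several points fall on one pixel ("many-to-one"), the last write
    wins here; real pipelines typically keep the nearest point.
    """
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up - fov_down
    r = np.linalg.norm(points, axis=1)           # range
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / r)
    u = ((0.5 * (1 - yaw / np.pi)) * W).astype(int) % W            # column
    v = np.clip(((fov_up - pitch) / fov * H).astype(int), 0, H - 1)  # row
    img = np.zeros((H, W))
    img[v, u] = r
    return img, v, u, r
```

A forward-facing point at (10, 0, 0) lands in the middle column of the image, near the top rows (pitch 0 sits just below the +3 degree upper bound).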
Predicting the future motion of road participants is crucial for autonomous driving but extremely challenging due to staggering motion uncertainty. Recently, most motion forecasting methods resort to the goal-based strategy, i.e., predicting endpoints of motion trajectories as conditions to regress the entire trajectories, so that the search space of the solution can be reduced. However, accurate goal coordinates are hard to predict and evaluate. In addition, the point representation of the destination limits the utilization of a rich road context, leading...
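The goal-based strategy the abstract refers to can be reduced to a toy sketch: score candidate endpoints against a crude motion cue, pick the best, then regress a trajectory conditioned on that endpoint. This is purely illustrative (constant-velocity scoring and linear interpolation stand in for learned scoring and decoding); it is not the paper's method.

```python
import numpy as np

def goal_based_forecast(history, candidate_goals, horizon=6):
    """Toy goal-conditioned forecaster.

    1) Score each candidate goal by how close it is to a
       constant-velocity extrapolation of the history.
    2) Condition on the best goal and regress the full trajectory,
       here as a straight-line interpolation toward the goal.
    """
    vel = history[-1] - history[-2]                  # crude motion cue
    expected_end = history[-1] + horizon * vel
    scores = -np.linalg.norm(candidate_goals - expected_end, axis=1)
    goal = candidate_goals[np.argmax(scores)]
    steps = np.linspace(1 / horizon, 1.0, horizon)[:, None]
    return history[-1] + steps * (goal - history[-1])
```

The abstract's criticism applies directly to this sketch: everything hinges on the single goal point, and a small error in it corrupts the whole regressed trajectory.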
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space. The projection methods include spherical projection, bird's-eye view projection, etc. Although this process makes the point cloud suitable for CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to tackle the issue of the 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space. In this work, we first perform an in-depth analysis of different representations and backbones...
State-of-the-art methods for driving-scene LiDAR-based perception (including point cloud semantic segmentation, panoptic segmentation and 3D detection, etc.) often project the point clouds to 2D space and then process them via 2D convolution. Although this cooperation shows competitiveness in the point cloud, it inevitably alters and abandons the 3D topology and geometric relations. A natural remedy is to utilize the 3D voxelization and 3D convolution network. However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited. An important...
HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited due to the deployed expensive sensors and time-consuming computation. Camera-based methods usually need to separately perform road segmentation and view transformation, which often causes distortion and an absence of content. To push the limits of the technology, we present a novel framework that enables reconstructing a local map formed by road layout and vehicle occupancy in bird's-eye view, given a front-view monocular image only. In particular, we propose...
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant interest from both industry and academia due to its inherent advantages, such as providing an intuitive representation of the world and being conducive to data fusion. The rapid advancements in deep learning have led to the proposal of numerous methods for addressing vision-centric BEV perception challenges. However, there has been no recent survey encompassing this novel and burgeoning research field. To catalyze future research, this paper presents a...
Convolutional neural networks, in which each layer receives features from the previous layer(s) and then aggregates/abstracts higher-level features from them, are widely adopted for image classification. To avoid information loss during feature aggregation/abstraction and to fully utilize lower-layer features, we propose a novel decision fusion module (DFM) for making an intermediate decision based on the current layer's features and fusing its decision results with the original features before passing them to the next layers. This module is devised to determine the auxiliary category...
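One plausible reading of the module described above, sketched in NumPy: an auxiliary classifier produces an intermediate decision from the current features, and its class probabilities are fused back with those features before they continue through the network. The fusion scheme (concatenation with a scaling factor `alpha`) and all names here are assumptions for illustration, not the paper's design.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, stabilized by subtracting the row max."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion_module(features, W_aux, alpha=0.5):
    """Sketch of a decision fusion step.

    features : (N, F) current-layer features
    W_aux    : (F, C) weights of a hypothetical auxiliary classifier
    Returns the fused features (original features with the scaled
    auxiliary class probabilities appended) and the probabilities.
    """
    aux_logits = features @ W_aux
    aux_probs = softmax(aux_logits)                 # intermediate decision
    fused = np.concatenate([features, alpha * aux_probs], axis=-1)
    return fused, aux_probs
```

Later layers then see both the raw lower-level features and an explicit summary of what those features already suffice to decide.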
Training deep models for semantic scene completion (SSC) is challenging due to the sparse and incomplete input, a large quantity of objects of diverse scales, as well as the inherent label noise for moving objects. To address the above-mentioned problems, we propose the following three solutions: 1) Redesigning the completion sub-network. We design a novel completion sub-network, which consists of several Multi-Path Blocks (MPBs) to aggregate multi-scale features and is free from the lossy downsampling operations. 2) Distilling rich knowledge from the multi-frame...
We propose a multi-sensor fusion method for capturing challenging 3D human motions with accurate consecutive local poses and global trajectories in large-scale scenarios, using only a single LiDAR and 4 IMUs, which are conveniently set up and lightly worn. Specifically, to fully utilize the global geometry information captured by the LiDAR and the dynamic motions captured by the IMUs, we design a two-stage pose estimator in a coarse-to-fine manner, where the point clouds provide the coarse body shape and the IMU measurements optimize the local actions. Furthermore, considering...
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and a camera, we record 12 subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with the reconstructed scene point clouds. To obtain accurate ground truth in such...
Existing motion capture datasets are largely short-range and cannot yet fit the need of long-range applications. We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation. Our dataset also includes the ground-truth human motions acquired by an IMU system and synchronous RGB images. We further present a strong baseline method, LiDARCap, for LiDAR point cloud human motion capture. Specifically, we first utilize...
Emerging Metaverse applications demand reliable, accurate, and photorealistic reproductions of human hands to perform sophisticated operations as if in the physical world. While the real human hand represents one of the most intricate coordinations between bones, muscle, tendon, and skin, state-of-the-art techniques unanimously focus on modeling only the skeleton of the hand. In this paper, we present NIMBLE, a novel parametric hand model that includes the missing key components, bringing 3D hand modeling to a new level of realism. We first annotate...
Depth estimation is usually ill-posed and ambiguous for monocular camera-based 3D multi-person pose estimation. Since LiDAR can capture accurate depth information in long-range scenes, it can benefit both the global localization of individuals and the 3D pose estimation by providing rich geometry features. Motivated by this, we propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes, which is easy to deploy and insensitive to light. Specifically, we design an effective fusion strategy to take advantage of the multi-modal input data, including images...