- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- 3D Shape Modeling and Analysis
- Video Surveillance and Tracking Methods
- 3D Surveying and Cultural Heritage
- Robotics and Sensor-Based Localization
- Multimodal Machine Learning Applications
- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Human Pose and Action Recognition
- Remote Sensing and LiDAR Applications
- Face recognition and analysis
- Advanced Computational Techniques and Applications
- Machine Learning and Data Classification
- Mobile Ad Hoc Networks
- Adversarial Robustness in Machine Learning
- Underwater Vehicles and Communication Systems
- Anomaly Detection Techniques and Applications
- Video Analysis and Summarization
- Evolutionary Psychology and Human Behavior
- Advanced Algorithms and Applications
- Image Enhancement Techniques
- Image Retrieval and Classification Techniques
- Natural Language Processing Techniques
Shanghai Electric (China)
2021-2025
Chinese University of Hong Kong
2015-2024
East China Normal University
2023-2024
SMART Reading
2020-2024
Minzu University of China
2023
Shandong University of Science and Technology
2023
Jilin Jianzhu University
2022
China Construction Bank
2021
Harbin Engineering University
2016-2021
Central South University
2019-2021
The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet), aiming at boosting information flow in a proposal-based instance segmentation framework. Specifically, we enhance the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and the topmost feature. We present adaptive feature pooling, which links the feature grid and all feature levels to make useful information in each feature level propagate directly to the following...
The prevalence of voxel-based 3D single-stage detectors contrasts with underexplored point-based methods. In this paper, we present a lightweight point-based 3D single-stage object detector, 3DSSD, to achieve a decent balance of accuracy and efficiency. In this paradigm, all upsampling layers and the refinement stage, which are indispensable in all existing point-based methods, are abandoned. We instead propose a fusion sampling strategy in the downsampling process to make detection on less representative points feasible. A delicate box prediction network,...
We propose a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD). The first stage is a bottom-up proposal generation network that uses raw point clouds as input to generate accurate proposals by seeding each point with a new spherical anchor. It achieves a higher recall with less computation compared with prior works. Then, PointsPool is applied to generate proposal features by transforming interior point features from sparse expression to compact representation, which saves even more computation. In box...
We present a unified, efficient and effective framework for point-cloud based 3D object detection. Our two-stage approach utilizes both voxel representation and raw point cloud data to exploit their respective advantages. The first-stage network, with voxel representation as input, consists only of light convolutional operations, producing a small number of high-quality initial predictions. The coordinate and indexed convolutional feature of each point in the initial prediction are effectively fused with the attention mechanism, preserving both accurate localization and context...
Instance segmentation is an important task for scene understanding. Compared to the fully-developed 2D case, 3D instance segmentation for point clouds has much room to improve. In this paper, we present PointGroup, a new end-to-end bottom-up architecture, specifically focused on better grouping the points by exploring the void space between objects. We design a two-branch network to extract point features and predict semantic labels and offsets, shifting each point towards its respective instance centroid. A clustering component follows to utilize both...
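The offset-then-cluster idea in the PointGroup abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed `radius`, and the breadth-first grouping rule are all assumptions made for illustration, and the offsets and semantic labels, which the abstract says a network predicts, are hand-written here. The sketch clusters only one coordinate set at a time, since the abstract is truncated before it says what the clustering component uses.

```python
import math
from collections import deque


def shift_points(coords, offsets):
    """Move every point by its (here hand-written) offset vector."""
    return [tuple(c + o for c, o in zip(p, d)) for p, d in zip(coords, offsets)]


def cluster(points, labels, radius):
    """Group points that share a semantic label and lie within `radius`
    of another member of the group (breadth-first search, O(n^2)).

    `radius` and this grouping rule are illustrative assumptions.
    Returns one cluster id per point.
    """
    cluster_id = [-1] * len(points)
    next_id = 0
    for seed in range(len(points)):
        if cluster_id[seed] != -1:
            continue
        cluster_id[seed] = next_id
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if (cluster_id[j] == -1
                        and labels[j] == labels[i]
                        and math.dist(points[i], points[j]) <= radius):
                    cluster_id[j] = next_id
                    queue.append(j)
        next_id += 1
    return cluster_id


# Two adjacent objects of the same class whose boundary points nearly touch.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.4, 0.0, 0.0)]
offsets = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
labels = [0, 0, 0, 0]

raw_ids = cluster(coords, labels, radius=0.5)
shifted_ids = cluster(shift_points(coords, offsets), labels, radius=0.5)
```

In this toy input, clustering the raw coordinates merges the two touching boundary points from different objects, while clustering the shifted coordinates separates the two objects, because shifting opens up the space between them.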
3D point cloud segmentation has made tremendous progress in recent years. Most current methods focus on aggregating local features, but fail to directly model long-range dependencies. In this paper, we propose Stratified Transformer, which is able to capture long-range contexts and demonstrates strong generalization ability and high performance. Specifically, we first put forward a novel key sampling strategy. For each query point, we sample nearby points densely and distant points sparsely as its keys in a stratified way, which...
In this paper, we propose Sequential Grouping Networks (SGN) to tackle the problem of object instance segmentation. SGNs employ a sequence of neural networks, each solving a sub-grouping problem of increasing semantic complexity in order to gradually compose objects out of pixels. In particular, the first network aims to group pixels along each image row and column by predicting horizontal and vertical object breakpoints. These breakpoints are then used to create line segments. By exploiting two-directional information, the second network groups lines...
We propose a novel data augmentation method, "GridMask", in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks. We analyze the requirement of information dropping. Then we show the limitation of existing information dropping algorithms and propose our structured method, which is simple yet very effective. It is based on the deletion of regions of the input image. Our extensive experiments show that our method outperforms the latest AutoAugment, which is way more computationally expensive due to the use of reinforcement learning to find...
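As a rough illustration of structured region deletion, the sketch below builds a regular grid mask and multiplies it into an image. It is a minimal sketch under stated assumptions, not the GridMask reference code: the parameter names (`unit`, `drop`, `dx`, `dy`) and the choice to delete one square per grid cell are hypothetical, and the abstract does not specify how the mask is parameterized.

```python
def grid_mask(height, width, unit, drop, dx=0, dy=0):
    """Build a height x width binary mask: 0 marks a deleted pixel, 1 a kept one.

    The plane is tiled with unit x unit cells and a drop x drop square in
    each cell is deleted; dx and dy shift the grid. All parameter names
    are hypothetical.
    """
    if not 0 < drop <= unit:
        raise ValueError("drop must satisfy 0 < drop <= unit")
    return [
        [0 if (y + dy) % unit < drop and (x + dx) % unit < drop else 1
         for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image, mask):
    """Multiply a 2D list image by the mask element-wise."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# A constant 4x4 "image" with one pixel deleted per 2x2 cell.
image = [[9] * 4 for _ in range(4)]
mask = grid_mask(4, 4, unit=2, drop=1)
masked = apply_mask(image, mask)
```

Because the deleted squares are evenly spaced, no large contiguous area is removed and no cell is left entirely untouched, which is one way to read "structured" dropping as opposed to a single random erased box.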
We achieve 3D semantic scene labeling by exploring the semantic relation between each point and its contextual neighbors through edges. Besides an encoder-decoder branch for predicting point labels, we construct an edge branch to hierarchically integrate point features and generate edge features. To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process. For each edge in the final graph, we predict a label to indicate the semantic consistency of the two connected points to enhance...
Deep neural networks may perform poorly when training datasets are heavily class-imbalanced. Recently, two-stage methods decouple representation learning and classifier learning to improve performance. But there is still the vital issue of miscalibration. To address it, we design two methods to improve calibration and performance in such scenarios. Motivated by the fact that predicted probability distributions of classes are highly related to the numbers of class instances, we propose label-aware smoothing to deal with different degrees...
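One way to picture label-aware smoothing is a label-smoothing target whose smoothing factor depends on how many training instances the ground-truth class has. The sketch below is only an illustration under explicit assumptions: the linear schedule, the names `eps_head` and `eps_tail`, and the choice to smooth frequent classes more strongly are not taken from the abstract, which is truncated before it gives the formulation.

```python
def smoothing_factors(class_counts, eps_head=0.3, eps_tail=0.0):
    """Assign each class a smoothing factor that grows linearly with its
    instance count (assumed schedule; eps_head and eps_tail are hypothetical).
    """
    n_max, n_min = max(class_counts), min(class_counts)
    if n_max == n_min:
        return [eps_head] * len(class_counts)
    span = n_max - n_min
    return [eps_tail + (eps_head - eps_tail) * (n - n_min) / span
            for n in class_counts]


def smoothed_target(label, class_counts, eps_head=0.3, eps_tail=0.0):
    """Soft target: 1 - eps on the true class, eps spread over the others,
    where eps is the factor of the true class."""
    k = len(class_counts)
    if k < 2:
        raise ValueError("need at least two classes")
    eps = smoothing_factors(class_counts, eps_head, eps_tail)[label]
    return [1.0 - eps if c == label else eps / (k - 1) for c in range(k)]


counts = [1000, 100, 10]                  # head, medium, tail class sizes
head_target = smoothed_target(0, counts)  # strongly smoothed
tail_target = smoothed_target(2, counts)  # left as a one-hot target
```

Under these assumptions a head-class sample is trained against a softer target than a tail-class sample, so the over-confidence penalty differs per class rather than using one global smoothing value.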
In this paper, we propose Parametric Contrastive Learning (PaCo) to tackle long-tailed recognition. Based on theoretical analysis, we observe that supervised contrastive loss tends to bias on high-frequency classes and thus increases the difficulty of imbalanced learning. We introduce a set of parametric class-wise learnable centers to rebalance from an optimization perspective. Further, we analyze our PaCo loss under a balanced setting. Our analysis demonstrates that PaCo can adaptively enhance the intensity of pushing samples of the same...
Most state-of-the-art 3D object detectors rely heavily on LiDAR sensors, and there remains a large gap in terms of performance between image-based and LiDAR-based methods, caused by inappropriate representation for the prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues...
Semantic segmentation has made tremendous progress in recent years. However, satisfying performance highly depends on a large number of pixel-level annotations. Therefore, in this paper, we focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images. Nevertheless, due to the limited annotations, models may overly rely on the contexts available in the training data, which causes poor generalization to scenes unseen before. A preferred...
Amodal instance segmentation, a new direction of instance segmentation, aims to segment each object instance including its invisible, occluded parts to imitate human ability. This task requires reasoning about objects' complex structure. Despite being important and futuristic, this task lacks data with large-scale and detailed annotations, due to the difficulty of correctly and consistently labeling invisible parts, which creates a huge barrier to exploring the frontier of visual recognition. In this paper, we augment KITTI with more instance pixel-level annotation for 8 categories, which we call...
Deep learning algorithms face great challenges with long-tailed data distribution which, however, is quite a common case in real-world scenarios. Previous methods tackle the problem from either the aspect of input space (re-sampling classes with different frequencies) or loss space (re-weighting classes with different weights), suffering from heavy over-fitting to tail classes or hard optimization during training. To alleviate these issues, we propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space, and aim to preserve...
Neural Radiance Fields (NeRF) has been widely applied to various tasks for its high-quality representation of 3D scenes. It takes long per-scene training time and per-image testing time. In this paper, we present EfficientNeRF as an efficient NeRF-based method to represent 3D scenes and synthesize novel-view images. Although several ways exist to accelerate the training or testing process, it is still difficult to substantially reduce the time for both phases simultaneously. We analyze the density and weight distribution of the sampled points, then propose valid...
Few-shot semantic segmentation (FSS) aims to form class-agnostic models segmenting unseen classes with only a handful of annotations. Previous methods limited to the semantic feature and prototype representation suffer from coarse segmentation granularity and train-set overfitting. In this work, we design Hierarchically Decoupled Matching Network (HDMNet), mining pixel-level support correlation based on the transformer architecture. The self-attention modules are used to assist in establishing hierarchical dense features, as...
We present a novel 3D object detection framework, named IPOD, based on raw point cloud. It seeds an object proposal for each point, which is the basic element. This paradigm provides us with high recall and high fidelity of information, leading to a suitable way to process point cloud data. We design an end-to-end trainable architecture, where features of all points within a proposal are extracted from the backbone network and form a proposal feature for final bounding inference. These features, with both context information and precise point cloud coordinates, yield improved...
Rapid progress in 3D semantic segmentation is inseparable from the advances of deep network models, which highly rely on large-scale annotated data for training. To address the high cost and challenges of 3D point-level labeling, we present a method for semi-supervised point cloud semantic segmentation that adopts unlabeled point clouds in training to boost model performance. Inspired by the recent contrastive loss in self-supervised tasks, we propose the guided point contrastive loss to enhance the feature representation and model generalization ability in the semi-supervised setting. Semantic predictions on unlabeled point clouds serve as...
Aiming at simultaneous detection and segmentation (SD-S), we propose a proposal-free framework, which detects and segments object instances via mid-level patches. We design a unified trainable network on patches, which is followed by a fast and effective patch aggregation algorithm to infer object instances. Our method benefits from end-to-end training. Without object proposal generation, computation time can also be reduced. In experiments, our method yields results of 62.1% and 61.8% in terms of mAPr on VOC2012 segmentation val and VOC2012 SDS val, which are...
Currently, there have been many kinds of voxel-based 3D single-stage detectors, while point-based single-stage methods are still underexplored. In this paper, we first present a lightweight and effective point-based 3D single-stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and the refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We propose a novel fusion sampling strategy in the downsampling process to make detection on...
Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and merge the incomplete results by tracking or matching. These methods may cause error accumulation in the merging step. In contrast, we propose a new paradigm, Propose-Reduce, to generate complete sequences for input videos in a single step. We further build a sequence propagation head on the existing image-level instance segmentation network for long-term propagation. To ensure robustness...
In this paper, we propose Generalized Parametric Contrastive Learning (GPaCo/PaCo), which works well on both imbalanced and balanced data. Based on theoretical analysis, we observe that supervised contrastive loss tends to bias high-frequency classes and thus increases the difficulty of imbalanced learning. We introduce a set of parametric class-wise learnable centers to rebalance from an optimization perspective. Further, we analyze our GPaCo/PaCo loss under a balanced setting. Our analysis demonstrates that GPaCo/PaCo can adaptively enhance the intensity...