- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Anomaly Detection Techniques and Applications
- Human Pose and Action Recognition
- Robotics and Sensor-Based Localization
- 3D Surveying and Cultural Heritage
- Video Surveillance and Tracking Methods
- Remote Sensing and LiDAR Applications
- Adversarial Robustness in Machine Learning
- COVID-19 diagnosis using AI
- Gait Recognition and Analysis
- Robot Manipulation and Learning
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- 3D Shape Modeling and Analysis
- Human Motion and Animation
- Educational Technology and Assessment
- Public Policy and Administration Research
- Artificial Intelligence in Games
- Public-Private Partnership Projects
- Education and Technology Integration
- Public Procurement and Policy
- Neuroscience, Education and Cognitive Function
- Advanced Measurement and Metrology Techniques
University of Hong Kong
2021-2025
Hong Kong University of Science and Technology
2021-2025
Nanyang Technological University
2024
Alibaba Group (China)
2023
Alibaba Group (United States)
2023
Chongqing University
2022-2023
Chinese University of Hong Kong, Shenzhen
2021
Zhejiang University
2020
Classical close-set semantic segmentation networks have limited ability to detect out-of-distribution (OOD) objects, which is important for safety-critical applications such as autonomous driving. Incrementally learning these OOD objects with few annotations is an ideal way to enlarge the knowledge base of deep models. In this paper, we propose an open world system that includes two modules: (1) an open-set module to detect both in-distribution and OOD objects, and (2) an incremental few-shot module to gradually incorporate those...
Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, obviating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both...
Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale pretraining across diverse real-world datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality...
This survey provides a comprehensive review of recent advancements of generative learning models in robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks, including insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modality reasoning ability needed for robust policy performance across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial...
2D RGB images and 3D LiDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several fusion methods have been explored for the semantic segmentation task, but they suffer from different problems. 2D-to-3D methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation...
3D object detection has been widely studied in recent years, especially for robot perception systems. However, existing 3D object detection is under a closed-set condition, meaning that the network can only output boxes of trained classes. Unfortunately, this closed-set condition is not robust enough for practical use, as it will identify unknown objects as known by mistake. Therefore, in this paper, we propose an open-set 3D detector, which aims to (1) identify known objects, like closed-set detection, and (2) identify unknown objects and give their accurate bounding boxes. Specifically, we divide...
In this letter, we propose a novel vision-based grasp system for closed-loop 6-degrees of freedom grasping of unknown objects in cluttered environments. The key factor in our system is that we make the most of a geometry-aware scene representation based on a truncated signed distance function (TSDF) volume, which can handle multi-view observations from the vision sensor, provide comprehensive spatial information for the grasp pose detector, and allow collision checking to achieve a collision-free grasp pose. To eliminate the large...
Recent incremental learning for action recognition usually stores representative videos to mitigate catastrophic forgetting. However, only a few bulky videos can be stored due to the limited memory. To address this problem, we propose FrameMaker, a memory-efficient video class-incremental learning approach that learns to produce a condensed frame for each selected video. Specifically, FrameMaker is mainly composed of two crucial components: Frame Condensing and Instance-Specific Prompt. The former is to reduce the memory cost by...
Open-set action recognition is to reject unknown human action cases which are out of the distribution of the training set. Existing methods mainly focus on learning better uncertainty scores but dismiss the importance of feature representations. We find that features with richer semantic diversity can significantly improve the open-set performance under the same uncertainty scores. In this paper, we begin by analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge...
Although expectations have emerged as a prominent research theme in public administration, little is known about whether and how expectations affect citizen satisfaction. We investigated the anchoring effects of expectations in shaping citizen satisfaction, utilizing a survey experiment involving 735 Chinese citizens. Specifically, we examined (1) the influence of citizens' normative expectations of what public services should be on their satisfaction levels; (2) the anchoring effects that result from numerical anchors on citizen satisfaction; and (3) the effectiveness of debiasing education in mitigating...
Sustainability is an imperative goal for developing public–private partnership (PPP) projects, which are largely affected by policies. However, few studies have explored the impact of policy implementation on the sustainable development of PPP projects. This study examines the impacts of policy implementation on PPP sustainability by conducting a comprehensive analysis that is grounded in a proposed stakeholder-based framework of the policy implementation process. Critical factors are identified through exploratory factor analysis (EFA) of 275 survey responses from PPP practitioners in China....
Scene recognition is a fundamental task in robotic perception. For human beings, scene recognition is reasonable because they have abundant object knowledge of the real world. The idea of transferring prior object knowledge from humans to scene recognition is significant but still less exploited. In this paper, we propose to utilize meaningful object representations for indoor scene representation. First, we utilize an improved object model (IOM) as a baseline that enriches the object knowledge by introducing a scene parsing algorithm pretrained on the ADE20K dataset with rich object categories related to the indoor scene. To analyze...
Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking vision and spatial imagination ability. In contrast, humans utilize both the left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture...
Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering insightful phenomena that cope with both the aleatoric and epistemic uncertainties in 3D scene understanding. We discover that despite achieving impressive levels...
Open-set Recognition (OSR) aims to identify test samples whose classes are not seen during the training process. Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications. UOSR has drawn little attention since it was proposed, but we find that it is sometimes even more practical than OSR in real-world applications, as the evaluation results of known but wrongly classified samples are also wrong, like those of unknown samples. In this paper, we deeply analyze the UOSR task under different training and evaluation settings...
The perception of motion behavior in a dynamic environment holds significant importance for autonomous driving systems, wherein class-agnostic motion prediction methods directly predict the motion of the entire point cloud. While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming. Therefore, several annotation-efficient methods have been proposed to address this challenge. Although effective, these methods rely on weak annotations or additional multi-modal data like images, and the potential...
We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal...
Although the current different types of SAM adaptation methods, such as prompt-based ones and adapter-based ones, have achieved promising performance for various downstream tasks, most of them belong to the one-step adaptation paradigm. In real-world scenarios, we are generally confronted with a dynamic scenario where the data comes in a streaming manner. Driven by this practical need, in this paper, we first propose a novel Continual SAM adaptation (CoSAM) benchmark with 8 different task domains and carefully analyze the limitations of existing SAM one-step adaptation methods in the continual segmentation...
In autonomous driving scenarios, edge cases require perception algorithms, like 3D object detection, to incrementally learn new data over the long term. To achieve this, previous methods seek help from knowledge distillation and recursively transfer knowledge from old models to new models. However, conflicts exist between the likelihood term and the regularizer on both new and old knowledge. In this paper, we discuss the drawback of knowledge distillation in the task-incremental-learning scenario for 3D object detection and propose New-Task-Aware Biased Sampling...