- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Human Motion and Animation
- 3D Shape Modeling and Analysis
- Computer Graphics and Visualization Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Visual Attention and Saliency Detection
- Face Recognition and Analysis
- Robotics and Sensor-Based Localization
- Hand Gesture Recognition Systems
- Industrial Vision Systems and Defect Detection
- Statistical Methods and Inference
- Automated Road and Building Extraction
- Statistical Methods and Bayesian Inference
- Cloud Computing and Remote Desktop Technologies
- Banking Systems and Strategies
- Technology and Security Systems
- Geophysical Methods and Applications
- Ear Surgery and Otitis Media
- CCD and CMOS Imaging Sensors
Shanxi Eye Hospital
2025
Shanxi Medical University
2025
Beijing Academy of Quantum Information Sciences
2025
Tibet University
2024
Tongji University
2024
Xi’an University of Posts and Telecommunications
2024
Hainan University
2024
Shanghai Artificial Intelligence Laboratory
2024
Chinese University of Hong Kong
2018-2023
University of Hong Kong
2023
Most existing methods of semantic segmentation still suffer from two kinds of challenges: intra-class inconsistency and inter-class indistinction. To tackle these two problems, we propose a Discriminative Feature Network (DFN), which contains two sub-networks: a Smooth Network and a Border Network. Specifically, to handle the intra-class inconsistency problem, we specially design a Smooth Network with a Channel Attention Block and global average pooling to select the more discriminative features. Furthermore, we propose a Border Network to make the bilateral features of the boundary distinguishable with deep...
Recent works have widely explored contextual dependencies to achieve more accurate segmentation results. However, most approaches rarely distinguish different types of contextual dependencies, which may pollute scene understanding. In this work, we directly supervise the feature aggregation to distinguish the intra-class and inter-class context clearly. Specifically, we develop a Context Prior with the supervision of the Affinity Loss. Given an input image and the corresponding ground truth, the Affinity Loss constructs an ideal affinity map to supervise the learning of the Context Prior....
Panoptic segmentation, which needs to assign a category label to each pixel and segment each object instance simultaneously, is a challenging topic. Traditionally, the existing approaches utilize two independent models without sharing features, which makes the pipeline inefficient to implement. In addition, a heuristic method is usually employed to merge the results. However, the overlapping relationship between object instances is difficult to determine without sufficient context information during the merging process. To address the problems, we propose...
Semantic segmentation requires both rich spatial information and a sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain a sufficient receptive field. On top of the two paths,...
The ability to synthesize long-term human motion sequences in real-world scenes can facilitate numerous applications. Previous approaches for scene-aware motion synthesis are constrained by pre-defined target objects or positions and thus limit the diversity of human-scene interactions in the synthesized motions. In this paper, we focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences. To achieve this, we first decompose the diversity of scene-aware human motions into three aspects, namely interaction diversity (e.g. sitting on different...
We revisit human motion synthesis, a task useful in various real-world applications, in this paper. Whereas a number of methods have been developed previously for this task, they are often limited in two aspects: 1) they focus on the poses while leaving the location movement behind, and 2) they ignore the impact of the environment on the human motion. In this paper, we propose a new framework, with the interaction between the scene and the human motion taken into account. Considering the uncertainty of human motion, we formulate this task as a generative task, whose objective is to generate plausible...
Neural Radiance Field (NeRF) has emerged as a compelling method to represent 3D objects and scenes for photo-realistic rendering. However, its implicit representation makes it difficult to manipulate the models in the way an explicit mesh representation allows. Several recent advances in NeRF manipulation are usually restricted by a shared renderer network, or suffer from large model size. To circumvent this hurdle, in this paper, we present an explicit neural field representation that enables efficient and convenient manipulation of models. To achieve this goal,...
Realistic human-centric rendering plays a key role in both computer vision and computer graphics. Rapid progress has been made in the algorithm aspect over the years, yet existing human-centric rendering datasets and benchmarks are rather impoverished in terms of diversity (e.g., outfit's fabric/material, body's interaction with objects, and motion sequences), which is crucial for the rendering effect. Researchers are usually constrained to explore and evaluate a small set of rendering problems on current datasets, while real-world applications require methods to be robust across...
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion to obtain better feature representations and achieve more accurate segmentation. This, however, may not lead to satisfactory results, as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we...
Advanced by the transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in real time, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus,...
Convolutional neural networks (CNNs) have achieved great success in RGB semantic segmentation. RGB-D images provide additional depth information, which can improve segmentation performance. To take full advantage of the 3D geometry relations provided by RGB-D images, in this paper, we propose 2.5D convolution, which mimics one 3D convolution kernel by several masked 2D convolution kernels. Our 2.5D convolution can effectively process spatial relations between pixels in a manner similar to 3D convolution while still sampling pixels on the 2D plane, and thus saves computational cost. And...
3D interacting hand reconstruction is essential to facilitate human-machine interaction and the understanding of human behaviors. Previous works in this field either rely on auxiliary inputs such as depth images or can only handle a single hand if monocular single RGB images are used. Single-hand methods tend to generate collided hand meshes when applied to closely interacting hands, since they cannot model the interactions between two hands explicitly. In this paper, we make the first attempt to reconstruct 3D interacting hands from monocular single RGB images. Our method can generate 3D hand meshes with both...
This paper investigates the potential of enhancing Neural Radiance Fields (NeRF) with semantics to expand their applications. Although NeRF has been proven useful in real-world applications like VR and digital creation, the lack of semantics hinders interaction with objects in complex scenes. We propose to imitate the backbone feature of off-the-shelf perception models to achieve zero-shot semantic segmentation with NeRF. Our framework reformulates the segmentation process by directly rendering semantic features and only applying the decoder from perception models. This eliminates...
Underwater autonomous path planning is a critical component of intelligent underwater vehicle system design, especially for maritime conservation and monitoring missions. Effective path planning for these robots necessitates considering various constraints related to robot kinematics, optimization objectives, and other pertinent factors. Sample-based strategies have successfully tackled this problem, particularly the rapidly exploring random tree star (RRT*) algorithm. However, conventional path-searching...