- 3D Shape Modeling and Analysis
- Human Pose and Action Recognition
- 3D Surveying and Cultural Heritage
- Computer Graphics and Visualization Techniques
- Robotics and Sensor-Based Localization
- Human Motion and Animation
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Video Surveillance and Tracking Methods
- Multimedia Communication and Technology
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Computational Fluid Dynamics and Aerodynamics
- Remote Sensing and LiDAR Applications
- Gas Dynamics and Kinetic Theory
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Advanced Algorithms and Applications
- Topology Optimization in Engineering
- Advanced Image Processing Techniques
- Advanced Memory and Neural Computing
- Visual Attention and Saliency Detection
- Domain Adaptation and Few-Shot Learning
- Heat Transfer and Optimization
- Peer-to-Peer Network Technologies
Chinese Academy of Sciences
2005-2025
Chongqing University of Technology
2025
Shanghai Institute of Optics and Fine Mechanics
2025
Shanghai Artificial Intelligence Laboratory
2025
Beijing Academy of Artificial Intelligence
2025
University of Electronic Science and Technology of China
2023-2025
ShangHai JiAi Genetics & IVF Institute
2025
Nanyang Technological University
2019-2024
Purdue University West Lafayette
2021-2024
City University College of Science and Technology
2024
Memristive devices are able to store and process information, which offers several key advantages over transistor-based architectures. However, most two-terminal memristive devices have fixed functions once made and cannot be reconfigured for other situations. Here, we propose and demonstrate a device, "memlogic" (memory logic), as a nonvolatile switch of logic operations integrated with a memory function in a single light-gated memristor. Based on light-modulated switching behavior, the memlogic cell is able to achieve...
Real-scanned point clouds are often incomplete due to viewpoint, occlusion, and noise. Existing point cloud completion methods tend to generate global shape skeletons and hence lack fine local details. Furthermore, they mostly learn a deterministic partial-to-complete mapping, but overlook structural relations in man-made objects. To tackle these challenges, this paper proposes a variational framework, Variational Relational point Completion network (VRC-Net), with two appealing properties: 1) Probabilistic...
3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology to a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers layman users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with the described motions using solely natural languages. Our key insight is to take...
Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers. In this work, we present TCTrack (https://github.com/vision4robotics/TCTrack), a comprehensive framework to exploit temporal contexts for aerial tracking. The temporal contexts are incorporated at two levels: the extraction of features and the refinement of similarity maps. Specifically, for feature extraction, an online...
Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion generative model (DDPM) for solving linear inverse, non-linear, or blind problems....
Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties...
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud. Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. PTTR consists of three novel designs. 1) Instead of random sampling, we design Relation-Aware Sampling to preserve points relevant to the given templates during subsampling. 2) Furthermore, Relation...
Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of large-scale real-scanned 3D databases. To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties: 1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit...
Densely annotating LiDAR point clouds is costly, which often restrains the scalability of fully-supervised learning methods. In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR semantic segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. We propose LaserMix to mix laser beams from different LiDAR scans and then encourage the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties. 1) Generic: LaserMix is agnostic...
Van der Waals (vdWs) heterostructures enable bandgap engineering of different 2D materials to realize the interlayer transition via type‐II band alignment, leading to a broadened spectrum that is beyond the cut‐off wavelength of the individual 2D materials. The interlayer transition has a significant effect on the optoelectronic performance of vdWs heterostructure devices, and a strong interlayer transition in the heterojunction is always desirable for sufficient charge transfer and rapid speed response. Herein, a state‐of‐the‐art review is presented on recent...
Simultaneous resistive switching and rectifying effects are observed for the first time in a MOF single-crystal material.
Most 3D shape completion approaches rely heavily on partial-complete shape pairs and learn in a fully supervised manner. Despite their impressive performances on in-domain data, when generalizing to partial shapes in other forms or real-world partial scans, they often obtain unsatisfactory results due to domain gaps. In contrast to previous fully supervised approaches, in this paper we present ShapeInversion, which introduces Generative Adversarial Network (GAN) inversion to shape completion for the first time. ShapeInversion uses a GAN...
Scanned 3D point clouds for real-world scenes often suffer from noise and incompletion. Observing that prior point cloud shape completion networks overlook local geometric features, we propose ECG, an Edge-aware point cloud Completion network with Graph convolution, which facilitates fine-grained 3D point cloud shape generation with multi-scale edge features. Our ECG consists of two consecutive stages: 1) skeleton generation and 2) details refinement. Each stage is a generation sub-network conditioned on the input incomplete point cloud. The first stage generates coarse...
3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1)...
The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments....
Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking real-world conditions; 2) adopt the tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only integrate the temporal information into the template, where temporal contexts among consecutive frames are far from being fully utilized. To handle those problems, we propose a two-level framework (TCTrack) that can exploit...
Existing Human NeRF methods for reconstructing 3D humans typically rely on multiple 2D images from multi-view cameras or monocular videos captured from fixed camera views. However, in real-world scenarios, human images are often captured from random camera angles, presenting challenges for high-quality 3D human reconstruction. In this paper, we propose SHERF, the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image. SHERF extracts and encodes 3D human representations in canonical space, enabling rendering and animation from free views and poses....
Near-infrared (NIR) polarization photodetectors with two-dimensional (2D) semiconductors and their van der Waals (vdW) heterostructures have had a great impact on the development of a wide range of technologies, such as in the optoelectronics and communication fields. Nevertheless, the lack of photogenerated charge carriers at the device's interface leads to poor charge carrier collection efficiency and a low linear dichroism ratio, hindering the achievement of high-performance optoelectronic devices with multifunctionalities. Herein, we...
Ultramicroporous metal-organic frameworks (MOFs) are demonstrated to be advantageous for the separation and purification of light hydrocarbons such as C
Generating diverse and high-quality 3D assets automatically poses a fundamental yet challenging task in computer vision. Despite extensive efforts in 3D generation, existing optimization-based approaches struggle to produce large-scale 3D assets efficiently. Meanwhile, feed-forward methods often focus on generating only a single category or a few categories, limiting their generalizability. Therefore, we introduce a diffusion-based feed-forward framework to address these challenges with a single model. To handle the large diversity...