- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Handwritten Text Recognition Techniques
- Generative Adversarial Networks and Image Synthesis
- Vehicle License Plate Recognition
- Image Processing and 3D Reconstruction
- Reinforcement Learning in Robotics
- Robot Manipulation and Learning
- Advanced Image and Video Retrieval Techniques
- Robotic Path Planning Algorithms
- Sparse and Compressive Sensing Techniques
- Adversarial Robustness in Machine Learning
- Human Motion and Animation
- Natural Language Processing Techniques
- Speech Recognition and Synthesis
- AI-based Problem Solving and Planning
- Robotics and Automated Systems
- Power Line Inspection Robots
- Human Pose and Action Recognition
- Advanced Steganography and Watermarking Techniques
- Autonomous Vehicle Technology and Safety
- 3D Shape Modeling and Analysis
- Topic Modeling
- Digital Media Forensic Detection
Changsha University of Science and Technology
2019-2024
Shanghai University
2021-2022
UC San Diego Health System
2021
University of California, San Diego
2019-2020
Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, assets in existing benchmarks mostly lack the diversity of shapes that matches real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. Assets in ManiSkill include large topological and geometric variations. Tasks are carefully chosen to cover distinct...
Synthetic Aperture Radar (SAR) scene classification is challenging but widely applied, and deep learning can play a pivotal role in it because of its hierarchical feature learning ability. In this paper, we propose a new framework, named Feature Recalibration Network with Multi-scale Spatial Features (FRN-MSF), to achieve high-accuracy SAR-based scene classification. First, a Multi-Scale Omnidirectional Gaussian Derivative Filter (MSOGDF) is constructed. Then, Multi-scale Spatial Features (MSF) of SAR scenes are generated by weighting MSOGDF,...
Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translation. Unaware of the inherently unmatched semantic distributions between the source and target domains, existing distribution-matching methods (i.e., GAN-based ones) can give undesired solutions. In particular, although they produce visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce translated outputs...
We tackle the backdoor detection problem for convolutional neural networks (CNNs) by proposing a new representation called the one-pixel signature. Our task is to detect/classify whether or not a CNN model has been maliciously inserted with an unknown Trojan trigger. Here, each CNN model is associated with a signature that is created by generating, pixel by pixel, the adversarial value that results in the largest change to the class prediction. The signature is agnostic to the design choice of CNN architectures and to how they were trained. It can be computed efficiently for...
Recent advances in deep learning theory have evoked the study of generalizability across different local minima of deep neural networks (DNNs). While current work has focused on either discovering properties of good local minima or developing regularization techniques to induce good local minima, no approach exists that can tackle both problems. We achieve these two goals successfully in a unified manner. Specifically, based on the observed Fisher information, we propose a metric that is strongly indicative of the generalizability of local minima and can be effectively applied as a practical...
Scene text recognition (STR) is an important bridge between images and text, attracting abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most existing works need an extra module (a context modeling module) to help the CNN capture global dependencies, overcoming its inductive bias and strengthening the relationships between features. Recently, the transformer has been proposed as a promising network for global context modeling via the self-attention mechanism, but one main...
We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with strong inductive bias. In particular, we implement an object-centric GNN-based policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability and achieve superior performance compared to baselines.
Structured scenes are characterized by complex road conditions and poor GPS signals, so map degradation and reduced positioning accuracy often occur when robots build maps of structured scenes. To address these problems, a low-cost multi-sensor-periodize fusion SLAM system is designed, which fuses four sensors: a 2D LiDAR, an RGB-D camera, an inertial measurement unit, and a wheel odometer. A sub-echelon data processing session is designed. In the motion initialization session, a sensor multi-strategy selection method...
A live detection system for tension clamps based on unmanned aerial vehicles is the development direction of routine inspections of high-voltage transmission lines. Detecting tension clamps in real time in complex environments is fundamental to this system. To address this problem, YOLOv8-SC is proposed based on YOLOv8. The original C2f module is replaced with a new C2fG-Ghost module, and a GAM attention layer is added to the backbone network. A binocular 3D coordinate algorithm is used to obtain the relative position of the target. Experiments show that the improved model improves by...
Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards, including non-differentiable ones, is feasible with reinforcement learning (RL) techniques, enabling flexible model alignment. However, applying existing RL methods to timestep-distilled DMs is challenging for ultra-fast ($\le 2$-step) image generation. Our analysis suggests several limitations of policy-based methods such as PPO or DPO toward this goal. Based on these insights, we propose a learned differentiable surrogate...
We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control. To this end, we develop a transformer (TFT), under controllable parameters, that is able to account for the hidden-layer transformation while maintaining the overall consistency across layers. The learned generators capture the underlying feature transformation processes, independent of particular training images. Our proposed TFT framework brings insights into, and helps the understanding of, an important problem: studying the CNN internal...
Recently, video scene text detection has received increasing attention due to its wide range of applications. However, the lack of annotated datasets has become one of the most important problems, hindering the development of video scene text detection. The existing datasets are not large-scale because of the expensive cost of manual labeling. In addition, the text instances in these datasets are too clear to be a challenge. To address the above issues, in this paper we propose a tracking-based semi-automatic labeling strategy for videos. We get annotations manually first...
Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents Luminous, the first research framework that employs state-of-the-art scene synthesis algorithms to generate large-scale simulated scenes for EAI challenges. Further, we automatically and quantitatively evaluate the quality...
To address the convergence difficulties faced by deep reinforcement learning algorithms in dynamic pedestrian environments and their insufficient reward feedback mechanisms, a navigation algorithm named GRRL, which is both data-driven and model-driven, is proposed. To enrich and perfect the reward mechanism, we design a reward function that fully considers the relationship between the robot and the target position. It mainly consists of three parts. The experimental results show that the autonomous navigation efficiency...
The quality of the input text image has a clear impact on the output of a scene text recognition (STR) system; however, because the main content of a text image is a sequence of characters containing semantic information, how to effectively assess text image quality remains a research challenge. Text image quality assessment (TIQA) can help in picking hard samples, leading to a more robust STR system and recognition-oriented text image restoration. In this paper, by arguing that text image quality comes from character-level texture feature embedding robustness, we propose a learning-based...
Existing video deblurring datasets and algorithms rest on the unrealistic underlying presumption that a naturally blurred video is fully blurred. In this work, we define a more realistic, frame-averaging-based data degradation model, referring to it as a partially blurred sequence, and use it to build REBVIDS, a novel dataset that closes the gap between synthetically blurred training data and naturally blurred videos and addresses most shortcomings of existing datasets. We also present DeblurNet, a two-phase training-based deep learning model for video deblurring, which consists...