Zhiwei Jia

ORCID: 0000-0001-5391-5931
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Handwritten Text Recognition Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Vehicle License Plate Recognition
  • Image Processing and 3D Reconstruction
  • Reinforcement Learning in Robotics
  • Robot Manipulation and Learning
  • Advanced Image and Video Retrieval Techniques
  • Robotic Path Planning Algorithms
  • Sparse and Compressive Sensing Techniques
  • Adversarial Robustness in Machine Learning
  • Human Motion and Animation
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • AI-based Problem Solving and Planning
  • Robotics and Automated Systems
  • Power Line Inspection Robots
  • Human Pose and Action Recognition
  • Advanced Steganography and Watermarking Techniques
  • Autonomous Vehicle Technology and Safety
  • 3D Shape Modeling and Analysis
  • Topic Modeling
  • Digital Media Forensic Detection

Changsha University of Science and Technology
2019-2024

Shanghai University
2021-2022

UC San Diego Health System
2021

University of California, San Diego
2019-2020

Object manipulation from 3D visual inputs poses many challenges on building generalizable perception and policy models. However, assets in existing benchmarks mostly lack the diversity of shapes that aligns with real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. The objects in ManiSkill include large topological and geometric variations. Tasks are carefully chosen to cover distinct...

10.48550/arxiv.2107.14483 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Synthetic Aperture Radar (SAR) scene classification is challenging but widely applied, and deep learning can play a pivotal role in it because of its hierarchical feature-learning ability. In this paper, we propose a new framework, named Feature Recalibration Network with Multi-scale Spatial Features (FRN-MSF), to achieve high-accuracy SAR-based scene classification. First, a Multi-Scale Omnidirectional Gaussian Derivative Filter (MSOGDF) is constructed. Then, multi-scale spatial features (MSF) of SAR scenes are generated by weighting the MSOGDF,...

10.3390/s19112479 article EN cc-by Sensors 2019-05-30
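The FRN-MSF abstract above rests on multi-scale oriented Gaussian derivative filtering. The following is a minimal NumPy sketch of that filtering stage only, not the paper's implementation; the function names, kernel normalization, and scale/orientation choices are illustrative assumptions:

```python
import numpy as np

def gaussian_derivative_kernel(sigma, theta, size=None):
    """First-order Gaussian derivative kernel oriented at angle theta (radians)."""
    if size is None:
        size = int(6 * sigma) | 1          # odd width covering roughly +/- 3 sigma
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    du = x * np.cos(theta) + y * np.sin(theta)   # directional derivative axis
    k = -du / sigma**2 * g
    return k / np.abs(k).sum()                   # illustrative L1 normalization

def multiscale_features(img, sigmas=(1.0, 2.0, 4.0), n_orient=4):
    """Stack of |filter response| maps over scales and orientations."""
    feats = []
    for sigma in sigmas:
        for i in range(n_orient):
            k = gaussian_derivative_kernel(sigma, np.pi * i / n_orient)
            pad = k.shape[0] // 2
            p = np.pad(img, pad, mode='reflect')
            out = np.zeros_like(img, dtype=float)
            H, W = img.shape
            # direct cross-correlation over the padded image
            for dy in range(k.shape[0]):
                for dx in range(k.shape[1]):
                    out += k[dy, dx] * p[dy:dy + H, dx:dx + W]
            feats.append(np.abs(out))
    return np.stack(feats)   # shape: (len(sigmas) * n_orient, H, W)
```

A vertical step edge responds most strongly to the horizontally oriented (theta = 0) derivative, which is the behavior such oriented filter banks exploit.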

Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translation. Unaware of the inherently unmatched semantics distributions between the source and target domains, existing distribution-matching methods (i.e., GAN-based ones) can give undesired solutions. In particular, although producing visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce the translated outputs...

10.1109/iccv48922.2021.01401 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

We tackle the convolutional neural network (CNN) backdoor detection problem by proposing a new representation called the one-pixel signature. Our task is to detect/classify whether a CNN model has been maliciously inserted with an unknown Trojan trigger or not. Here, each CNN model is associated with a signature that is created by generating, pixel by pixel, the adversarial value that results in the largest change of the class prediction. The signature is agnostic to the design choice of CNN architectures and to how they were trained. It can be computed efficiently for...

10.48550/arxiv.2008.07711 preprint EN other-oa arXiv (Cornell University) 2020-01-01
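The signature construction described above (per pixel, find the value causing the largest prediction change) can be sketched in a few lines of NumPy. This is a simplified illustration against a toy stand-in classifier, not the paper's code; `one_pixel_signature`, the value grid, and the toy model are all assumptions:

```python
import numpy as np

def one_pixel_signature(predict, base_img, values=np.linspace(0.0, 1.0, 8)):
    """For each pixel, sweep candidate values and record the largest
    resulting shift in the model's class scores."""
    H, W = base_img.shape
    base = predict(base_img)
    sig = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            best = 0.0
            for v in values:
                img = base_img.copy()
                img[i, j] = v
                best = max(best, np.abs(predict(img) - base).max())
            sig[i, j] = best
    return sig

# Toy stand-in "model": a fixed softmax-linear classifier (hypothetical).
rng = np.random.default_rng(0)
W_toy = rng.normal(size=(3, 16))
def toy_predict(img):
    z = W_toy @ img.ravel()
    e = np.exp(z - z.max())
    return e / e.sum()

sig = one_pixel_signature(toy_predict, np.zeros((4, 4)))
```

Because only forward evaluations are needed, the signature is indeed agnostic to architecture and training procedure, as the abstract notes.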

Recent advances in deep learning theory have evoked the study of generalizability across the different local minima of deep neural networks (DNNs). While current work has focused on either discovering properties of good local minima or developing regularization techniques to induce good minima, no approach exists that can tackle both problems. We achieve these two goals successfully in a unified manner. Specifically, based on the observed Fisher information, we propose a metric that is strongly indicative of generalizability and can be effectively applied as a practical...

10.48550/arxiv.1911.08192 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Scene text recognition (STR) is an important bridge between images and text, attracting abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most of the existing works need an extra module (a context modeling module) to help the CNN capture global dependencies, address its inductive bias, and strengthen the relationships among features. Recently, the transformer has been proposed as a promising network for context modeling through its self-attention mechanism, but one main...

10.3390/electronics10222780 article EN Electronics 2021-11-13

We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with a strong inductive bias. In particular, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability, and achieve superior performance compared with baselines.

10.48550/arxiv.2011.00971 preprint EN other-oa arXiv (Cornell University) 2020-01-01
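The two-stage teacher-to-student refactorization above reduces, in its simplest form, to behavior cloning: collect demonstrations from the teacher, then fit the student to imitate them. The sketch below uses a linear policy and least-squares regression as stand-ins for the paper's RL teacher and object-centric GNN student; all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a "high-reward teacher" -- here a hidden linear policy as a stand-in.
W_teacher = rng.normal(size=(2, 4))
def teacher_policy(obs):
    return W_teacher @ obs

# Collect (observation, action) demonstrations from the teacher.
obs_batch = rng.normal(size=(256, 4))
act_batch = obs_batch @ W_teacher.T

# Stage 2: refactorize into a student by regression (behavior cloning).
# The paper's student is an object-centric GNN; linear least squares
# stands in for it here.
W_student, *_ = np.linalg.lstsq(obs_batch, act_batch, rcond=None)

# The student should now imitate the teacher on unseen observations.
test_obs = rng.normal(size=4)
err = np.abs(W_student.T @ test_obs - teacher_policy(test_obs)).max()
```

The point of the second stage is that the student's inductive bias (in the paper, object-centric graph structure) can generalize where the teacher does not.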

Structured scenes are characterized by complex road conditions and poor GPS signals, so map degradation and loss of positioning accuracy often occur when robots build maps of structured scenes. Aiming at the above problems, a low-cost multi-sensor fusion SLAM system is designed, which fuses four sensors, namely a 2D LiDAR, an RGBD camera, an inertial measurement unit, and a wheel odometer. A sub-echelon data processing session is designed in the motion initialization session, and a sensor multi-strategy selection method...

10.1109/crc60659.2023.10488573 article EN 2024-04-09

A live detection system for tensioning clamps based on unmanned aerial vehicles is the development direction of routine inspections of high-voltage transmission lines, and real-time detection in complex environments is fundamental to such a system. Addressing this problem, YOLOv8-SC is proposed based on YOLOv8: the original C2f module is replaced with a new C2fG-Ghost module, and a GAM attention layer is added to the backbone network. A binocular 3D coordinate algorithm obtains the relative position of the target. Experiments show that the improved model improves by...

10.1109/crc60659.2023.10488490 article EN 2024-04-09

Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards, including non-differentiable ones, is feasible with reinforcement learning (RL) techniques, enabling flexible model alignment. However, applying existing RL methods to timestep-distilled DMs is challenging for ultra-fast ($\le2$-step) image generation. Our analysis suggests several limitations of policy-based methods such as PPO or DPO toward this goal. Based on these insights, we propose to learn a differentiable surrogate...

10.48550/arxiv.2411.15247 preprint EN arXiv (Cornell University) 2024-11-22
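The core idea sketched in the abstract, replacing a non-differentiable reward with a learned differentiable surrogate that can be optimized by gradient ascent, can be illustrated in one dimension. This toy sketch uses a quadratic surrogate fit by least squares; the reward function, feature choice, and optimizer settings are all illustrative assumptions, not the paper's method:

```python
import numpy as np

# A non-differentiable reward (hard bonus plus absolute-value penalty),
# standing in for e.g. a thresholded image score.
def true_reward(x):
    return (np.abs(x - 3.0) < 0.5).astype(float) - 0.1 * np.abs(x - 3.0)

# Fit a differentiable quadratic surrogate r_hat(x) = a*x^2 + b*x + c
# to samples of the true reward.
xs = np.linspace(-2.0, 8.0, 101)
a, b, c = np.polyfit(xs, true_reward(xs), 2)

# Gradient ascent through the surrogate (its gradient is 2*a*x + b),
# which the raw reward does not admit.
x_opt = 0.0
for _ in range(1000):
    x_opt += 1.0 * (2 * a * x_opt + b)
```

The ascent converges to the surrogate's maximizer, which sits near the true reward's peak at x = 3, even though the true reward has no usable gradient there.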

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control. To this end, we develop feature transformers (TFT) that, under controllable parameters, are able to account for the hidden-layer transformation while maintaining overall consistency across layers. The learned generators capture the underlying transformation processes independent of the particular training images. Our proposed TFT framework brings insights into, and helps the understanding of, an important problem of studying CNN internal...

10.48550/arxiv.1712.02400 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Recently, video scene text detection has received increasing attention due to its comprehensive applications. However, the lack of annotated datasets has become one of the most important problems hindering the development of video scene text detection. The existing datasets are not large-scale because of the expensive cost caused by manual labeling. In addition, the text instances in these datasets are too clear to pose a challenge. To address the above issues, we propose a tracking-based semi-automatic labeling strategy for text in videos in this paper. We get the annotation manually first...

10.1109/access.2021.3066601 article EN cc-by-nc-nd IEEE Access 2021-01-01

Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with limited layouts. This paper presents Luminous, the first research framework that employs state-of-the-art scene synthesis algorithms to generate large-scale simulated scenes for EAI challenges. Further, we automatically and quantitatively evaluate the quality...

10.48550/arxiv.2111.05527 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translation. Unaware of the inherently unmatched semantics distributions between the source and target domains, existing distribution-matching methods (i.e., GAN-based ones) can give undesired solutions. In particular, although producing visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce the translated outputs...

10.48550/arxiv.2012.04932 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Abstract: Aiming at the convergence difficulties faced by deep reinforcement learning algorithms in dynamic pedestrian environments and their insufficient reward feedback mechanisms, a data-driven and model-driven navigation algorithm named GRRL is proposed. In order to enrich and perfect the reward mechanism, we designed a reward function. The function fully considers the relationship between the robot and the target position. It mainly includes three parts. The experimental results show that the autonomous navigation efficiency...

10.1088/1742-6596/2171/1/012024 article EN Journal of Physics Conference Series 2022-01-01

The quality of the input text image has a clear impact on the output of a scene text recognition (STR) system; however, due to the fact that the main content is a sequence of characters containing semantic information, how to effectively assess text image quality remains a research challenge. Text image quality assessment (TIQA) can help in picking hard samples, leading to a more robust STR system and recognition-oriented restoration. In this paper, by arguing that text image quality comes from the robustness of character-level texture feature embeddings, we propose a learning-based...

10.3390/electronics11101611 article EN Electronics 2022-05-18

Existing video deblurring datasets and algorithms rest on the unrealistic presumption that a naturally blurred video is fully blurred. In this work, we define a more realistic frames-averaging-based data degradation model by referring to a naturally blurred video as a partially blurred sequence, and use it to build REBVIDS, a novel video deblurring dataset that closes the gap between synthetic and real training data and addresses most shortcomings of existing datasets. We also present DeblurNet, a two-phase training-based deep learning model for video deblurring, which consists...

10.1109/access.2021.3074199 article EN cc-by IEEE Access 2021-01-01
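The frames-averaging degradation model described above, where only some frames of a sequence are blurred by averaging their sharp neighbors, can be sketched directly. This is a minimal illustration of the averaging idea, not the REBVIDS pipeline; the function name, window size, and mask convention are assumptions:

```python
import numpy as np

def synthesize_partial_blur(frames, window=5, blur_mask=None):
    """
    Frames-averaging degradation: each marked output frame is the mean of
    up to `window` consecutive sharp frames, so the result is a *partially*
    blurred sequence when blur_mask marks only some frames.
    frames: (T, H, W) array of sharp frames.
    """
    T = len(frames)
    if blur_mask is None:
        blur_mask = np.ones(T, dtype=bool)   # fully blurred by default
    half = window // 2
    out = frames.astype(float).copy()
    for t in range(T):
        if blur_mask[t]:
            lo, hi = max(0, t - half), min(T, t + half + 1)
            out[t] = frames[lo:hi].mean(axis=0)   # temporal average
    return out
```

Passing an all-False mask leaves the sequence sharp, which is exactly the "partially blurred" degree of freedom missing from fully blurred synthetic datasets.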