Zhiwei Jia

ORCID: 0000-0001-5391-5931
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Handwritten Text Recognition Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Vehicle License Plate Recognition
  • Image Processing and 3D Reconstruction
  • Reinforcement Learning in Robotics
  • Robot Manipulation and Learning
  • Advanced Image and Video Retrieval Techniques
  • Robotic Path Planning Algorithms
  • Sparse and Compressive Sensing Techniques
  • Adversarial Robustness in Machine Learning
  • Human Motion and Animation
  • Natural Language Processing Techniques
  • Speech Recognition and Synthesis
  • AI-based Problem Solving and Planning
  • Robotics and Automated Systems
  • Power Line Inspection Robots
  • Human Pose and Action Recognition
  • Advanced Steganography and Watermarking Techniques
  • Autonomous Vehicle Technology and Safety
  • 3D Shape Modeling and Analysis
  • Topic Modeling
  • Digital Media Forensic Detection

Changsha University of Science and Technology
2019-2024

Shanghai University
2021-2022

UC San Diego Health System
2021

University of California, San Diego
2019-2020

Object manipulation from 3D visual inputs poses many challenges on building generalizable perception and policy models. However, assets in existing benchmarks mostly lack the diversity of shapes that aligns with real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. The objects in ManiSkill include large topological and geometric variations. Tasks are carefully chosen to cover distinct...

10.48550/arxiv.2107.14483 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Synthetic Aperture Radar (SAR) scene classification is challenging but widely applied, and deep learning can play a pivotal role in it because of its hierarchical feature-learning ability. In this paper, we propose a new framework, named Feature Recalibration Network with Multi-scale Spatial Features (FRN-MSF), to achieve high-accuracy SAR-based scene classification. First, a Multi-Scale Omnidirectional Gaussian Derivative Filter (MSOGDF) is constructed. Then, multi-scale spatial features (MSF) of SAR scenes are generated by weighting the MSOGDF,...

10.3390/s19112479 article EN cc-by Sensors 2019-05-30
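The FRN-MSF abstract above rests on multi-scale oriented Gaussian derivative filtering. The following is a minimal NumPy sketch of that filtering stage only, not the paper's implementation; the function names, kernel normalization, and scale/orientation choices are illustrative assumptions:

```python
import numpy as np

def gaussian_derivative_kernel(sigma, theta, size=None):
    """First-order Gaussian derivative kernel oriented at angle theta (radians)."""
    if size is None:
        size = int(6 * sigma) | 1          # odd width covering roughly +/- 3 sigma
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    du = x * np.cos(theta) + y * np.sin(theta)   # directional derivative axis
    k = -du / sigma**2 * g
    return k / np.abs(k).sum()                   # illustrative L1 normalization

def multiscale_features(img, sigmas=(1.0, 2.0, 4.0), n_orient=4):
    """Stack of |filter response| maps over scales and orientations."""
    feats = []
    for sigma in sigmas:
        for i in range(n_orient):
            k = gaussian_derivative_kernel(sigma, np.pi * i / n_orient)
            pad = k.shape[0] // 2
            p = np.pad(img, pad, mode='reflect')
            out = np.zeros_like(img, dtype=float)
            H, W = img.shape
            # direct cross-correlation over the padded image
            for dy in range(k.shape[0]):
                for dx in range(k.shape[1]):
                    out += k[dy, dx] * p[dy:dy + H, dx:dx + W]
            feats.append(np.abs(out))
    return np.stack(feats)   # shape: (len(sigmas) * n_orient, H, W)
```

A vertical step edge responds most strongly to the horizontally oriented (theta = 0) derivative, which is the behavior such oriented filter banks exploit.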

Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translation. Unaware of the inherently unmatched semantics distributions between the source and target domains, existing distribution-matching methods (i.e., GAN-based ones) can give undesired solutions. In particular, although producing visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce the translated outputs...

10.1109/iccv48922.2021.01401 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

We tackle the convolutional neural network (CNN) backdoor detection problem by proposing a new representation called the one-pixel signature. Our task is to detect/classify whether a CNN model has been maliciously inserted with an unknown Trojan trigger or not. Here, each CNN model is associated with a signature that is created by generating, pixel by pixel, the adversarial value that results in the largest change of the class prediction. The signature is agnostic to the design choice of CNN architectures and to how they were trained. It can be computed efficiently for...

10.48550/arxiv.2008.07711 preprint EN other-oa arXiv (Cornell University) 2020-01-01
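The signature construction described above (per pixel, find the value causing the largest prediction change) can be sketched in a few lines of NumPy. This is a simplified illustration against a toy stand-in classifier, not the paper's code; `one_pixel_signature`, the value grid, and the toy model are all assumptions:

```python
import numpy as np

def one_pixel_signature(predict, base_img, values=np.linspace(0.0, 1.0, 8)):
    """For each pixel, sweep candidate values and record the largest
    resulting shift in the model's class scores."""
    H, W = base_img.shape
    base = predict(base_img)
    sig = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            best = 0.0
            for v in values:
                img = base_img.copy()
                img[i, j] = v
                best = max(best, np.abs(predict(img) - base).max())
            sig[i, j] = best
    return sig

# Toy stand-in "model": a fixed softmax-linear classifier (hypothetical).
rng = np.random.default_rng(0)
W_toy = rng.normal(size=(3, 16))
def toy_predict(img):
    z = W_toy @ img.ravel()
    e = np.exp(z - z.max())
    return e / e.sum()

sig = one_pixel_signature(toy_predict, np.zeros((4, 4)))
```

Because only forward evaluations are needed, the signature is indeed agnostic to architecture and training procedure, as the abstract notes.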

Recent advances in deep learning theory have evoked the study of generalizability across the different local minima of deep neural networks (DNNs). While current work has focused on either discovering properties of good local minima or developing regularization techniques to induce good minima, no approach exists that can tackle both problems. We achieve these two goals successfully in a unified manner. Specifically, based on the observed Fisher information, we propose a metric that is strongly indicative of generalizability and can be effectively applied as a practical...

10.48550/arxiv.1911.08192 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Scene text recognition (STR) is an important bridge between images and text, attracting abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most of the existing works need an extra module (a context modeling module) to help the CNN capture global dependencies, address its inductive bias, and strengthen the relationships among features. Recently, the transformer has been proposed as a promising network for context modeling through its self-attention mechanism, but one main...

10.3390/electronics10222780 article EN Electronics 2021-11-13

We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with a strong inductive bias. In particular, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability, and achieve superior performance compared with baselines.

10.48550/arxiv.2011.00971 preprint EN other-oa arXiv (Cornell University) 2020-01-01
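The two-stage teacher-to-student refactorization above reduces, in its simplest form, to behavior cloning: collect demonstrations from the teacher, then fit the student to imitate them. The sketch below uses a linear policy and least-squares regression as stand-ins for the paper's RL teacher and object-centric GNN student; all names and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a "high-reward teacher" -- here a hidden linear policy as a stand-in.
W_teacher = rng.normal(size=(2, 4))
def teacher_policy(obs):
    return W_teacher @ obs

# Collect (observation, action) demonstrations from the teacher.
obs_batch = rng.normal(size=(256, 4))
act_batch = obs_batch @ W_teacher.T

# Stage 2: refactorize into a student by regression (behavior cloning).
# The paper's student is an object-centric GNN; linear least squares
# stands in for it here.
W_student, *_ = np.linalg.lstsq(obs_batch, act_batch, rcond=None)

# The student should now imitate the teacher on unseen observations.
test_obs = rng.normal(size=4)
err = np.abs(W_student.T @ test_obs - teacher_policy(test_obs)).max()
```

The point of the second stage is that the student's inductive bias (in the paper, object-centric graph structure) can generalize where the teacher does not.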

Structured scenes are characterized by complex road conditions and poor GPS signals, so map degradation and loss of positioning accuracy often occur when robots build maps of structured scenes. Aiming at the above problems, a low-cost multi-sensor fusion SLAM system is designed, which fuses four sensors, namely a 2D LiDAR, an RGBD camera, an inertial measurement unit, and a wheel odometer. A sub-echelon data processing session is designed in the motion initialization session, and a sensor multi-strategy selection method...

10.1109/crc60659.2023.10488573 article EN 2024-04-09

A live detection system for tensioning clamps based on unmanned aerial vehicles is the development direction of routine inspections of high-voltage transmission lines, and real-time detection in complex environments is fundamental to such a system. Addressing this problem, YOLOv8-SC is proposed based on YOLOv8: the original C2f module is replaced with a new C2fG-Ghost module, and a GAM attention layer is added to the backbone network. A binocular 3D coordinate algorithm obtains the relative position of the target. Experiments show that the improved model improves by...

10.1109/crc60659.2023.10488490 article EN 2024-04-09

Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards, including non-differentiable ones, is feasible with reinforcement learning (RL) techniques, enabling flexible model alignment. However, applying existing RL methods to timestep-distilled DMs is challenging for ultra-fast ($\le2$-step) image generation. Our analysis suggests several limitations of policy-based methods such as PPO or DPO toward this goal. Based on these insights, we propose to learn a differentiable surrogate...

10.48550/arxiv.2411.15247 preprint EN arXiv (Cornell University) 2024-11-22
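The core idea sketched in the abstract, replacing a non-differentiable reward with a learned differentiable surrogate that can be optimized by gradient ascent, can be illustrated in one dimension. This toy sketch uses a quadratic surrogate fit by least squares; the reward function, feature choice, and optimizer settings are all illustrative assumptions, not the paper's method:

```python
import numpy as np

# A non-differentiable reward (hard bonus plus absolute-value penalty),
# standing in for e.g. a thresholded image score.
def true_reward(x):
    return (np.abs(x - 3.0) < 0.5).astype(float) - 0.1 * np.abs(x - 3.0)

# Fit a differentiable quadratic surrogate r_hat(x) = a*x^2 + b*x + c
# to samples of the true reward.
xs = np.linspace(-2.0, 8.0, 101)
a, b, c = np.polyfit(xs, true_reward(xs), 2)

# Gradient ascent through the surrogate (its gradient is 2*a*x + b),
# which the raw reward does not admit.
x_opt = 0.0
for _ in range(1000):
    x_opt += 1.0 * (2 * a * x_opt + b)
```

The ascent converges to the surrogate's maximizer, which sits near the true reward's peak at x = 3, even though the true reward has no usable gradient there.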

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control. To this end, we develop feature transformers (TFT) that, under controllable parameters, are able to account for the hidden-layer transformation while maintaining overall consistency across layers. The learned generators capture the underlying transformation processes independent of the particular training images. Our proposed TFT framework brings insights into, and helps the understanding of, an important problem of studying CNN internal...

10.48550/arxiv.1712.02400 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Recently, video scene text detection has received increasing attention due to its comprehensive applications. However, the lack of annotated datasets has become one of the most important problems hindering the development of video scene text detection. The existing datasets are not large-scale because of the expensive cost caused by manual labeling. In addition, the text instances in these datasets are too clear to pose a challenge. To address the above issues, we propose a tracking-based semi-automatic labeling strategy for text in videos in this paper. We get the annotation manually first...

10.1109/access.2021.3066601 article EN cc-by-nc-nd IEEE Access 2021-01-01

Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with limited layouts. This paper presents Luminous, the first research framework that employs state-of-the-art scene synthesis algorithms to generate large-scale simulated scenes for EAI challenges. Further, we automatically and quantitatively evaluate the quality...

10.48550/arxiv.2111.05527 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Many applications of unpaired image-to-image translation require the input contents to be preserved semantically during translation. Unaware of the inherently unmatched semantics distributions between the source and target domains, existing distribution-matching methods (i.e., GAN-based ones) can give undesired solutions. In particular, although producing visually reasonable outputs, the learned models usually flip the semantics of the inputs. To tackle this without using extra supervision, we propose to enforce the translated outputs...

10.48550/arxiv.2012.04932 preprint EN cc-by arXiv (Cornell University) 2020-01-01

Abstract: Aiming at the convergence difficulties faced by deep reinforcement learning algorithms in dynamic pedestrian environments and their insufficient reward feedback mechanisms, a data-driven and model-driven navigation algorithm named GRRL is proposed. In order to enrich and perfect the reward mechanism, we designed a reward function. The function fully considers the relationship between the robot and the target position. It mainly includes three parts. The experimental results show that the autonomous navigation efficiency...

10.1088/1742-6596/2171/1/012024 article EN Journal of Physics Conference Series 2022-01-01

The quality of the input text image has a clear impact on the output of a scene text recognition (STR) system; however, due to the fact that the main content is a sequence of characters containing semantic information, how to effectively assess text image quality remains a research challenge. Text image quality assessment (TIQA) can help in picking hard samples, leading to a more robust STR system and recognition-oriented restoration. In this paper, by arguing that text image quality comes from the robustness of character-level texture feature embeddings, we propose a learning-based...

10.3390/electronics11101611 article EN Electronics 2022-05-18

Existing video deblurring datasets and algorithms rest on the unrealistic presumption that a naturally blurred video is fully blurred. In this work, we define a more realistic frames-averaging-based data degradation model by referring to a naturally blurred video as a partially blurred sequence, and use it to build REBVIDS, a novel video deblurring dataset that closes the gap between synthetic and real training data and addresses most shortcomings of existing datasets. We also present DeblurNet, a two-phase training-based deep learning model for video deblurring, which consists...

10.1109/access.2021.3074199 article EN cc-by IEEE Access 2021-01-01
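The frames-averaging degradation model described above, where only some frames of a sequence are blurred by averaging their sharp neighbors, can be sketched directly. This is a minimal illustration of the averaging idea, not the REBVIDS pipeline; the function name, window size, and mask convention are assumptions:

```python
import numpy as np

def synthesize_partial_blur(frames, window=5, blur_mask=None):
    """
    Frames-averaging degradation: each marked output frame is the mean of
    up to `window` consecutive sharp frames, so the result is a *partially*
    blurred sequence when blur_mask marks only some frames.
    frames: (T, H, W) array of sharp frames.
    """
    T = len(frames)
    if blur_mask is None:
        blur_mask = np.ones(T, dtype=bool)   # fully blurred by default
    half = window // 2
    out = frames.astype(float).copy()
    for t in range(T):
        if blur_mask[t]:
            lo, hi = max(0, t - half), min(T, t + half + 1)
            out[t] = frames[lo:hi].mean(axis=0)   # temporal average
    return out
```

Passing an all-False mask leaves the sequence sharp, which is exactly the "partially blurred" degree of freedom missing from fully blurred synthetic datasets.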