Zhiyu Pan

ORCID: 0000-0001-5584-6669
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Gait Recognition and Analysis
  • Visual Attention and Saliency Detection
  • Image Enhancement Techniques
  • Biometric Identification and Security
  • 3D Shape Modeling and Analysis
  • Advanced Image Processing Techniques
  • Face Recognition and Analysis
  • Image Processing and 3D Reconstruction
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Image and Video Quality Assessment
  • Industrial Vision Systems and Defect Detection
  • Aesthetic Perception and Analysis
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Remote Sensing and LiDAR Applications
  • Neural Networks and Applications
  • Advanced MEMS and NEMS Technologies
  • Forensic Fingerprint Detection Methods
  • Hand Gesture Recognition Systems
  • Autonomous Vehicle Technology and Safety
  • Image Processing Techniques and Applications

Tsinghua University
2021-2024

Huazhong University of Science and Technology
2020-2024

University of Würzburg
2023

Powerchina Huadong Engineering Corporation (China)
2022

RWTH Aachen University
2022

We present the new Bokeh Effect Transformation Dataset (BETD) and review the proposed solutions for this novel task at the NTIRE 2023 Challenge. Recent advancements in mobile photography aim to reach the visual quality of full-frame cameras. Now, a goal in computational photography is to optimize the bokeh effect itself, which is the aesthetic blur in out-of-focus areas of an image. Photographers create this effect by benefiting from lens optical properties. The aim of this work is to design a neural network capable of converting the bokeh effect of one lens to that of another without harming the sharp foreground...

10.1109/cvprw59228.2023.00166 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

Learning-based multi-view stereo (MVS) methods rely heavily on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS: each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point attends only to the corresponding pair of epipolar lines. Our idea takes inspiration...

10.1109/iccv51070.2023.01658 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Class-agnostic motion prediction methods aim to comprehend motion within open-world scenarios, holding significance for autonomous driving systems. However, training a high-performance model in a fully-supervised manner always requires substantial amounts of manually annotated data, which can be both expensive and time-consuming to obtain. To address this challenge, our study explores the potential of semi-supervised learning (SSL) for class-agnostic motion prediction. Our SSL framework adopts a consistency-based...

10.1609/aaai.v38i6.28358 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Cropping box regression algorithms re-frame images with predicted cropping boxes for better composition quality, which can save considerable manpower and time in massive image retouching work. Yet, recent learning-based algorithms require expert annotations, which makes the scale of training data limited. This consequently incurs a performance bottleneck. To address this issue, previous works seek help from auxiliary datasets of related tasks, ...

10.1109/tmm.2024.3377125 article EN IEEE Transactions on Multimedia 2024-01-01

We show that relation modeling between visual elements matters in cropping view recommendation. Cropping view recommendation addresses the problem of image recomposition conditioned on composition quality and the ranking of views (cropped sub-regions). This task is challenging because the difference is subtle when a visual element is reserved or removed. Existing methods represent visual elements by extracting region-based convolutional features inside and outside the view boundaries, without probing a fundamental question: why some visual elements are of interest...

10.1109/iccv48922.2021.00418 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Temporal consistency is the key challenge of video depth estimation. Previous works are based on additional optical flow or camera poses, which is time-consuming. By contrast, we derive consistency with less information. Since videos inherently contain heavy temporal redundancy, a missing frame could be recovered from neighboring ones. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network predicting masked frames based on their neighboring frames. By reconstructing masked features, FMNet can learn intrinsic...

10.1145/3503161.3547978 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Automatic image cropping algorithms aim to recompose images like human photographers by generating cropping boxes with improved composition quality. Cropping box regression approaches learn the beauty of composition from annotated cropping boxes. However, the bias of annotations leads to quasi-trivial recomposing results, which have an obvious tendency toward the average location of training samples. The crux of this predicament is that the task is naively treated as a box regression problem, where rare samples might be dominated by normal samples, and the composition patterns of rare samples are...

10.1609/aaai.v37i2.25293 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Sports analysis and viewing play a pivotal role in the current sports domain, offering significant value not only to coaches and athletes but also to fans and media. In recent years, the rapid development of virtual reality (VR) and augmented reality (AR) technologies has introduced a new platform for watching games. Visualization of sports competitions in VR/AR represents a revolutionary technology, providing audiences with a novel immersive viewing experience. However, there is still a lack of related research in this area. In this work, we present the first...

10.48550/arxiv.2405.01112 preprint EN arXiv (Cornell University) 2024-05-02

Spiking neural networks (SNNs) are gaining popularity in deep learning due to their low energy budget on neuromorphic hardware. However, they still lack sufficient robustness to guard safety-critical applications such as autonomous driving. Many studies have been conducted to defend SNNs from the threat of adversarial attacks. This paper aims to uncover the robustness of SNNs through the lens of the stability of nonlinear systems. We are inspired by the fact that searching for parameters altering the leaky integrate-and-fire...

10.48550/arxiv.2405.20694 preprint EN arXiv (Cornell University) 2024-05-31

In this paper, we present a novel registration framework, HumanReg, that learns a non-rigid transformation between two human point clouds end-to-end. We introduce a body prior into the registration process to efficiently handle this type of point cloud. Unlike most existing supervised registration techniques that require expensive point-wise flow annotations, HumanReg can be trained in a self-supervised manner, benefiting from a set of loss functions. To make our model better converge on real-world data, we also propose a pretraining strategy, and...

10.1109/3dv62453.2024.00067 article EN 2024 International Conference on 3D Vision (3DV) 2024-03-18

10.1109/cvpr52733.2024.01651 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

10.1109/cvpr52733.2024.00731 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Bokeh effect transformation is a novel task in computer vision and computational photography. It aims to convert bokeh effects from one camera lens to another. To this end, we introduce a new concept of blur ratio, which represents the ratio of the blur amount of a target image to that of a source image, and propose a framework, SBTNet, based on this concept. For cat-eye simulation and lens type transformation, a two-channel coordinate map and a one-hot lens type map are added as extra inputs. The core of the framework is a sequence of parallel FeaNets, along with a feature selection...

10.1109/cvprw59228.2023.00150 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

The upward (phase I) pipe jacking project of the Hangzhou water intake is near an existing bridge foundation and an operating tunnel. To investigate the influence of the construction process on the existing structures, Plaxis 2D finite element software was used to simulate the horizontal and vertical displacement of the structures caused by construction. The calculation results show that the pile displacement has a nonlinear distribution. There is a bending point in the foundation. The maximum and minimum displacements are 0.71 mm and -0.84 mm, respectively. The settlement is only...

10.1088/1742-6596/2230/1/012008 article EN Journal of Physics Conference Series 2022-03-01

Generative adversarial networks (GANs) have been trained to be professional artists able to create stunning artworks such as face generation and image style transfer. In this paper, we focus on a realistic business scenario: automated generation of customizable icons given desired mobile applications and theme styles. We first introduce a theme-application icon dataset, termed AppIcon, where each icon has two orthogonal theme and app labels. By investigating a strong baseline, StyleGAN2, we observe mode collapse caused by the...

10.1145/3503161.3548109 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Image cropping aims to enhance the aesthetic quality of a given image by searching for good views. One common routine is to score and rank candidate views with a neural network. The network is expected to discriminate subtle view-wise differences. However, image-wise differences and ambiguity in annotations make it difficult to discriminate view-wise differences. To focus on view-wise differences, we propose a feature spliter to build and evaluate views based only on the view-wise feature. Then, ranking gain loss that alleviates amplify remarkable improvement compared with...

10.1109/icip46576.2022.9897834 article EN 2022 IEEE International Conference on Image Processing (ICIP) 2022-10-16

10.1109/tifs.2024.3516566 article EN IEEE Transactions on Information Forensics and Security 2024-01-01

Compared to the traditional Artificial Neural Network (ANN), the Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more biologically inspired and energy-efficient manner. However, despite previous efforts to optimize the learning gradients and model structure of SNNs through various methods, SNNs still lag behind ANNs in terms of performance to some extent. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this...

10.48550/arxiv.2402.00411 preprint EN arXiv (Cornell University) 2024-02-01

The perception of motion behavior in a dynamic environment holds significant importance for autonomous driving systems, wherein class-agnostic motion prediction methods directly predict the motion of the entire point cloud. While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming. Therefore, several annotation-efficient methods have been proposed to address this challenge. Although effective, these methods rely on weak annotations or additional multi-modal data like images, and the potential...

10.48550/arxiv.2403.13261 preprint EN arXiv (Cornell University) 2024-03-19

Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is ANN-SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability of SNNs. However, under extreme low-latency conditions, the existing conversion theory suggests that the problem of misrepresentation of residual membrane...

10.48550/arxiv.2404.17456 preprint EN arXiv (Cornell University) 2024-04-26

Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these issues point by point. First, we find that the variance-based cost volume exhibits failure patterns, as the features of pixels corresponding to the same point can be...

10.48550/arxiv.2404.17528 preprint EN arXiv (Cornell University) 2024-04-26

Latent fingerprint matching is a daunting task, primarily due to the poor quality of latent fingerprints. In this study, we propose a deep-learning based dense minutia descriptor (DMD) for latent fingerprint matching. A DMD is obtained by extracting the fingerprint patch aligned by its central minutia, capturing detailed minutia information and texture information. Our descriptor takes the form of a three-dimensional representation, with two dimensions associated with the original image plane and the other dimension representing abstract features. Additionally, the extraction...

10.48550/arxiv.2405.01199 preprint EN arXiv (Cornell University) 2024-05-02

Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods regard fingerprint position rectification and identity verification as independent tasks, ignoring the coupling relationship between them -- relative...

10.48550/arxiv.2405.03959 preprint EN arXiv (Cornell University) 2024-05-06