Zhenyu Xie

ORCID: 0000-0001-9207-1014
Research Areas
  • 3D Shape Modeling and Analysis
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Computer Graphics and Visualization Techniques
  • Magnetic Bearings and Levitation Dynamics
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Tribology and Lubrication Engineering
  • Human Motion and Animation
  • Industrial Technology and Control Systems
  • Image Enhancement Techniques
  • Visual Attention and Saliency Detection
  • Advanced Image Processing Techniques
  • Face recognition and analysis
  • Anomaly Detection Techniques and Applications
  • Martial Arts: Techniques, Psychology, and Education
  • Image and Video Quality Assessment
  • Polynomial and algebraic computation
  • Advanced machining processes and optimization
  • Aesthetic Perception and Analysis
  • IoT-based Smart Home Systems
  • Advanced Optimization Algorithms Research
  • Gear and Bearing Dynamics Analysis
  • Hand Gesture Recognition Systems
  • Numerical Methods and Algorithms

Sun Yat-sen University
2019-2025

Shanghai Jiao Tong University
2020-2024

University of Electronic Science and Technology of China
2015

Nanjing University of Aeronautics and Astronautics
2008-2014

Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in images. Moreover, they directly utilize conventional convolution and normalization layers to restore the incomplete image, which tends to wash away the sketch information. In this paper, we propose a novel Fashion Editing Generative...

10.1109/cvpr42600.2020.00814 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their applications in practical scenarios. 2D virtual try-on approaches provide a faster alternative to manipulate clothed humans, but lack the rich 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating...

10.1109/iccv48922.2021.01299 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

While significant progress has been made in garment transfer, one of the most applicable directions of human-centric image generation, existing works overlook in-the-wild imagery, presenting severe garment-person misalignment as well as noticeable degradation in fine texture details. This paper, therefore, attends to virtual try-on in real-world scenes and brings essential improvements in authenticity and naturalness, especially for loose garments (e.g., skirts, formal dresses), challenging poses (e.g., cross arms, bent...

10.1109/cvpr52688.2022.00347 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Text-guided motion synthesis aims to generate 3D human motion that not only precisely reflects the textual description but also reveals details as much as possible. Pioneering methods explore the diffusion model for text-to-motion synthesis and obtain significant superiority. However, these methods conduct diffusion processes either on the raw data distribution or on a low-dimensional latent space, and typically suffer from the problem of modality inconsistency or detail scarcity. To tackle this problem, we propose a novel Basic-to-Advanced Hierarchical...

10.1609/aaai.v38i6.28443 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

In this article, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS as its abbreviation). The strategy sets up generation objectives by grouping body joints of detailed skeletons in close semantic proximity together and then replacing each such joint group with a single body-part node. Such an operation recursively abstracts a human pose to coarser skeletons at multiple granularity levels. With...

10.1109/tvcg.2024.3352002 article EN IEEE Transactions on Visualization and Computer Graphics 2024-01-15

Image-based Person-to-Person (P2P) virtual try-on, involving the direct transfer of garments from one person to another, is one of the most promising applications of human-centric image generation. However, existing approaches struggle to accurately learn the clothing deformation when directly warping the garment from the source pose onto the target pose. To address this, we propose virtual try-on via Garment UNwrapping, a novel framework dubbed ViTon-GUN. Specifically, we divide the P2P task into two subtasks: Person-to-Garment (P2G)...

10.1109/tvcg.2025.3550776 article EN IEEE Transactions on Visualization and Computer Graphics 2025-01-01

Despite recent progress on image-based virtual try-on, current methods are constrained by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations. In this paper, we address this problem by finding category-specific warping networks for the virtual try-on task via Neural Architecture Search (NAS). We introduce a NAS-Warping Module and elaborately design a bilevel hierarchical search space to identify the optimal network-level and operation-level flow estimation...

10.1145/3474085.3475490 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain due to the vast untapped potential of incorporating multiple modalities and the wide range of applications. To facilitate accurate generation, cross-modal synthesis methods typically rely on Contrastive Language-Image Pre-training (CLIP) to align textual and garment information. In this work, we argue that simply aligning texture and garment information is not sufficient to capture the semantics of the visual information and therefore propose MaskCLIP....

10.1145/3503161.3548230 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Image-based virtual try-on is one of the most promising applications of human-centric image generation due to its tremendous real-world potential. Yet, as most try-on approaches fit in-shop garments onto a target person, they require the laborious and restrictive construction of a paired training dataset, severely limiting their scalability. While a few recent works attempt to transfer garments directly from one person to another, alleviating the need to collect paired datasets, their performance is impacted by the lack of paired (supervised) information. In particular,...

10.48550/arxiv.2111.10544 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Transferring human motion from a source to a target person poses great potential in computer vision and graphics applications. A crucial step is to manipulate sequential future motion while retaining the appearance characteristic. Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person, which is not scalable in practice. This work studies a more general setting, in which we aim to learn a single model to parsimoniously transfer motion from a source video to any target person given only one image of the person, named as Collaborative...

10.1109/tip.2021.3123549 article EN IEEE Transactions on Image Processing 2021-01-01

A motorized spindle supported by active magnetic bearings (AMBs) is generally used for ultra-high-speed machining. The iron loss of a radial AMB is very great owing to the high rotation speed, and it will cause severe thermal deformation. The problem is particularly serious in large power applications, such as the all-electric aero-engine. In this study, a prototype motorized spindle supported by five degree-of-freedom AMBs is developed. Homopolar and heteropolar AMBs are independently adopted as radial bearings. The influences of the two types of radial AMBs on the dynamic...

10.1016/j.cja.2014.10.031 article EN cc-by-nc-nd Chinese Journal of Aeronautics 2014-10-18

Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person. While existing methods focus on warping the garment to fit the body pose, they often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments. These limitations greatly reduce the realism of the generated results and hinder the practical application of VITON techniques. Leveraging the notable success of diffusion-based models in cross-modal image synthesis, some recent methods have ventured...

10.48550/arxiv.2312.03667 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Computational methods for unconstrained optimization problems are an important research topic in the field of numerical computation, and solving such problems is of great significance. Many methods can be applied to these problems, so we need to choose one that is faster and less complex. To address this difficulty, this paper presents a comparative study of common algorithms and of our approach as used to handle some concrete problems.

10.1109/iccwamtip.2015.7494013 article EN 2015 12th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) 2015-12-01

The utilization of Large Language Models (LLMs) for the construction of AI systems has garnered significant attention across diverse fields. The extension of LLMs to the domain of fashion holds substantial commercial potential but also inherent challenges due to the intricate semantic interactions in fashion-related generation. To address this issue, we developed a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking. This system facilitates prompt-driven tasks, encompassing garment or...

10.48550/arxiv.2307.13240 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Multi-Camera People Tracking (MCPT) is a crucial task in intelligent surveillance systems. However, it presents significant challenges due to issues such as heavy occlusion and variations in appearance that arise from multiple camera perspectives and congested scenarios. In this paper, we propose an effective system that integrates both appearance and spatial-temporal information to address these problems, consisting of three specially designed modules: (1) A Multi-Object Tracking (MOT) method that minimizes ID-switch errors and generates...

10.1109/cvprw59228.2023.00554 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

In this paper, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (GUESS as its abbreviation). The strategy sets up generation objectives by grouping body joints of detailed skeletons in close semantic proximity together and then replacing each such joint group with a single body-part node. Such an operation recursively abstracts a human pose to coarser skeletons at multiple granularity levels. With...

10.48550/arxiv.2401.02142 preprint EN other-oa arXiv (Cornell University) 2024-01-01

10.1109/cvprw63382.2024.00694 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

Image-based 3D Virtual Try-ON (VTON) aims to sculpt the 3D human according to person and clothes images, which is data-efficient (i.e., getting rid of expensive 3D data) but challenging. Recent text-to-3D methods achieve remarkable improvement in high-fidelity 3D human generation, demonstrating its potential for 3D virtual try-on. Inspired by the impressive success of personalized diffusion models (e.g., Dreambooth and LoRA) for 2D VTON, it is straightforward to achieve 3D VTON by integrating the personalization technique into the diffusion-based...

10.48550/arxiv.2407.16511 preprint EN arXiv (Cornell University) 2024-07-23