Hongwei Yi

ORCID: 0000-0001-8669-2009
Research Areas
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Infrared Target Detection Methodologies
  • Human Motion and Animation
  • 3D Shape Modeling and Analysis
  • Video Surveillance and Tracking Methods
  • Advanced Measurement and Detection Methods
  • Optical measurement and interference techniques
  • Advanced Neural Network Applications
  • Optical Systems and Laser Technology
  • Robotics and Sensor-Based Localization
  • Fixed Point Theorems Analysis
  • Computer Graphics and Visualization Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Optimization and Variational Analysis
  • Image Processing Techniques and Applications
  • Satellite Image Processing and Photogrammetry
  • Advanced Image Processing Techniques
  • Face recognition and analysis
  • Image Retrieval and Classification Techniques
  • Heavy metals in environment
  • Nonlinear Differential Equations Analysis
  • Remote Sensing and LiDAR Applications
  • 3D Surveying and Cultural Heritage
  • Structural Behavior of Reinforced Concrete

Affiliations

Max Planck Institute for Intelligent Systems
2022-2024

Xihua University
2024

Xi'an Institute of Optics and Precision Mechanics
2011-2022

Chinese Academy of Sciences
2006-2022

Peking University
2019-2021

University of Chinese Academy of Sciences
2020

Wuhan University
2020

Henan Polytechnic University
2017

Oil and Gas Center
2017

Hunan University
2014-2015

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture the local object and its scale information, which are vital for 3D object detection. To better represent 3D structure, prior arts typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation, and then apply existing 3D point-cloud based detectors. However, their results depend heavily on the accuracy of the estimated depth maps, resulting in...
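The pseudo-LiDAR transform this abstract refers to is, in most such pipelines, a plain pinhole back-projection of the estimated depth map into a 3D point cloud. A minimal sketch, assuming a standard pinhole camera model (the function name and toy intrinsics are illustrative, not from the paper):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an estimated depth map (H, W) into a pseudo-LiDAR
    point cloud (N, 3) using pinhole camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx            # camera-frame X from pixel column
    y = (v - cy) * z / fy            # camera-frame Y from pixel row
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only points with valid depth

# toy example: a flat surface 5 m away seen through a 4x4 image
cloud = depth_to_pseudo_lidar(np.full((4, 4), 5.0), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Any error in the estimated depth propagates directly into the point coordinates, which is exactly the sensitivity to depth-map accuracy the abstract points out.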

10.1109/cvpr42600.2020.01169 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce the body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address...

10.1109/cvpr52688.2022.01292 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in which the face, body, and hands are modeled separately. The separated modeling stems from the fact that face articulation strongly correlates with speech, while body poses and hand gestures...

10.1109/cvpr52729.2023.00053 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how can we effectively capture all the visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging...

10.1109/3dv62453.2024.00152 article EN 2024 International Conference on 3D Vision (3DV) 2024-03-18

We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to the misalignment between the geometry and the texture, particularly in the face region. To address these limitations, TADA leverages the synergy of a 2D diffusion model and a parametric body model....

10.1109/3dv62453.2024.00150 article EN 2024 International Conference on 3D Vision (3DV) 2024-03-18

Humans are in constant contact with the world as they move through it and interact with it. This contact is a vital source of information for understanding 3D humans, 3D scenes, and the interactions between them. In fact, we demonstrate that these human-scene interactions (HSIs) can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video. Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images, and use these in optimizing the 3D scene to reconstruct a consistent, physically plausible, 3D scene layout....

10.1109/cvpr52688.2022.00393 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given human motion. Such motion can come from archival motion capture or from IMU sensors worn on the body, effectively turning human movement into a "scanner" of the 3D world. Intuitively, human movement indicates the free space in a room, and human contact indicates surfaces or objects...
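The free-space intuition above can be caricatured in a few lines: any floor cell a walking person passes through cannot contain furniture. A toy occupancy-grid sketch under that assumption (grid size and cell resolution are arbitrary, not taken from the paper):

```python
import numpy as np

def free_space_from_motion(trajectory, grid_size=10, cell=0.5):
    """Mark floor-grid cells visited by a walking trajectory as free space.
    Cells a person has walked through cannot contain furniture."""
    occ = np.ones((grid_size, grid_size), dtype=bool)  # True = possibly occupied
    for x, y in trajectory:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < grid_size and 0 <= j < grid_size:
            occ[i, j] = False                          # visited -> free
    return occ

# a person walking diagonally across a 5 m x 5 m room
path = [(t * 0.5, t * 0.5) for t in range(10)]
occ = free_space_from_motion(path)
```

The actual method reasons generatively about plausible furniture layouts; this sketch only shows why motion alone already carves out free space.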

10.1109/cvpr52729.2023.01246 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage, we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized...

10.1109/cvpr52729.2023.00837 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and a camera, we record 12 subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate ground truth in such...

10.1109/cvpr52729.2023.00073 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture the local object and its scale information, which are vital for 3D object detection. To better represent 3D structure, prior arts typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation, and then apply existing 3D point-cloud based detectors. However, their results depend heavily on the accuracy of the estimated depth maps, resulting in...

10.1109/cvprw50498.2020.00508 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

Present multi-view stereo (MVS) methods with supervised learning-based networks have an impressive performance compared with traditional MVS methods. However, the ground-truth depth maps for training are hard to obtain and are available for only limited kinds of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M³VSNet, for dense point cloud reconstruction without any...
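Unsupervised MVS methods of this kind replace ground-truth depth supervision with consistency between views: a source image is warped into the reference view using the predicted depth, and several metrics compare the two. M³VSNet's actual losses combine pixel-wise, feature-wise, and geometric terms; the toy sketch below (weights and function name are illustrative) only conveys the idea of blending multiple metrics into one self-supervised loss:

```python
import numpy as np

def multi_metric_loss(ref, warped, w_photo=0.8, w_grad=0.2):
    """Toy unsupervised multi-metric loss comparing a reference view with a
    source view warped into it via the predicted depth. Weights are illustrative."""
    photo = np.abs(ref - warped).mean()                 # pixel-wise photometric term
    gx = lambda im: np.abs(np.diff(im, axis=1)).mean()  # mean horizontal gradient
    gy = lambda im: np.abs(np.diff(im, axis=0)).mean()  # mean vertical gradient
    grad = abs(gx(ref) - gx(warped)) + abs(gy(ref) - gy(warped))
    return w_photo * photo + w_grad * grad

# two constant 8x8 "images" differing by a uniform offset
ref = np.zeros((8, 8))
warped = np.ones((8, 8))
loss = multi_metric_loss(ref, warped)
```

If the depth prediction is good, the warped view matches the reference and the loss approaches zero, so no ground-truth depth maps are needed.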

10.1109/icip42928.2021.9506469 article EN 2021 IEEE International Conference on Image Processing (ICIP) 2021-08-23

Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by...

10.1109/iccv51070.2023.00824 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the confidence of their outputs, meaning that downstream tasks cannot differentiate accurate estimates from inaccurate ones. To address this, we develop POCO, a...

10.1109/3dv62453.2024.00115 article EN 2024 International Conference on 3D Vision (3DV) 2024-03-18

3D vehicle detection based on point clouds is a challenging task in real-world applications such as autonomous driving. Although significant progress has been made, we observe two aspects to be further improved. First, the semantic context information in LiDAR is seldom explored in previous works, which may help identify ambiguous vehicles. Second, the distribution of point clouds on vehicles varies continuously with increasing depths, which may not be well modeled by a single model. In this work, we propose a unified model SegVoxelNet...

10.1109/icra40945.2020.9196556 article EN 2020-05-01

We propose to address face reconstruction in the wild by using a multi-metric regression network, MMFace, to align a 3D morphable model (3DMM) to an input image. The key idea is to utilize a volumetric sub-network to estimate an intermediate geometry representation, and a parametric sub-network to regress the 3DMM parameters. Our loss consists of an identity loss, an expression loss, and a pose loss, which greatly improves the aligned details by incorporating high-level loss functions directly defined in the parameter spaces. The reconstruction is high-quality and robust under large variations of expressions,...

10.1109/cvpr.2019.00785 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

This paper focuses on the opportunity to use multiple star trackers to help space situational awareness and surveillance. Catalogs of debris around the Earth are usually based on ground-based measurements, which rely on data provided by radar observations and optical observations. However, space-based observations offer new opportunities because they are independent of the weather and circadian rhythms to which a ground system is subjected. Consequently, they improve the possibility of detection and cataloging. This work deals with a feasibility study of an innovative...

10.3390/app12073593 article EN cc-by Applied Sciences 2022-04-01

In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate, easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that, with the same optimization algorithm, the image-to-video task is indeed easier to converge over than the text-to-video task. We also explore a bag of optimization tricks to reduce the computational cost of training the image-to-video (I2V) models from three aspects:...

10.48550/arxiv.2502.07701 preprint EN arXiv (Cornell University) 2025-02-11

We present a high-accuracy, low false-alarm rate, and low computational-cost methodology for removing stars and noise and for detecting space debris with a low signal-to-noise ratio (SNR) in optical image sequences. First, time-index filtering and bright star intensity enhancement are implemented to remove stars effectively. Then, a multistage quasi-hypothesis-testing method is proposed to detect the pieces of continuous and discontinuous trajectories. For this purpose, candidate trajectories are defined and generated. Experimental results show that the method can...
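The star-removal step exploits the fact that, in a star-tracking sequence, stars stay at fixed pixel positions while debris moves between frames. The paper's time-index filtering is more elaborate, but the principle can be sketched as per-pixel temporal median subtraction (function name and synthetic data are illustrative only):

```python
import numpy as np

def remove_static_stars(frames):
    """Suppress stars that stay fixed across a star-tracking image sequence
    by subtracting the per-pixel temporal median; moving debris survives."""
    frames = np.asarray(frames, dtype=float)   # shape (T, H, W)
    background = np.median(frames, axis=0)     # static stars + sky background
    residual = frames - background             # moving debris remains positive
    return np.clip(residual, 0.0, None)

# synthetic sequence: one fixed "star" and one moving "debris" pixel
seq = np.zeros((5, 8, 8))
seq[:, 2, 2] = 100.0                 # static star at (2, 2) in every frame
for t in range(5):
    seq[t, 4, t] = 50.0              # debris moving along row 4
clean = remove_static_stars(seq)
```

After subtraction, the static star cancels exactly while the debris pixel survives in each frame, which is the preprocessing the trajectory-detection stage then operates on.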

10.1364/ao.55.007929 article EN Applied Optics 2016-09-23

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture the local object and its scale information, which are vital for 3D object detection. To better represent 3D structure, prior arts typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation, and then apply existing 3D point-cloud based detectors. However, their results depend heavily on the accuracy of the estimated depth maps, resulting in...

10.48550/arxiv.1912.04799 preprint EN other-oa arXiv (Cornell University) 2019-01-01