Wei Yin

ORCID: 0000-0002-4349-8297
Research Areas
  • Advanced Vision and Imaging
  • Optical measurement and interference techniques
  • Image Processing Techniques and Applications
  • Advanced Image Processing Techniques
  • Robotics and Sensor-Based Localization
  • Image Enhancement Techniques
  • 3D Shape Modeling and Analysis
  • Advanced Neural Network Applications
  • Crystallization and Solubility Studies
  • X-ray Diffraction in Crystallography
  • Domain Adaptation and Few-Shot Learning
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Computer Graphics and Visualization Techniques
  • Video Surveillance and Tracking Methods
  • Autonomous Vehicle Technology and Safety
  • Industrial Vision Systems and Defect Detection
  • Caching and Content Delivery
  • Remote Sensing and LiDAR Applications
  • Human Pose and Action Recognition
  • 3D Surveying and Cultural Heritage
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Safety Systems Engineering in Autonomy
  • Erosion and Abrasive Machining
  • Human Motion and Animation

Chubu University
2025

The University of Adelaide
2019-2024

Dà-Jiāng Innovations Science and Technology (China)
2022-2023

Changhai Hospital
2023

Second Military Medical University
2023

Foshan University
2022

Nanjing University of Science and Technology
2022

Jilin University
2016

Nanjing University of Posts and Telecommunications
2015

Jilin Medical University
2013

Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress in evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of constraint, namely, virtual normal directions determined by randomly sampled three points in the reconstructed 3D space, we can considerably improve...

10.1109/iccv.2019.00578 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
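For readers unfamiliar with the virtual normal constraint, a minimal PyTorch sketch of the idea follows: back-project depth into 3D, sample random point triplets, and penalize the mismatch between the plane normals they span in the predicted and ground-truth geometry. The function names, the uniform triplet sampling, and the omitted degenerate-triplet filtering are simplifications, not the paper's released implementation.

```python
import torch

def unproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud (pinhole model)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)

def virtual_normal_loss(depth_pred, depth_gt, K, num_triplets=1000):
    pts_pred, pts_gt = unproject(depth_pred, K), unproject(depth_gt, K)
    idx = torch.randint(0, pts_pred.shape[0], (num_triplets, 3))  # random triplets

    def triplet_normals(pts):
        a, b, c = pts[idx[:, 0]], pts[idx[:, 1]], pts[idx[:, 2]]
        n = torch.cross(b - a, c - a, dim=-1)                     # plane normal
        return n / (n.norm(dim=-1, keepdim=True) + 1e-8)

    return (triplet_normals(pts_pred) - triplet_normals(pts_gt)).abs().mean()
```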

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, as well as a possibly unknown camera focal length. We investigate this problem in detail and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape. In addition, we...

10.1109/cvpr46437.2021.00027 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
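A schematic of the two-stage recipe above, under assumed names (`unproject`, `PointCloudAdjuster`, `recover_scene_shape` are illustrative, not the released code): stage one predicts depth up to an unknown scale and shift; stage two looks at the back-projected point cloud and regresses the missing depth shift and a focal-length correction.

```python
import torch
import torch.nn as nn

def unproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud (pinhole model)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)

class PointCloudAdjuster(nn.Module):
    """Toy PointNet-style regressor for (depth_shift, focal_scale)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 2)

    def forward(self, points):                      # points: (N, 3)
        feat = self.mlp(points).max(dim=0).values   # global max pooling
        shift, focal_scale = self.head(feat)
        return shift, focal_scale

def recover_scene_shape(affine_depth, K_guess, adjuster):
    shift, focal_scale = adjuster(unproject(affine_depth, K_guess))
    K = K_guess.clone()
    K[0, 0] = K[0, 0] * focal_scale                 # corrected focal length
    K[1, 1] = K[1, 1] * focal_scale
    return unproject(affine_depth + shift, K)       # shifted depth, corrected camera
```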

We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over previous state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7% respectively), despite using only the training...

10.1109/cvpr52688.2022.00683 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
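A hedged sketch of what a retrieval-augmented classifier in the spirit of RAC can look like; the class name, the plain tensor memory, and the k-NN voting rule here are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAugmentedClassifier(nn.Module):
    def __init__(self, encoder, memory_keys, memory_labels, num_classes, k=10):
        super().__init__()
        self.encoder = encoder                                 # any image encoder -> (B, D)
        self.register_buffer("memory_keys", memory_keys)       # (M, D) pre-encoded images
        self.register_buffer("memory_labels", memory_labels)   # (M,) class ids (long)
        self.base_head = nn.LazyLinear(num_classes)
        self.retrieval_head = nn.Linear(num_classes, num_classes)
        self.k = k

    def forward(self, images):
        q = F.normalize(self.encoder(images), dim=-1)           # (B, D)
        base_logits = self.base_head(q)                         # standard branch
        # Retrieval branch: nearest neighbours in the external memory,
        # aggregated as a similarity-weighted label histogram.
        sims = q @ F.normalize(self.memory_keys, dim=-1).T      # (B, M)
        topk = sims.topk(self.k, dim=-1)
        votes = F.one_hot(self.memory_labels[topk.indices],
                          base_logits.shape[-1]).float()        # (B, k, C)
        retrieved = (votes * topk.values.unsqueeze(-1)).sum(1)  # (B, C)
        return base_logits + self.retrieval_head(retrieved)     # fuse the two branches
```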

Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-posedness of the single-image reconstruction problem, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover...

10.1109/iccv51070.2023.00830 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies on the multi-view consistency assumption for training networks, which, however, is violated in dynamic object regions and occlusions. Consequently, existing methods show poor accuracy in dynamic scenes, and the estimated depth map is blurred at object boundaries because they are usually occluded in other training views. In this paper, we propose SC-DepthV3 for addressing the challenges. Specifically, we introduce an external pretrained model for generating...

10.1109/tpami.2023.3322549 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-10-06
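The multi-view consistency assumption mentioned above is typically enforced with a photometric reprojection loss: warp a neighboring frame into the target view using the predicted depth and relative pose, then compare the result with the target image. A minimal sketch (function names and the plain L1 photometric error are assumptions; SC-DepthV3 adds further terms on top of this):

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(src_img, depth_tgt, K, T_tgt_to_src):
    """Reproject the source image into the target view using predicted depth."""
    B, _, H, W = src_img.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).float().reshape(3, -1)  # (3, HW)
    cam = torch.linalg.inv(K) @ pix                                 # pixel rays
    pts = cam.unsqueeze(0) * depth_tgt.reshape(B, 1, -1)            # (B, 3, HW)
    pts = T_tgt_to_src[:, :3, :3] @ pts + T_tgt_to_src[:, :3, 3:]   # rigid transform
    proj = K @ pts
    xy = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                  # perspective divide
    grid = torch.stack([2 * xy[:, 0] / (W - 1) - 1,                 # normalise to [-1, 1]
                        2 * xy[:, 1] / (H - 1) - 1], -1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

def photometric_loss(tgt_img, src_img, depth_tgt, K, T):
    return (tgt_img - warp_source_to_target(src_img, depth_tgt, K, T)).abs().mean()
```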

Monocular depth estimation enables 3D perception from a single 2D image, thus attracting much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal. Depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects is rarely discussed. In this paper, we first analyze the data distributions and the interaction of foreground and background, then propose a foreground-background...

10.1609/aaai.v34i07.6908 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
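The truncated abstract does not spell out the proposed foreground-background method, but the underlying point that not all pixels are equal can be illustrated with a simple foreground-weighted depth loss; the weighting scheme below is a hypothetical example, not the paper's approach.

```python
import torch

def fg_weighted_depth_loss(pred, gt, fg_mask, fg_weight=5.0):
    """pred, gt: (B, H, W) depth maps; fg_mask: (B, H, W) in {0, 1}."""
    per_pixel = (pred - gt).abs()
    weights = torch.where(fg_mask.bool(),
                          torch.full_like(per_pixel, fg_weight),  # emphasise "things"
                          torch.ones_like(per_pixel))             # keep "stuff" at weight 1
    return (weights * per_pixel).sum() / weights.sum()
```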

Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress in evaluation metrics such as the pixel-wise relative error, most neglect the geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, namely, virtual normal directions determined by randomly sampled three points in the reconstructed 3D space, we significantly improve the accuracy and...

10.1109/tpami.2021.3097396 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-08-10

Despite significant progress made in the past few years, challenges remain for depth estimation using a single monocular image. First, it is nontrivial to train a metric-depth prediction model that can generalize well to diverse scenes, mainly due to limited training data. Thus, researchers have built large-scale relative depth datasets that are much easier to collect. However, existing models trained on such data often fail to recover accurate 3D scene shapes because of the unknown depth shift caused by training with shift-invariant losses. We tackle this problem here and attempt to estimate...

10.1109/tpami.2022.3209968 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-10-05

Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in the dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues represented as local depth or features. The improvements are limited due to the uncontrolled quality of the masks and the underutilized benefits of the fusion of the two types of cues. In...

10.1109/cvpr52729.2023.02063 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
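As a toy illustration of combining the two types of cues discussed above, the snippet below blends multi-frame and single-frame depth with a dynamic-region mask; the soft blending rule is an assumption for illustration, whereas the paper learns the fusion.

```python
import torch

def fuse_depths(multi_frame_depth, mono_depth, dynamic_mask):
    """
    multi_frame_depth: (B, H, W) from multi-view matching (reliable in static areas)
    mono_depth:        (B, H, W) single-image prediction (robust to moving objects)
    dynamic_mask:      (B, H, W) in [0, 1], 1 where the scene is likely moving
    """
    return dynamic_mask * mono_depth + (1.0 - dynamic_mask) * multi_frame_depth
```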

Adversarial robustness remains a significant challenge in deploying deep neural networks for real-world applications. While adversarial training is widely acknowledged as a promising defense strategy, most existing studies primarily focus on balanced datasets, neglecting the fact that real-world data often exhibit a long-tailed distribution, which introduces substantial challenges to robustness. In this paper, we provide an in-depth analysis of adversarial training in the context of long-tailed distributions and identify the limitations of current...

10.32388/9z1lyw preprint EN cc-by 2025-03-21
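For context, the generic adversarial-training recipe the paper builds on pairs a PGD attack with training on the perturbed inputs; the sketch below shows that standard loop (the epsilon, step size, and step count are common defaults, and none of the paper's long-tail-specific components are included).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples within an L-infinity ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv.detach() + alpha * grad.sign()).clamp(x - eps, x + eps)
        x_adv = x_adv.clamp(0, 1)                 # keep images in valid range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    model.train()
    x_adv = pgd_attack(model, x, y)               # attack generated on the fly
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)       # train on the perturbed inputs
    loss.backward()
    optimizer.step()
    return loss.item()
```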

We present a method for depth estimation with monocular images, which can predict high-quality depth on diverse scenes up to an affine transformation, thus preserving accurate shapes of a scene. Previous methods that predict metric depth often work well only in a specific scene. In contrast, learning relative depth (the information of being closer or further) enjoys better generalization, at the price of failing to recover the accurate geometric shape of the scene. In this work, we propose a dataset and methods to tackle this dilemma, aiming to predict depth up to an affine transformation with good generalization to diverse scenes. First...

10.48550/arxiv.2002.00569 preprint EN cc-by-nc-sa arXiv (Cornell University) 2020-01-01
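Depth predicted "up to an affine transformation" means the output relates to the true depth by d ≈ s·d̂ + t for an unknown scale s and shift t. A common way to evaluate or consume such predictions is to solve for (s, t) by least squares against a reference, as in this NumPy sketch (the helper name is illustrative):

```python
import numpy as np

def align_scale_shift(pred, ref, mask=None):
    """Return pred aligned to ref via the least-squares optimal scale and shift."""
    p = pred.ravel() if mask is None else pred[mask]
    r = ref.ravel() if mask is None else ref[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)         # columns: [d_hat, 1]
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)     # min ||s*d_hat + t - d||^2
    return s * pred + t
```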

The perceptual loss has been widely used as an effective loss term in image synthesis tasks, including super-resolution [16] and style transfer [14]. It was believed that the success lies in the high-level feature representations extracted from CNNs pretrained with a large set of images. Here we reveal that what matters is the network structure instead of the trained weights. Without any learning, a deep network structure is sufficient to capture the dependencies between multiple levels of variable statistics using multiple layers of CNNs. This insight...

10.1109/cvpr46437.2021.00538 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
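A sketch of the observation above: a perceptual loss computed from a randomly initialized, frozen CNN, comparing multi-level feature statistics of two images. The VGG-like stand-in below is an assumption, not the exact architecture studied in the paper.

```python
import torch
import torch.nn as nn

def make_random_feature_net():
    layers, ch_in = [], 3
    for ch_out in (64, 128, 256):
        layers += [nn.Conv2d(ch_in, ch_out, 3, padding=1), nn.ReLU(),
                   nn.MaxPool2d(2)]
        ch_in = ch_out
    net = nn.Sequential(*layers)
    for p in net.parameters():
        p.requires_grad_(False)          # frozen random weights, no training
    return net

def perceptual_loss(net, img_a, img_b):
    loss, xa, xb = 0.0, img_a, img_b
    for layer in net:
        xa, xb = layer(xa), layer(xb)
        if isinstance(layer, nn.ReLU):   # compare features at multiple levels
            loss = loss + (xa - xb).abs().mean()
    return loss
```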

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric depth and surface normal estimation from single images, both critical for accurate 3D recovery. Depth and normal estimation, though complementary, present distinct challenges. State-of-the-art monocular depth methods achieve zero-shot generalization through affine-invariant depths, but fail to recover real-world metric scale. Conversely, current surface normal estimation techniques struggle with zero-shot performance due to insufficient labeled data. We propose targeted solutions for both...

10.1109/tpami.2024.3444912 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-08-16
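Depth and surface normals are tightly coupled: normals can be derived from the spatial gradients of the back-projected point map. The sketch below shows that geometric link only; Metric3D v2 predicts both quantities with a network and uses its own joint optimization.

```python
import torch
import torch.nn.functional as F

def normals_from_points(points):
    """points: (B, 3, H, W) back-projected 3D point map -> unit surface normals."""
    dx = points[:, :, :, 1:] - points[:, :, :, :-1]   # horizontal tangent
    dy = points[:, :, 1:, :] - points[:, :, :-1, :]   # vertical tangent
    dx = F.pad(dx, (0, 1, 0, 0))                      # pad back to (H, W)
    dy = F.pad(dy, (0, 0, 0, 1))
    n = torch.cross(dx, dy, dim=1)                    # normal = tangent cross product
    return F.normalize(n, dim=1)
```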

10.1109/cvpr52733.2024.01882 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Although the physical mechanism of the light-trapping property of butterfly wings is well understood, it remains a challenge to create artificial replicas of these natural functional structures. Here, we synthesized a SiO2 inverse replica of the structure in butterfly wing scales using a method combining a sol–gel process and subsequent selective etching. First, the reflectance spectrum was taken to measure the reflectivity. Then, FESEM and TEM were used to observe the coupling of the replicas. Afterwards, assisted by the SEM data, 3D optimized models...

10.1039/c3nr01455j article EN Nanoscale 2013-01-01

Monocular visual odometry (VO) is an important task in robotics and computer vision. Thus far, how to build accurate and robust monocular VO systems that can work well in diverse scenarios remains largely unsolved. In this article, we propose a framework to exploit monocular depth estimation for improving VO. The core of our framework is a depth estimation module with a strong generalization capability for diverse scenes. It consists of two separate working modes to assist the localization and mapping. With a single image as input, the depth module predicts relative depth to help improve the localization accuracy...

10.1109/tro.2022.3164834 article EN IEEE Transactions on Robotics 2022-07-08
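One simple way a depth network can assist monocular VO, shown purely as an illustration of the idea above rather than the paper's pipeline, is to use the predicted depth to resolve the scale ambiguity of triangulated map points:

```python
import numpy as np

def estimate_scale(triangulated_z, predicted_z):
    """Both arrays hold depths of the same tracked keypoints; return the scale
    that maps the up-to-scale VO reconstruction onto the predicted depth."""
    ratios = predicted_z / np.clip(triangulated_z, 1e-6, None)
    return float(np.median(ratios))      # median is robust to bad matches

# usage sketch: vo_points_scaled = estimate_scale(z_vo, z_net) * vo_points
```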

Road detection is a critically important task for self-driving cars. By employing LiDAR data, recent works have significantly improved the accuracy of road detection. Relying on LiDAR sensors limits the wide application of those methods when only cameras are available. In this paper, we propose a novel road detection approach with RGB as the only input during inference. Specifically, we exploit pseudo-LiDAR using depth estimation and propose a feature fusion network where RGB and learned depth information are fused. To further optimize the network structure and improve...

10.1109/tcsvt.2022.3146305 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-01-24
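"Pseudo-LiDAR" here means back-projecting a predicted depth map into a 3D point cloud with the camera intrinsics so it can be consumed like LiDAR data; a NumPy sketch assuming a pinhole camera (helper name is illustrative):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """depth: (H, W) in metres; K: 3x3 intrinsics -> (H*W, 3) points in camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinates
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```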

Monocular depth prediction plays a crucial role in understanding 3D scene geometry. Although recent methods have achieved impressive progress in evaluation metrics such as the pixel-wise relative error, most methods neglect the geometric constraints in 3D space. In this work, we show the importance of high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces one simple type of constraint, namely, virtual normal directions determined by randomly sampled three points in the reconstructed 3D space, we can considerably improve...

10.48550/arxiv.1907.12209 preprint EN cc-by-nc-sa arXiv (Cornell University) 2019-01-01

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions...

10.1109/tpami.2024.3444912 preprint EN arXiv (Cornell University) 2024-03-21