Yichen Wei

ORCID: 0009-0003-4327-8459
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Face recognition and analysis
  • Advanced Vision and Imaging
  • Domain Adaptation and Few-Shot Learning
  • Hand Gesture Recognition Systems
  • Visual Attention and Saliency Detection
  • Face and Expression Recognition
  • Anomaly Detection Techniques and Applications
  • Autonomous Vehicle Technology and Safety
  • Advanced Image Processing Techniques
  • Multimodal Machine Learning Applications
  • Image Enhancement Techniques
  • Forensic Anthropology and Bioarchaeology Studies
  • Internet Traffic Analysis and Secure E-voting
  • Robot Manipulation and Learning
  • Numerical methods for differential equations
  • Network Security and Intrusion Detection
  • Advanced Numerical Methods in Computational Mathematics
  • Robotics and Sensor-Based Localization
  • Differential Equations and Numerical Methods
  • Industrial Vision Systems and Defect Detection
  • Generative Adversarial Networks and Image Synthesis

South China Agricultural University
2024

Shanghai Jiao Tong University
2023-2024

Tianjin University
2024

Southeast University
2023

Megvii (China)
2019-2022

Vi Technology (United States)
2019-2021

University of Hong Kong
2004-2021

Shanghai Normal University
2019-2021

Fudan University
2020

Microsoft Research Asia (China)
2010-2018

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained...

10.1109/iccv.2017.89 article EN 2017-10-01
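
The sampling idea in the abstract above can be illustrated with a minimal sketch. It assumes a single-channel image, a 3x3 kernel, and hand-set offsets; in the described method the offsets are learned from the target task, which this sketch does not attempt. The helper names `bilinear` and `deformable_conv_at` are hypothetical and chosen for illustration only.

```python
import math

def bilinear(img, y, x):
    """Sample a 2-D list-of-lists at fractional (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight falls off linearly with distance to the grid point.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Output of a 3x3 deformable convolution at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th kernel tap (row-major order).
    Here the offsets are given by hand (an assumption of this sketch).
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            # Regular grid position plus the additional offset.
            out += kernel[i + 1][j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
plain = deformable_conv_at(img, ones, [(0.0, 0.0)] * 9, 1, 1)    # 45.0
shifted = deformable_conv_at(img, ones, [(0.0, 0.5)] * 9, 1, 1)  # 49.5
```

With all offsets set to zero the result matches a plain 3x3 convolution, which is why such a module can stand in for its plain counterpart.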

Although it is well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea is working in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require...

10.1109/cvpr.2018.00378 article EN 2018-06-01

Recent progress in salient object detection has exploited the boundary prior, or background information, to assist other saliency cues such as contrast, achieving state-of-the-art results. However, the usage of the boundary prior is very simple and fragile, and its integration with other cues is mostly heuristic. In this work, we present new methods to address these issues. First, we propose a robust background measure, called boundary connectivity. It characterizes the spatial layout of image regions with respect to image boundaries and is much more robust. It has an...

10.1109/cvpr.2014.360 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

10.1007/s11263-013-0667-3 article EN International Journal of Computer Vision 2013-12-12

This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$. We find a majority of loss functions, including the triplet loss and the softmax cross-entropy loss, embed $s_n$ and $s_p$ into similarity pairs and seek to reduce $(s_n-s_p)$. Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal. Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized. To this end, we simply re-weight each...

10.1109/cvpr42600.2020.00643 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
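
The re-weighting intuition in the abstract above, that a similarity score far from its optimum should be penalized more strongly, can be sketched as follows. The truncated abstract does not state the weighting rule, so the rule and the values of `m` and `gamma` below are assumptions recalled from the commonly cited Circle Loss formulation, not details confirmed by this page. The function names are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def circle_style_loss(sp, sn, m=0.25, gamma=256.0):
    """Pair-similarity loss with per-score weights (assumed formulation).

    sp: within-class similarity scores, sn: between-class similarity scores.
    m (relaxation margin) and gamma (scale) are assumed default values.
    """
    optimum_p, optimum_n = 1.0 + m, -m   # assumed optima for s_p and s_n
    margin_p, margin_n = 1.0 - m, m      # assumed decision margins
    # Weight of each score grows with its distance from the optimum,
    # so poorly optimized scores receive a stronger penalty.
    logit_p = [-gamma * max(optimum_p - s, 0.0) * (s - margin_p) for s in sp]
    logit_n = [gamma * max(s - optimum_n, 0.0) * (s - margin_n) for s in sn]
    # log(1 + sum_n exp(logit_n) * sum_p exp(logit_p))
    return softplus(logsumexp(logit_n) + logsumexp(logit_p))

well_separated = circle_style_loss([0.9], [0.1])  # close to zero
confused = circle_style_loss([0.5], [0.5])        # large
```

By contrast, a loss that only reduces $(s_n-s_p)$ applies the same penalty strength to both scores regardless of how far each is from its target.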

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup as flow computation is relatively fast. The end-to-end training of the whole architecture...

10.1109/cvpr.2017.441 article EN 2017-07-01

Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on feature level instead. It improves the per-frame features by aggregation...

10.1109/iccv.2017.52 article EN 2017-10-01

In this paper, we study the task of 3D human pose estimation in the wild. This task is challenging due to the lack of training data, as existing datasets are either in-the-wild images with 2D pose or in-the-lab images with 3D pose. We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network that presents a two-stage cascaded structure. Our network augments a state-of-the-art 2D pose estimation sub-network with a 3D depth regression sub-network. Unlike previous two-stage approaches that train the two sub-networks sequentially and separately, our...

10.1109/iccv.2017.51 article EN 2017-10-01

Regression based methods are not performing as well as detection based methods for human pose estimation. A central problem is that the structural information in the pose is not well exploited in the previous regression methods. In this work, we propose a structure-aware regression approach. It adopts a reparameterized pose representation using bones instead of joints. It exploits the joint connection structure to define a compositional loss function that encodes the long range interactions in the pose. It is simple, effective, and general for both 2D and 3D pose estimation in a unified setting....

10.1109/iccv.2017.284 preprint EN 2017-10-01

We extend the previous 2D cascaded object pose regression work [9] in two aspects so that it works better for 3D articulated objects. Our first contribution is 3D pose-indexed features that generalize the previous 2D parameterized features and achieve better invariance to 3D transformations. Our second contribution is a principled hierarchical regression that is adapted to the articulated object structure. It is therefore more accurate and faster. Comprehensive experiments verify the state-of-the-art accuracy and efficiency of the proposed approach on the challenging 3D hand pose estimation problem, on a public dataset and our new dataset.

10.1109/cvpr.2015.7298683 article EN 2015-06-01

We present a very efficient, highly accurate, "Explicit Shape Regression" approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most...

10.1109/cvpr.2012.6248015 article EN 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012-06-01

Modeling data uncertainty is important for noisy images, but seldom explored for face recognition. The pioneer work, PFE, considers uncertainty by modeling each face image embedding as a Gaussian distribution. It is quite effective. However, it uses a fixed feature (mean of the Gaussian) from an existing model. It only estimates the variance and relies on an ad-hoc and costly metric. Thus, it is not easy to use. It is unclear how uncertainty affects feature learning. This work applies data uncertainty learning to face recognition, such that the feature (mean) and uncertainty (variance) are learnt simultaneously,...

10.1109/cvpr42600.2020.00575 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We revisit the one-shot Neural Architecture Search (NAS) paradigm and analyze its advantages over existing NAS approaches. The existing one-shot method, however, is hard to train and not yet effective on large scale datasets like ImageNet. This work proposes a Single Path One-Shot model to address the challenge in the training. Our central idea is to construct a simplified supernet, where all architectures are single paths so that the weight co-adaption problem is alleviated. Training is performed by uniform path sampling. All architectures (and their...

10.48550/arxiv.1904.00420 preprint EN other-oa arXiv (Cornell University) 2019-01-01
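
The uniform path sampling mentioned in the abstract above can be sketched with a toy supernet. The sketch assumes a supernet made of a fixed number of blocks, each with the same number of candidate operations, and it replaces the real weight update with a counter. The names `sample_single_path` and `count_updates` are hypothetical.

```python
import random

def sample_single_path(num_blocks, num_choices, rng):
    """Pick one candidate operation per block, uniformly at random."""
    return [rng.randrange(num_choices) for _ in range(num_blocks)]

def count_updates(num_blocks, num_choices, steps, seed=0):
    """Simulate supernet training in which each step touches one single path.

    Returns updates[block][choice], the number of steps that used that choice.
    The counter increment is a stand-in for a gradient step (assumption).
    """
    rng = random.Random(seed)
    updates = [[0] * num_choices for _ in range(num_blocks)]
    for _ in range(steps):
        path = sample_single_path(num_blocks, num_choices, rng)
        for block, choice in enumerate(path):
            updates[block][choice] += 1
    return updates

updates = count_updates(num_blocks=3, num_choices=4, steps=1000)
```

Because every step activates exactly one operation per block, each block receives the same total number of updates, spread roughly evenly across its candidates.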

This paper considers the vehicle re-identification (re-ID) problem. The extreme viewpoint variation (up to 180 degrees) poses great challenges for existing approaches. Inspired by the behavior in human's recognition process, we propose a novel viewpoint-aware metric learning approach. It learns two metrics for similar viewpoints and different viewpoints in two feature spaces, respectively, giving rise to a viewpoint-aware network (VANet). During training, two types of constraints are applied jointly. During inference, the viewpoint is firstly estimated...

10.1109/iccv.2019.00837 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Previous learning based hand pose estimation methods do not fully exploit the prior information in hand model geometry. Instead, they usually rely on a separate model fitting step to generate valid hand poses. Such post-processing is inconvenient and sub-optimal. In this work, we propose a model based deep learning approach that adopts a forward kinematics based layer to ensure the geometric validity of estimated poses. For the first time, we show that embedding such a non-linear generative process in deep learning is feasible for hand pose estimation. Our approach is verified on challenging public datasets...

10.48550/arxiv.1606.06854 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Random forest is well known as one of the best learning methods. In spite of its great success, it also has certain drawbacks: the heuristic learning rule does not effectively minimize the global training loss; the model size is usually too large for many real applications. To address the issues, we propose two techniques, global refinement and global pruning, to improve a pre-trained random forest. The proposed global refinement jointly relearns the leaf nodes of all trees under a global objective function so that the complementary information between multiple trees is well exploited....

10.1109/cvpr.2015.7298672 article EN 2015-06-01

10.1016/j.cviu.2018.10.006 article EN Computer Vision and Image Understanding 2018-11-01