Yinpeng Chen

ORCID: 0000-0003-1411-225X
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Stroke Rehabilitation and Recovery
  • Generative Adversarial Networks and Image Synthesis
  • Virtual Reality Applications and Impacts
  • Adversarial Robustness in Machine Learning
  • Video Surveillance and Tracking Methods
  • Brain Tumor Detection and Classification
  • COVID-19 diagnosis using AI
  • Seismic Imaging and Inversion Techniques
  • Advanced Image and Video Retrieval Techniques
  • Augmented Reality Applications
  • Machine Learning and Data Classification
  • Machine Learning and ELM
  • Anomaly Detection Techniques and Applications
  • Sparse and Compressive Sensing Techniques
  • Robotics and Sensor-Based Localization
  • Human Motion and Animation
  • Tactile and Sensory Interactions
  • Seismic Waves and Analysis
  • Drilling and Well Engineering
  • Digital Media Forensic Detection
  • Gait Recognition and Analysis

Zhejiang University
2024-2025

Google (United States)
2024

Microsoft Research (United Kingdom)
2019-2023

Microsoft (Germany)
2022-2023

Los Alamos National Laboratory
2023

Huazhong University of Science and Technology
2021-2023

Adrian College
2023

Directorate of Medicinal and Aromatic Plants Research
2023

Microsoft (United States)
2013-2022

Istituto Tecnico Industriale Alessandro Volta
2021

Modern machine learning suffers from catastrophic forgetting when learning new classes incrementally. The performance dramatically degrades due to the missing data of old classes. Incremental learning methods have been proposed to retain the knowledge acquired from the old classes, by using knowledge distilling and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes. Distinguishing an...

10.1109/cvpr.2019.00046 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon...

10.1109/cvpr42600.2020.01104 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
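
As a rough illustration of the kernel aggregation described in the abstract above, the following is a minimal 1D sketch in plain Python. The abstract is truncated before it says what the aggregation is based on, so the gating used here (global average pooling, one hypothetical scalar weight per kernel, then a softmax) is an assumption made for this example only, and all function and parameter names are illustrative rather than taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, attention):
    """Collapse K same-sized kernels into one kernel by a weighted sum."""
    size = len(kernels[0])
    return [sum(a * k[i] for a, k in zip(attention, kernels)) for i in range(size)]


def conv1d_valid(signal, kernel):
    """Plain 1D cross-correlation with no padding."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def dynamic_conv1d(signal, kernels, gate_weights):
    """Apply one input-dependent kernel built from K parallel kernels.

    ASSUMPTION: the gate (mean pooling times a hypothetical per-kernel
    weight, then softmax) is a stand-in for the attention branch, which
    the truncated abstract does not specify.
    """
    pooled = sum(signal) / len(signal)
    attention = softmax([w * pooled for w in gate_weights])
    kernel = aggregate_kernels(kernels, attention)
    return conv1d_valid(signal, kernel), attention


kernels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
output, attention = dynamic_conv1d([1.0, 2.0, 3.0, 4.0], kernels, [0.0, 0.0])
print(output, attention)
```

Because the kernels are mixed before the convolution is applied, the layer runs a single convolution per input however many parallel kernels it holds; only the small gate and the weighted sum are added.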

The complex nature of combining localization and classification in object detection has resulted in the flourishing development of methods. Previous works tried to improve the performance of various object detection heads but failed to present a unified view. In this paper, we present a novel dynamic head framework to unify object detection heads with attentions. By coherently combining multiple self-attention mechanisms between feature levels for scale-awareness, among spatial locations for spatial-awareness, and within output channels for task-awareness, the proposed approach...

10.1109/cvpr46437.2021.00729 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Two head structures (i.e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact that the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than...

10.1109/cvpr42600.2020.01020 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Different from recent works on vision transformer, the transformer in Mobile-Former contains very few tokens (e.g. 6 or fewer tokens) that are randomly initialized to learn global priors, resulting in low computational cost. Combining with the proposed light-weight cross attention to model the bridge, Mobile-Former is not...

10.1109/cvpr52688.2022.00520 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this paper, we present a novel Dynamic DETR (Detection with Transformers) approach by introducing dynamic attentions into both the encoder and decoder stages of DETR to break its two limitations on small feature resolution and slow training convergence. To address the first limitation, which is due to the quadratic computational complexity of the self-attention module in Transformer encoders, we propose a dynamic encoder to approximate the Transformer encoder's attention mechanism using a convolution-based dynamic encoder with various attention types. Such an encoder can dynamically adjust...

10.1109/iccv48922.2021.00298 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

This paper studies the BERT pretraining of video transformers. It is a straightforward but worth-studying extension given the recent success from BERT pretraining of image transformers. We introduce BEVT, which decouples video representation learning into spatial representation learning and temporal dynamics learning. In particular, BEVT first performs masked image modeling on image data, and then conducts masked image modeling jointly with masked video modeling on video data. This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are often times...

10.1109/cvpr52688.2022.01432 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Benefiting from masked visual modeling, self-supervised video representation learning has achieved remarkable progress. However, existing methods focus on learning representations from scratch through reconstructing low-level features like raw pixel values. In this paper, we propose masked video distillation (MVD), a simple yet effective two-stage masked feature modeling framework for video representation learning: firstly we pretrain an image (or video) model by recovering low-level features of masked patches, then we use the resulting features as targets for masked feature modeling. For the choice of teacher...

10.1109/cvpr52729.2023.00611 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

In this paper, we address the incremental classifier learning problem, which suffers from catastrophic forgetting. The main reason for catastrophic forgetting is that the past data are not available during learning. Typical approaches keep some exemplars for the past classes and use distillation regularization to retain the classification capability on the past classes and balance the past and new classes. However, there are four problems with these approaches. First, the loss function is not efficient for classification. Second, there is an unbalance problem between the past and new classes. Third, the size of...

10.48550/arxiv.1802.00853 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Recent works of multi-source domain adaptation focus on learning a domain-agnostic model, of which the parameters are static. However, such a static model is difficult to handle conflicts across multiple domains, and suffers from a performance degradation in both source domains and target domain. In this paper, we present dynamic transfer to address domain conflicts, where the model parameters are adapted to samples. The key insight is that adapting the model across domains is achieved via adapting the model across samples. Thus, it breaks down source domain barriers and turns multi-source domains into a single-source domain. This also simplifies...

10.1109/cvpr46437.2021.01085 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We found that two factors, sparse connectivity and dynamic activation function, are effective to improve the accuracy. The former avoids the significant reduction of network width, while the latter mitigates the detriment of reduction in network depth. Technically, we propose micro-factorized convolution, which factorizes a convolution matrix into low rank matrices, to integrate...

10.1109/iccv48922.2021.00052 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

In this paper, we present a novel approach to model the 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique. 3D human body modeling is important for 3D reconstruction and animation of the realistic human body, which can be widely used in Tele-presence and video game applications. It is challenging due to a wide range of shape variations over different people and poses. The existing SCAPE model is popular in computer vision for modeling the 3D human body. However, it considers shape and pose deformations separately, which is not accurate since pose deformation is person-dependent....

10.1109/cvpr.2013.21 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

Supervised deep learning with pixel-wise training labels has had great success on multi-person part segmentation. However, data labeling at pixel-level is very expensive. To solve the problem, people have been exploring the use of synthetic data to avoid the data labeling. Although it is easy to generate labels for synthetic data, the results are much worse compared to those using real data and manual labeling. The degradation of the performance is mainly due to the domain gap, i.e., the discrepancy of the pixel value statistics between real and synthetic data. In this paper, we observe that humans...

10.1109/tcsvt.2020.2995122 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-05-16

This paper presents a novel mixed reality rehabilitation system used to help improve the reaching movements of people who have hemiparesis from stroke. The system provides real-time, multimodal, customizable, and adaptive feedback generated from the movement patterns of the subject's affected arm and torso during reach to grasp. The feedback is provided via innovative visual and musical forms that present a stimulating, enriched environment in which to train the subjects and promote multimodal sensory-motor integration. A pilot study was conducted to test...

10.1109/tnsre.2010.2055061 article EN IEEE Transactions on Neural Systems and Rehabilitation Engineering 2010-07-01

Recent research in dynamic convolution shows substantial performance boost for efficient CNNs, due to the adaptive aggregation of K static convolution kernels. It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging. In this paper, we revisit it from a new perspective of matrix decomposition and reveal the key issue is that dynamic convolution applies dynamic attention over channel groups after projecting into a higher dimensional latent space. To address this issue, we propose...

10.48550/arxiv.2103.08756 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Current state-of-the-art object detectors can have a significant performance drop when deployed in the wild due to domain gaps with training data. Unsupervised Domain Adaptation (UDA) is a promising approach to adapt models for new domains/environments without any expensive label cost. Previous mainstream UDA works for object detection usually focused on image-level and/or feature-level adaptation by using adversarial learning methods. In this work, we show that such adversarial-based methods only reduce the style...

10.1109/wacv51458.2022.00113 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

Few existing interactive rehabilitation systems can effectively communicate multiple aspects of movement performance simultaneously, in a manner that appropriately adapts across various training scenarios. In order to address the need for such systems within stroke rehabilitation training, a unified approach for designing interactive systems for upper limb rehabilitation of stroke survivors has been developed and applied for the implementation of an Adaptive Mixed Reality Rehabilitation (AMRR) System. The AMRR system provides computational evaluation and multimedia feedback...

10.1186/1743-0003-8-54 article EN cc-by Journal of NeuroEngineering and Rehabilitation 2011-01-01

Background. Adaptive mixed reality rehabilitation (AMRR) is a novel integration of motion capture technology and high-level media computing that provides precise kinematic measurements and engaging multimodal feedback for self-assessment during a therapeutic task. Objective. We describe the first proof-of-concept study to compare outcomes of AMRR and traditional upper-extremity physical therapy. Methods. Two groups of participants with chronic stroke received either a month of AMRR therapy (n = 11) or matched dosing...

10.1177/1545968312465195 article EN Neurorehabilitation and neural repair 2012-12-03

ImmerseBoard is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera (Microsoft Kinect) mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that compared to standard video conferencing with a digital whiteboard, ImmerseBoard provides participants with a quantitatively better...

10.1145/2702123.2702160 article EN 2015-04-17

We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Different from recent works on vision transformer, the transformer in Mobile-Former contains very few tokens (e.g. 6 or fewer tokens) that are randomly initialized to learn global priors, resulting in low computational cost. Combining with the proposed light-weight cross attention to model the bridge, Mobile-Former is not...

10.48550/arxiv.2108.05895 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The burgeoning field of computational spectrometers is rapidly advancing, providing a pathway to highly miniaturized, on-chip systems for in-situ or portable measurements. The performance of these systems is typically limited in its encoder section. The response matrix is largely compromised with redundancies, due to the periodic intensity or overly smooth responses. As such, the inherent interdependence among the physical size, resolution, and bandwidth of spectral encoders poses a challenge to further miniaturization progress....

10.1038/s41377-024-01705-w article EN cc-by Light Science & Applications 2025-03-31

This paper presents a novel real-time, multi-modal biofeedback system for stroke patient therapy. The problem is important as traditional mechanisms of rehabilitation are monotonous, and do not incorporate detailed quantitative assessment of recovery in addition to traditional clinical schemes. We have been working on developing an experiential media system that integrates task dependent physical therapy and cognitive stimuli within an interactive, multimodal environment. The environment provides a purposeful, engaging,...

10.1145/1180639.1180804 article EN Proceedings of the 14th ACM International Conference on Multimedia 2006-10-23