- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Stroke Rehabilitation and Recovery
- Generative Adversarial Networks and Image Synthesis
- Virtual Reality Applications and Impacts
- Adversarial Robustness in Machine Learning
- Video Surveillance and Tracking Methods
- Brain Tumor Detection and Classification
- COVID-19 diagnosis using AI
- Seismic Imaging and Inversion Techniques
- Advanced Image and Video Retrieval Techniques
- Augmented Reality Applications
- Machine Learning and Data Classification
- Machine Learning and ELM
- Anomaly Detection Techniques and Applications
- Sparse and Compressive Sensing Techniques
- Robotics and Sensor-Based Localization
- Human Motion and Animation
- Tactile and Sensory Interactions
- Seismic Waves and Analysis
- Drilling and Well Engineering
- Digital Media Forensic Detection
- Gait Recognition and Analysis
Zhejiang University
2024-2025
Google (United States)
2024
Microsoft Research (United Kingdom)
2019-2023
Microsoft (Germany)
2022-2023
Los Alamos National Laboratory
2023
Huazhong University of Science and Technology
2021-2023
Adrian College
2023
Directorate of Medicinal and Aromatic Plants Research
2023
Microsoft (United States)
2013-2022
Istituto Tecnico Industriale Alessandro Volta
2021
Modern machine learning suffers from catastrophic forgetting when learning new classes incrementally. The performance dramatically degrades due to the missing data of old classes. Incremental learning methods have been proposed to retain the knowledge acquired from the old classes, by using knowledge distilling and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes. Distinguishing an...
Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon...
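The kernel aggregation described in the abstract above can be illustrated with a minimal sketch. It assumes the mixing weights come from a softmax over per-kernel scores, which is consistent with the "attention" over K static kernels mentioned in a later abstract on this page; the 1D setting, the function names, and the score function are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, scores):
    """Blend K same-length kernels into one kernel using softmax weights."""
    attention = softmax(scores)
    size = len(kernels[0])
    return [
        sum(a * k[i] for a, k in zip(attention, kernels))
        for i in range(size)
    ]


def conv1d_valid(signal, kernel):
    """Sliding dot product without padding (cross-correlation, as in CNNs)."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def dynamic_conv1d(signal, kernels, score_fn):
    """Compute input-dependent scores, blend the kernels, then convolve once."""
    scores = score_fn(signal)  # hypothetical scorer: one score per kernel
    kernel = aggregate_kernels(kernels, scores)
    return conv1d_valid(signal, kernel)
```

Because the K kernels are blended before the convolution runs, only one convolution is computed per input, whatever the value of K. In this sketch, `score_fn` stands in for whatever small network would produce the per-kernel scores.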
The complex nature of combining localization and classification in object detection has resulted in the flourishing development of methods. Previous works tried to improve the performance of various object detection heads but failed to present a unified view. In this paper, we present a novel dynamic head framework to unify object detection heads with attentions. By coherently combining multiple self-attention mechanisms between feature levels for scale-awareness, among spatial locations for spatial-awareness, and within output channels for task-awareness, the proposed approach...
Two head structures (i.e. fully connected head and convolution head) have been widely used in R-CNN based detectors for classification and localization tasks. However, there is a lack of understanding of how these two head structures work for these two tasks. To address this issue, we perform a thorough analysis and find an interesting fact that the two head structures have opposite preferences towards the two tasks. Specifically, the fully connected head (fc-head) is more suitable for the classification task, while the convolution head (conv-head) is more suitable for the localization task. Furthermore, we examine the output feature maps of both heads and find that fc-head has more spatial sensitivity than...
We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local processing and of transformer at global interaction, and the bridge enables bidirectional fusion of local and global features. Different from recent works on vision transformer, the transformer in Mobile-Former contains very few tokens (e.g. 6 or fewer tokens) that are randomly initialized to learn global priors, resulting in low computational cost. Combining with the proposed light-weight cross attention to model the bridge, Mobile-Former is not...
In this paper, we present a novel Dynamic DETR (Detection with Transformers) approach by introducing dynamic attentions into both the encoder and decoder stages of DETR to break its two limitations on small feature resolution and slow training convergence. To address the first limitation, which is due to the quadratic computational complexity of the self-attention module in Transformer encoders, we propose a dynamic encoder to approximate the Transformer encoder's attention mechanism using a convolution-based dynamic encoder with various attention types. Such an encoder can dynamically adjust...
This paper studies the BERT pretraining of video transformers. It is a straightforward but worth-studying extension given the recent success from BERT pretraining of image transformers. We introduce BEVT which decouples video representation learning into spatial representation learning and temporal dynamics learning. In particular, BEVT first performs masked image modeling on image data, and then conducts masked image modeling jointly with masked video modeling on video data. This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are often times...
Benefiting from masked visual modeling, self-supervised video representation learning has achieved remarkable progress. However, existing methods focus on learning representations from scratch through reconstructing low-level features like raw pixel values. In this paper, we propose masked video distillation (MVD), a simple yet effective two-stage masked feature modeling framework for video representation learning: firstly we pretrain an image (or video) model by recovering low-level features of masked patches, then we use the resulting features as targets for masked feature modeling. For the choice of teacher...
In this paper, we address the incremental classifier learning problem, which suffers from catastrophic forgetting. The main reason for catastrophic forgetting is that the past data are not available during learning. Typical approaches keep some exemplars for the past classes and use distillation regularization to retain the classification capability on the past classes and balance the past and new classes. However, there are four problems with these approaches. First, the loss function is not efficient for classification. Second, there is an unbalance problem between the past and new classes. Third, the size of...
Recent works of multi-source domain adaptation focus on learning a domain-agnostic model, of which the parameters are static. However, such a static model has difficulty handling conflicts across multiple domains, and suffers from performance degradation in both source domains and target domain. In this paper, we present dynamic transfer to address domain conflicts, where the model parameters are adapted to samples. The key insight is that adapting the model across domains is achieved via adapting the model across samples. Thus, it breaks down source domain barriers and turns multi-source domains into a single-source domain. This also simplifies...
This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We found that two factors, sparse connectivity and dynamic activation function, are effective to improve the accuracy. The former avoids the significant reduction of network width, while the latter mitigates the detriment of reduction in network depth. Technically, we propose micro-factorized convolution, which factorizes a convolution matrix into low rank matrices, to integrate...
In this paper, we present a novel approach to model 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique. 3D human body modeling is important for 3D reconstruction and animation of realistic human body, which can be widely used in Tele-presence and video game applications. It is challenging due to a wide range of shape variations over different people and poses. The existing SCAPE model is popular in computer vision for modeling 3D human body. However, it considers shape and pose deformations separately, which is not accurate since pose deformation is person-dependent....
Supervised deep learning with pixel-wise training labels has had great success on multi-person part segmentation. However, data labeling at pixel-level is very expensive. To solve the problem, people have been exploring the use of synthetic data to avoid data labeling. Although it is easy to generate labels for synthetic data, the results are much worse compared to those using real data and manual labeling. The degradation of the performance is mainly due to the domain gap, i.e., the discrepancy of the pixel value statistics between real and synthetic data. In this paper, we observe that humans...
This paper presents a novel mixed reality rehabilitation system used to help improve the reaching movements of people who have hemiparesis from stroke. The system provides real-time, multimodal, customizable, and adaptive feedback generated from the movement patterns of the subject's affected arm and torso during reach to grasp. The feedback is provided via innovative visual and musical forms that present a stimulating, enriched environment in which to train the subjects and promote multimodal sensory-motor integration. A pilot study was conducted to test...
Recent research in dynamic convolution shows substantial performance boost for efficient CNNs, due to the adaptive aggregation of K static convolution kernels. It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging. In this paper, we revisit it from a new perspective of matrix decomposition and reveal that the key issue is that dynamic convolution applies dynamic attention over channel groups after projecting into a higher dimensional latent space. To address this issue, we propose...
Current state-of-the-art object detectors can have a significant performance drop when deployed in the wild due to domain gaps with training data. Unsupervised Domain Adaptation (UDA) is a promising approach to adapt models for new domains/environments without any expensive label cost. Previous mainstream UDA works for object detection usually focused on image-level and/or feature-level adaptation by using adversarial learning methods. In this work, we show that such adversarial-based methods can only reduce the domain style...
Few existing interactive rehabilitation systems can effectively communicate multiple aspects of movement performance simultaneously, in a manner that appropriately adapts across various training scenarios. In order to address the need for such systems within stroke rehabilitation training, a unified approach for designing interactive systems for upper limb rehabilitation of stroke survivors has been developed and applied for the implementation of an Adaptive Mixed Reality Rehabilitation (AMRR) System. The AMRR system provides computational evaluation and multimedia feedback...
Background. Adaptive mixed reality rehabilitation (AMRR) is a novel integration of motion capture technology and high-level media computing that provides precise kinematic measurements and engaging multimodal feedback for self-assessment during a therapeutic task. Objective. We describe the first proof-of-concept study to compare outcomes of AMRR and traditional upper-extremity physical therapy. Methods. Two groups of participants with chronic stroke received either a month of AMRR therapy (n = 11) or matched dosing...
ImmerseBoard is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera (Microsoft Kinect) mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that compared to standard video conferencing with a digital whiteboard, ImmerseBoard provides participants with a quantitatively better...
The burgeoning field of computational spectrometers is rapidly advancing, providing a pathway to highly miniaturized, on-chip systems for in-situ or portable measurements. The performance of these systems is typically limited in its encoder section. The response matrix is largely compromised with redundancies, due to the periodic intensity or overly smooth responses. As such, the inherent interdependence among the physical size, resolution, and bandwidth of spectral encoders poses a challenge to further miniaturization progress....
This paper presents a novel real-time, multi-modal biofeedback system for stroke patient therapy. The problem is important as traditional mechanisms of rehabilitation are monotonous, and do not incorporate detailed quantitative assessment of recovery in addition to traditional clinical schemes. We have been working on developing an experiential media system that integrates task dependent physical therapy and cognitive stimuli within an interactive, multimodal environment. The environment provides a purposeful, engaging,...