- Advanced Image and Video Retrieval Techniques
- Visual perception and processing mechanisms
- Speech Recognition and Synthesis
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Face and Expression Recognition
- Process Optimization and Integration
- Natural Language Processing Techniques
- Advanced Control Systems Optimization
- Visual Attention and Saliency Detection
- Speech and Audio Processing
- Music and Audio Processing
- Topic Modeling
- Remote-Sensing Image Classification
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Tactile and Sensory Interactions
- Action Observation and Synchronization
- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Human Pose and Action Recognition
- Retinal Imaging and Analysis
- Motor Control and Adaptation
- UAV Applications and Optimization
- Distributed Control Multi-Agent Systems
Sun Yat-sen University
2015-2025
Changzhou University
2023-2025
Microsoft (United States)
2025
Digital China Health (China)
2024
Tianjin University of Technology and Education
2013-2023
Microsoft Research (United Kingdom)
2023
Xiamen University
2022
NetApp (United States)
2021
Shanxi University
2021
Tianjin University
2014-2020
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches of convolution and self-attention and merging the context from each branch. In this paper, we propose...
Deep neural networks (DNNs) have now demonstrated state-of-the-art detection performance on pedestrian datasets. However, because of their high computational complexity, efficiency is still a frustrating problem even with the help of Graphics Processing Units (GPUs). To improve efficiency, this paper proposes to share features across a group of DNNs that correspond to models of different sizes. By sharing features, the burden of extracting features from an image pyramid can be significantly reduced. Simultaneously,...
The inversion effect in biological motion suggests that presenting a point-light display (PLD) in an inverted orientation impairs the observer’s ability to perceive the movement, likely due to unfamiliarity with the dynamic characteristics of the motion. Vertical dancers (VDs), accustomed to performing and perceiving others perform dance movements while being suspended in the air, offer a unique perspective on this phenomenon. A previous study showed that VDs were more sensitive to artificial PLDs depicting movement when compared to typical...
Since tert-butyl acetate and n-heptane can form binary azeotropes, in this study two special distillation methods, extractive distillation (ED) and improved side-stream distillation, were used to separate tert-butyl acetate and n-heptane. First, considering the properties of molecular bond energy, relative volatility, and azeotrope formation, N-methylpyrrolidone (NMP) was selected as the most suitable solvent. Subsequently, the missing interaction parameters for N-HEP-01/T-BUT-TE and NMP/T-BUT-TE were obtained by a vapor–liquid equilibrium experiment. Furthermore,...
Subspace learning is the process of finding a proper feature subspace and then projecting high-dimensional data onto the learned low-dimensional subspace. The projection operation requires many floating-point multiplications and additions, which makes the projection computationally expensive. To tackle this problem, this paper proposes two simple-but-effective fast image projection methods, fast Haar transform (FHT) based principal...
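The projection cost mentioned in the abstract above can be made concrete with a minimal sketch. Projecting a d-dimensional vector onto k dense basis vectors takes d·k multiplications and (d−1)·k additions. The function name `project` and the toy values are hypothetical, and the sketch shows only the plain dense projection, not the FHT-based methods the paper proposes.

```python
def project(x, basis):
    """Project vector x (length d) onto k basis vectors (each of length d).

    Returns the k projection coefficients together with the number of
    floating-point multiplications and additions the dense projection used.
    """
    if not x:
        raise ValueError("input vector must be non-empty")
    coeffs = []
    mults = adds = 0
    for b in basis:
        if len(b) != len(x):
            raise ValueError("basis vector and input must have equal length")
        acc = b[0] * x[0]          # first product needs no addition
        mults += 1
        for bi, xi in zip(b[1:], x[1:]):
            acc += bi * xi         # one multiplication and one addition
            mults += 1
            adds += 1
        coeffs.append(acc)
    return coeffs, mults, adds


# Toy example with d = 4 and k = 2 (values chosen only for illustration).
x = [1.0, 2.0, 3.0, 4.0]
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]
coeffs, mults, adds = project(x, basis)
print(coeffs, mults, adds)
```

For an image with thousands of pixels and a subspace of a few hundred dimensions, the d·k term grows into the millions per image, which is the expense the abstract describes.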
Moving object detection is a key to intelligent video analysis. On the one hand, what moves includes not only interesting objects but also noise and cluttered background. On the other hand, moving objects without rich texture are prone not to be detected. Therefore, there are undesirable false alarms and missed alarms in the results of many moving object detection algorithms. To reduce false alarms, in this paper we propose to incorporate a saliency map into an incremental subspace analysis framework, which makes the estimated background have less chance than the foreground (i.e.,...
Diabetic retinal image classification aims to diagnose diabetic retinopathy automatically, and deep learning models have brought considerable improvement. However, these methods all rely on sufficient network training with large-scale annotated data, which is very labor-intensive to obtain in medical image labeling. To overcome these drawbacks, this paper focuses on embedding a self-supervised framework into an unsupervised deep learning architecture. Specifically, we propose a Self-supervised Fuzzy Clustering Network (SFCN)...
This paper proposes multistream CNN, a novel neural network architecture for robust acoustic modeling in speech recognition tasks. The proposed architecture processes input speech with diverse temporal resolutions by applying different dilation rates to convolutional neural networks across multiple streams to achieve robustness. The dilation rates are selected from multiples of a sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer. We...
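The multistream idea in the abstract above can be illustrated with a minimal sketch: the same input is filtered at several dilation rates and the per-frame outputs are concatenated. Everything here is an assumption made for illustration: the function names, the single-channel 3-tap kernel, the dilation rates 3, 6 and 9, and the left-aligned truncation to the shortest stream. Real TDNN-F layers are factorized and multi-channel, and the final projection layer is omitted.

```python
def dilated_conv1d(frames, kernel, dilation):
    """Valid 1D cross-correlation of a frame sequence with a dilated kernel."""
    span = (len(kernel) - 1) * dilation  # extra frames covered by the dilated kernel
    return [
        sum(w * frames[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(frames) - span)
    ]


def multistream(frames, kernel, dilations):
    """Run one dilated filter per stream and concatenate outputs per time step."""
    outputs = [dilated_conv1d(frames, kernel, d) for d in dilations]
    n = min(len(o) for o in outputs)     # simplification: truncate to the shortest stream
    return [[o[t] for o in outputs] for t in range(n)]


# Toy input: 20 frames of a linear ramp; dilations are multiples of 3 frames.
frames = [float(t) for t in range(20)]
kernel = [1.0, 0.0, -1.0]                # hypothetical difference filter
embeddings = multistream(frames, kernel, dilations=(3, 6, 9))
print(embeddings)
```

Each stream sees a different temporal context (7, 13 and 19 frames here), so the concatenated vector mixes several temporal resolutions of the same signal.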
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW-D (Squeezed and Efficient Wav2vec with Disentangled Attention), an architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised...
Diabetic retinopathy (DR) poses a significant threat to the vision of patients with diabetes and may result in blindness in severe cases. Various automatic DR diagnosis models have been proposed along with the development of deep learning, but they always rely on large-scale annotated data to train the network. However, annotating medical fundus images is costly and requires well-trained professional doctors to identify the grades. To overcome this drawback, this paper focuses on utilizing...