Jing Pan

ORCID: 0000-0002-5178-2247
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Visual perception and processing mechanisms
  • Speech Recognition and Synthesis
  • Video Surveillance and Tracking Methods
  • Advanced Neural Network Applications
  • Face and Expression Recognition
  • Process Optimization and Integration
  • Natural Language Processing Techniques
  • Advanced Control Systems Optimization
  • Visual Attention and Saliency Detection
  • Speech and Audio Processing
  • Music and Audio Processing
  • Topic Modeling
  • Remote-Sensing Image Classification
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Tactile and Sensory Interactions
  • Action Observation and Synchronization
  • Advanced Vision and Imaging
  • Image Processing Techniques and Applications
  • Human Pose and Action Recognition
  • Retinal Imaging and Analysis
  • Motor Control and Adaptation
  • UAV Applications and Optimization
  • Distributed Control Multi-Agent Systems

Sun Yat-sen University
2015-2025

Changzhou University
2023-2025

Microsoft (United States)
2025

Digital China Health (China)
2024

Tianjin University of Technology and Education
2013-2023

Microsoft Research (United Kingdom)
2023

Xiamen University
2022

NetApp (United States)
2021

Shanxi University
2021

Tianjin University
2014-2020

10.1016/j.sigpro.2010.08.010 article EN Signal Processing 2010-09-16

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches of convolution and self-attention and merging the context from each branch. In this paper, we propose...

10.1109/slt54892.2023.10022656 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09

Deep neural networks (DNNs) have now demonstrated state-of-the-art detection performance on pedestrian datasets. However, because of their high computational complexity, efficiency is still a frustrating problem even with the help of Graphics Processing Units (GPUs). To improve efficiency, this paper proposes to share features across a group of DNNs that correspond to models of different sizes. By sharing features, the burden of extracting features from an image pyramid can be significantly reduced. Simultaneously,...

10.1016/j.neucom.2015.12.042 article EN cc-by-nc-nd Neurocomputing 2015-12-24

The inversion effect in biological motion suggests that presenting a point-light display (PLD) in an inverted orientation impairs the observer’s ability to perceive movement, likely due to unfamiliarity with the dynamic characteristics of the motion. Vertical dancers (VDs), accustomed to performing and perceiving others perform dance movements while being suspended in the air, offer a unique perspective on this phenomenon. A previous study showed that VDs were more sensitive to artificial PLDs depicting when compared to typical...

10.1371/journal.pone.0317290 article EN cc-by PLoS ONE 2025-01-28

10.1109/icassp49660.2025.10887983 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Since tert-butyl acetate and n-heptane can form binary azeotropes, this study used two special distillation methods, extractive distillation (ED) and improved side-stream distillation, to separate tert-butyl acetate and n-heptane. First, considering the properties of molecular bond energy, relative volatility, and azeotrope formation, n-methylpyrrolidone (NMP) was selected as the most suitable solvent. Subsequently, the missing interaction parameters for N-HEP-01/T-BUT-TE and NMP/T-BUT-TE were obtained by a vapor–liquid equilibrium experiment. Furthermore,...

10.1021/acs.iecr.4c04569 article EN Industrial & Engineering Chemistry Research 2025-03-14

Subspace learning is the process of finding a proper feature subspace and then projecting high-dimensional data onto the learned low-dimensional subspace. The projection operation requires many floating-point multiplications and additions, which makes the projection computationally expensive. To tackle this problem, this paper proposes two simple-but-effective fast subspace learning and image projection methods, fast Haar transform (FHT) based principal...

10.1109/tifs.2009.2026455 article EN IEEE Transactions on Information Forensics and Security 2009-07-07

Moving object detection is key to intelligent video analysis. On the one hand, what moves is not only interesting objects but also noise and cluttered background. On the other hand, moving objects without rich texture are prone not to be detected. Therefore, there are undesirable false alarms and missed alarms in the results of many moving object detection algorithms. To reduce false alarms, in this paper we propose to incorporate a saliency map into an incremental subspace analysis framework, which makes the estimated background have less chance than the foreground (i.e.,...

10.1109/tcsvt.2016.2630731 article EN IEEE Transactions on Circuits and Systems for Video Technology 2016-11-18

Diabetic retinal image classification aims to diagnose diabetic retinopathy automatically, and deep learning models have improved it considerably. However, these methods all rely on training the network with large-scale annotated data, which is very labor-intensive to label in the medical domain. To overcome these drawbacks, this paper focuses on embedding a self-supervised framework into an unsupervised architecture. Specifically, we propose a Self-supervised Fuzzy Clustering Network (SFCN)...

10.1109/access.2020.2994047 article EN cc-by IEEE Access 2020-01-01

This paper proposes multistream CNN, a novel neural network architecture for robust acoustic modeling in speech recognition tasks. The proposed architecture processes input speech with diverse temporal resolutions by applying different dilation rates to convolutional neural networks across multiple streams to achieve robustness. The dilation rates are selected from multiples of a sub-sampling rate of 3 frames. Each stream stacks TDNN-F layers (a variant of 1D CNN), and the output embedding vectors from the streams are concatenated and then projected to the final layer. We...

10.1109/icassp39728.2021.9414639 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW-D (Squeezed and Efficient Wav2vec with Disentangled Attention), with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised...

10.1109/icassp43922.2022.9747432 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Diabetic Retinopathy (DR) poses a significant threat to the vision of patients with diabetes and may result in blindness in severe cases. Various automatic DR diagnosis models have been proposed along with the development of deep learning, but they always rely on large-scale annotated data to train the network. However, annotating medical fundus images is costly and requires well-trained professional doctors to identify the grades. To overcome this drawback, this paper focuses on utilizing...

10.1109/access.2021.3061690 article EN cc-by IEEE Access 2021-01-01