Jiangyu Han

ORCID: 0000-0001-5390-8520
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • EEG and Brain-Computer Interfaces
  • Advanced Adaptive Filtering Techniques
  • Infant Health and Development
  • Landslides and related hazards
  • Emotion and Mood Recognition
  • Distributed and Parallel Computing Systems
  • Hearing Loss and Rehabilitation
  • Indoor and Outdoor Localization Technologies
  • Phonetics and Phonology Research
  • Cryospheric studies and observations
  • Neural Networks and Applications
  • Functional Brain Connectivity Studies
  • Winter Sports Injuries and Performance
  • Telecommunications and Broadcasting Technologies

Brno University of Technology
2022-2025

Chongqing University of Posts and Telecommunications
2023-2024

Shanghai Normal University
2020-2023

Tencent (China)
2021

Shandong University of Science and Technology
2018

10.1109/icassp49660.2025.10889475 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. It consists of two separate tasks: 1) Task 1, with a single microphone array, focusing on practical applications with a real-time requirement, and 2) Task 2, with multiple distributed microphone arrays, which is a non-real-time track without any constraints, so that participants can explore any algorithms to obtain high speech quality. Targeting the real conferencing room application,...

10.1109/asru51503.2021.9688126 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model consists of two parallel convolutional encoders and a transformer-based decoder. By exploiting the interactions between the input recording and the initial system's outputs, DiaCorrect can automatically correct the initial speaker activities to minimize diarization errors. Experiments on 2-speaker telephony data show...
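
Architecturally, this description maps onto a small model: two parallel convolutional encoders over the recording's features and the initial speaker activities, fused and refined by an attention-based decoder. The sketch below is a minimal PyTorch illustration under assumed shapes and sizes; fusing the two streams by summation and using self-attention is a simplification, not the paper's exact decoder.

```python
# Minimal sketch of a DiaCorrect-style error corrector (PyTorch).
# All hyperparameters are illustrative assumptions, not the paper's config.
import torch
import torch.nn as nn

class DiaCorrectSketch(nn.Module):
    def __init__(self, n_feats=80, n_speakers=2, d_model=128):
        super().__init__()
        # Parallel 1-D convolutional encoders: one over acoustic features,
        # one over the initial per-frame speaker activities.
        self.audio_enc = nn.Conv1d(n_feats, d_model, kernel_size=3, padding=1)
        self.label_enc = nn.Conv1d(n_speakers, d_model, kernel_size=3, padding=1)
        dec_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=2)
        self.head = nn.Linear(d_model, n_speakers)  # refined activities

    def forward(self, feats, init_activities):
        # feats: (B, T, n_feats); init_activities: (B, T, n_speakers)
        a = self.audio_enc(feats.transpose(1, 2)).transpose(1, 2)
        l = self.label_enc(init_activities.transpose(1, 2)).transpose(1, 2)
        h = self.decoder(a + l)             # fuse the two streams
        return torch.sigmoid(self.head(h))  # corrected per-frame activities

x = torch.randn(1, 200, 80)
y0 = torch.rand(1, 200, 2)
print(DiaCorrectSketch()(x, y0).shape)  # torch.Size([1, 200, 2])
```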

10.1109/icassp48485.2024.10446968 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

The end-to-end approaches for single-channel target speech extraction have attracted widespread attention. However, studies on multi-channel target speech extraction are still relatively limited. In this work, we propose two methods for exploiting spatial information to extract the target speech. The first is using a target speaker adaptation layer in a parallel encoder architecture. The second is designing a channel decorrelation mechanism to extract inter-channel differential information and enhance the encoder representation. We compare the proposed methods with strong state-of-the-art baselines....
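
The channel decorrelation idea, removing from a secondary channel the component correlated with the reference channel so that only inter-channel differential information remains, can be sketched in a few lines. The formulation below (per-frame cosine similarity as the correlation estimate) is an assumed simplification of the mechanism, not the paper's exact math.

```python
# Sketch of a channel decorrelation (CD) style operation: keep the part of
# a secondary channel that differs from the reference channel.
import torch
import torch.nn.functional as F

def channel_decorrelation(ref, sec, eps=1e-8):
    # ref, sec: (B, D, T) encoder representations of two microphone channels
    # per-frame cosine similarity between the channels: (B, 1, T)
    cos = F.cosine_similarity(ref, sec, dim=1, eps=eps).unsqueeze(1)
    # subtract the component of `sec` explained by `ref`
    return sec - cos * ref

ref = torch.randn(2, 256, 100)
sec = torch.randn(2, 256, 100)
print(channel_decorrelation(ref, sec).shape)  # torch.Size([2, 256, 100])
```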

10.1109/icassp39728.2021.9414244 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and to wide-domain-coverage tasks. In this paper, from the time-frequency perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve separation robustness under complicated conditions. Furthermore, we generalize DPCCN to target speech extraction (TSE) by integrating a new, specially designed speaker encoder. Moreover, we also investigate...
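
The densely-connected design means each convolutional layer sees the concatenation of all previous outputs. A minimal sketch of one such 2-D block over a complex spectrogram stacked as real/imaginary channels is shown below; all layer sizes are illustrative assumptions, not DPCCN's actual configuration.

```python
# One densely-connected 2-D conv block in the spirit of DPCCN's design.
import torch
import torch.nn as nn

class DenseBlock2d(nn.Module):
    def __init__(self, in_ch=2, growth=16, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth), nn.ELU()))
            ch += growth  # dense connectivity: inputs grow by `growth` each layer

    def forward(self, x):           # x: (B, 2, F, T) real/imag spectrogram
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

spec = torch.randn(1, 2, 257, 100)   # e.g. 512-point STFT, 100 frames
print(DenseBlock2d()(spec).shape)    # torch.Size([1, 50, 257, 100])
```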

10.1109/icassp43922.2022.9747340 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

Target speech extraction has attracted widespread attention in recent years. In this work, we focus on investigating the dynamic interaction between different mixtures and the target speaker to exploit discriminative clues. We propose a special attention mechanism, without introducing any additional parameters, in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, with a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the discriminative clues in a more efficient way....
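
A parameter-free scaling adaptation can be illustrated as follows: the target speaker embedding attends over the mixture frames via scaled dot products, and the resulting weights rescale the representation. This sketch only shows the general attention-based scaling idea under assumed dimensions, not the paper's exact ASA formulation.

```python
# Parameter-free attention-based scaling of mixture frames by a speaker embedding.
import torch

def asa(mixture, spk_emb):
    # mixture: (B, T, D) encoder output; spk_emb: (B, D) speaker embedding
    d = mixture.size(-1)
    # scaled dot-product scores between the embedding and each frame
    scores = torch.einsum('btd,bd->bt', mixture, spk_emb) / d ** 0.5
    weights = torch.sigmoid(scores).unsqueeze(-1)  # (B, T, 1) per-frame scales
    return mixture * weights  # no extra trainable parameters introduced

m = torch.randn(4, 200, 256)
e = torch.randn(4, 256)
print(asa(m, e).shape)  # torch.Size([4, 200, 256])
```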

10.1109/asru51503.2021.9687903 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13

End-to-end neural diarization has evolved considerably over the past few years, but data scarcity is still a major obstacle for further improvements. Self-supervised learning methods such as WavLM have shown promising performance on several downstream tasks, but their application to speaker diarization is somehow still limited. In this work, we explore using WavLM to alleviate the problem of data scarcity for diarization training. We use the same pipeline as Pyannote and improve the local end-to-end neural diarization with WavLM and Conformer. Experiments on far-field AMI, AISHELL-4, and AliMeeting...
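
The described local model, self-supervised frame features feeding a Conformer that predicts per-frame speaker activities, can be outlined as below. The WavLM front-end is stubbed (in practice one would load pretrained WavLM weights, e.g. via HuggingFace); the Conformer is torchaudio's generic implementation, and all sizes are assumptions.

```python
# Sketch of a local end-to-end diarization model with SSL features + Conformer.
import torch
import torch.nn as nn
from torchaudio.models import Conformer

class LocalEENDSketch(nn.Module):
    def __init__(self, ssl_dim=768, max_speakers=4):
        super().__init__()
        self.conformer = Conformer(input_dim=ssl_dim, num_heads=4,
                                   ffn_dim=1024, num_layers=4,
                                   depthwise_conv_kernel_size=31)
        self.head = nn.Linear(ssl_dim, max_speakers)

    def forward(self, ssl_feats, lengths):
        # ssl_feats: (B, T, ssl_dim) frame-level WavLM-style features
        h, _ = self.conformer(ssl_feats, lengths)
        return torch.sigmoid(self.head(h))  # per-frame speaker activities

feats = torch.randn(1, 300, 768)  # stand-in for extracted WavLM features
print(LocalEENDSketch()(feats, torch.tensor([300])).shape)  # (1, 300, 4)
```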

10.48550/arxiv.2409.09408 preprint EN arXiv (Cornell University) 2024-09-14

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. It consists of two separate tasks: 1) Task 1, with a single microphone array, focusing on practical applications with a real-time requirement, and 2) Task 2, with multiple distributed microphone arrays, which is a non-real-time track without any constraints, so that participants can explore any algorithms to obtain high speech quality. Targeting the real conferencing room application, the database was...

10.48550/arxiv.2104.00960 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Purpose: EEG analysis of emotions is greatly significant for the diagnosis of psychological diseases and for brain-computer interface (BCI) applications. However, applications of brain neural networks to emotion classification are rarely reported, and the accuracy of emotion recognition in cross-subject tasks remains a challenge. Thus, this paper proposes a domain-invariant EEG-network model for emotion identification. Methods: A novel brain-inception-network deep learning method is proposed to extract discriminative graph features from...
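
As a rough illustration of the inception idea applied to EEG, the block below runs parallel temporal convolutions with different receptive fields and concatenates their outputs. It does not reproduce the paper's graph feature construction; the channel count and kernel sizes are assumptions.

```python
# Inception-style multi-scale temporal convolution over multi-channel EEG.
import torch
import torch.nn as nn

class EEGInceptionBlock(nn.Module):
    def __init__(self, n_channels=62, out_per_branch=8):
        super().__init__()
        # parallel temporal convolutions with different receptive fields
        self.branches = nn.ModuleList([
            nn.Conv1d(n_channels, out_per_branch, kernel_size=k, padding=k // 2)
            for k in (3, 7, 15)])

    def forward(self, x):              # x: (B, n_channels, T) EEG signal
        return torch.cat([b(x) for b in self.branches], dim=1)

eeg = torch.randn(2, 62, 400)          # e.g. 62-channel EEG, 400 samples
print(EEGInceptionBlock()(eeg).shape)  # torch.Size([2, 24, 400])
```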

10.1080/27706710.2023.2222159 article EN cc-by-nc Brain-Apparatus Communication A Journal of Bacomics 2023-06-06

Target speech extraction has attracted widespread attention. When microphone arrays are available, the additional spatial information can be helpful in extracting the target speech. We have recently proposed a channel decorrelation (CD) mechanism to extract inter-channel differential information and enhance the reference-channel encoder representation. Although it has shown promising results for extracting the target speech from mixtures, the performance is still limited by the nature of the original decorrelation theory. In this paper, we propose two methods to broaden the horizon...

10.21437/interspeech.2021-298 article EN Interspeech 2021 2021-08-27

Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing methods require ground-truth sources and are trained on synthetic datasets. This reliance is problematic, because ground-truth signals are usually unavailable in real conditions. Moreover, in many industry scenarios, the acoustic characteristics deviate far from the simulated ones. Therefore, performance degrades significantly when applying the models to real applications. To address these problems,...

10.1186/s13636-023-00273-y article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2023-01-20

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and to wide-domain-coverage tasks. In this paper, from the time-frequency perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve separation robustness under complicated conditions. Furthermore, we generalize DPCCN to target speech extraction (TSE) by integrating a new, specially designed speaker encoder. Moreover, we also investigate...

10.48550/arxiv.2112.13520 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Speaker-attributed automatic speech recognition (ASR) in multi-speaker environments remains a significant challenge, particularly when systems conditioned on speaker embeddings fail to generalize to unseen speakers. In this work, we propose Diarization-Conditioned Whisper (DiCoW), a novel approach to target-speaker ASR that leverages speaker diarization outputs as conditioning information. DiCoW extends the pre-trained Whisper model by integrating the diarization labels directly, eliminating reliance on speaker embeddings and reducing the need for...
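
Conditioning on diarization labels rather than speaker embeddings can be sketched as per-frame, label-dependent affine transforms applied to the encoder hidden states (with classes such as silence, target, non-target, and overlap). The code below is an assumed simplification of this conditioning idea, not DiCoW's exact integration into Whisper.

```python
# Per-frame diarization-label-dependent transforms on encoder states.
import torch
import torch.nn as nn

class DiarizationConditioning(nn.Module):
    N_CLASSES = 4  # assumed classes: silence, target, non-target, overlap

    def __init__(self, d_model=512):
        super().__init__()
        # one learned affine transform per diarization class
        self.scale = nn.Parameter(torch.ones(self.N_CLASSES, d_model))
        self.bias = nn.Parameter(torch.zeros(self.N_CLASSES, d_model))

    def forward(self, h, frame_labels):
        # h: (B, T, d_model) encoder states; frame_labels: (B, T) in {0..3}
        return h * self.scale[frame_labels] + self.bias[frame_labels]

h = torch.randn(1, 100, 512)
labels = torch.randint(0, 4, (1, 100))
print(DiarizationConditioning()(h, labels).shape)  # torch.Size([1, 100, 512])
```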

10.48550/arxiv.2501.00114 preprint EN arXiv (Cornell University) 2024-12-30

SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR). However, its inference speed suffers from the quadratic complexity of softmax-attention (SA). In addition, limited by the large convolution kernel size, its local modeling ability is insufficient. In this paper, we propose a novel method, HybridFormer, to improve SqueezeFormer in a fast and efficient way. Specifically, we first incorporate linear attention (LA) and propose a hybrid LASA paradigm to increase the model's inference speed. Second, a neural architecture...
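
The speed argument rests on replacing softmax attention's O(T^2) cost with a kernelized linear attention that is linear in sequence length. A minimal sketch of such a linear attention is below; the ELU+1 feature map is one common choice, assumed here for illustration rather than taken from the paper.

```python
# Kernelized linear attention: O(T * D^2) instead of softmax's O(T^2 * D).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (B, T, D); phi(x) = ELU(x) + 1 keeps features positive
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum('btd,bte->bde', k, v)   # (B, D, D) global summary
    # per-frame normalizer: phi(q_t) . sum_s phi(k_s)
    z = 1.0 / (torch.einsum('btd,bd->bt', q, k.sum(dim=1)) + eps)
    return torch.einsum('btd,bde,bt->bte', q, kv, z)

q = k = v = torch.randn(2, 100, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([2, 100, 64])
```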

10.1109/icassp49357.2023.10096467 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing methods require ground-truth sources and are trained on synthetic datasets. This reliance is problematic, because ground-truth signals are usually unavailable in real conditions. Moreover, in many industry scenarios, the acoustic characteristics deviate far from the simulated ones. Therefore, performance degrades significantly when applying the models to real applications. To address these problems, this...

10.2139/ssrn.4121081 article EN SSRN Electronic Journal 2022-01-01

SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR). However, its inference speed suffers from the quadratic complexity of softmax-attention (SA). In addition, limited by the large convolution kernel size, its local modeling ability is insufficient. In this paper, we propose a novel method, HybridFormer, to improve SqueezeFormer in a fast and efficient way. Specifically, we first incorporate linear attention (LA) and propose a hybrid LASA paradigm to increase the model's inference speed. Second, a neural architecture...

10.48550/arxiv.2303.08636 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model consists of two parallel convolutional encoders and a transformer-based decoder. By exploiting the interactions between the input recording and the initial system's outputs, DiaCorrect can automatically correct the initial speaker activities to minimize diarization errors. Experiments on 2-speaker telephony data show...

10.48550/arxiv.2309.08377 preprint EN other-oa arXiv (Cornell University) 2023-01-01

This paper presents a compartmental Gaussian mixture model (GMM) method to trace the layers of the ice sheet with radio-echo sounding (RES) data. Based on the compartmentalization of the RES data, the proposed method builds a Gaussian mixture model, which is solved using Fuzzy C-means (FCM) and expectation maximization (EM) to obtain preliminary layer detection results. The layer boundaries are then detected by analyzing the classification results of the GMM. Experimental results show that the proposed method can trace the layers effectively.
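
The statistical core, an EM-fitted Gaussian mixture over RES samples with a clustering-based initialization standing in for FCM, can be sketched with scikit-learn on synthetic data. The compartmentalization and boundary-detection steps of the paper are not reproduced here.

```python
# Fit a GMM to radio-echo sounding amplitudes and classify layer samples.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic RES amplitudes: two "layers" with different statistics
amplitudes = np.concatenate([rng.normal(0.2, 0.05, 500),
                             rng.normal(0.7, 0.10, 500)]).reshape(-1, 1)

# EM-based GMM fit; KMeans stands in for the FCM initialization used in the paper
init_means = KMeans(n_clusters=2, n_init=10, random_state=0)\
    .fit(amplitudes).cluster_centers_
gmm = GaussianMixture(n_components=2, means_init=init_means,
                      random_state=0).fit(amplitudes)
labels = gmm.predict(amplitudes)   # preliminary layer classification
print(np.bincount(labels))         # samples assigned to each component
```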

10.1109/icsp.2018.8652476 article EN 2018 14th IEEE International Conference on Signal Processing (ICSP) 2018-08-01