- Speech and Audio Processing
- Music and Audio Processing
- Speech Recognition and Synthesis
- Advanced Adaptive Filtering Techniques
- Hearing Loss and Rehabilitation
- Acoustic Wave Phenomena Research
- X-ray Diffraction in Crystallography
- Crystallization and Solubility Studies
- Advanced Data Compression Techniques
- Indoor and Outdoor Localization Technologies
- Optical Wireless Communication Technologies
- Distributed systems and fault tolerance
- Data Management and Algorithms
- Blind Source Separation Techniques
- Time Series Analysis and Forecasting
- Natural Language Processing Techniques
- Ergonomics and Musculoskeletal Disorders
- Software System Performance and Reliability
- Mechanical Engineering and Vibrations Research
- Flame retardant materials and properties
- Advanced Data Storage Technologies
- Power Systems and Renewable Energy
- IoT and Edge/Fog Computing
- Ultrasonics and Acoustic Wave Propagation
- Music Technology and Sound Studies
Bellevue Hospital Center
2019-2025
Jiaxing University
2024-2025
Tencent (China)
2018-2024
Hokkaido University
2022-2024
Zhejiang Chinese Medical University
2024
Southern Medical University
2022-2024
State Grid Corporation of China (China)
2018-2024
Aerospace Information Research Institute
2024
Chinese Academy of Sciences
2024
Sun Yat-sen University
2024
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech-related tasks, such as speech recognition and enhancement. This paper introduces a new time-domain audio-visual architecture for target speaker extraction from monaural mixtures. The architecture generalizes the previous TasNet (time-domain speech separation network) to enable multi-modal learning, and meanwhile it extends classical audio-visual speech separation from the frequency domain to the time domain. The main components of the proposed architecture include an audio encoder, a video encoder that extracts lip...
Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural network based systems cause nonlinear distortion that is harmful for automatic speech recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamformer can be used to minimize the distortion, but comes with a high level of residual noise. Furthermore, the matrix operations (e.g., matrix inversion) involved in the MVDR solution are sometimes numerically...
The end-to-end approach for single-channel speech separation has been studied recently and shown promising results. This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation. The primary contributions of this work include 1) an integrated waveform-in waveform-out separation system in a single neural network architecture. 2) We reformulate the traditional short time Fourier transform (STFT) and inter-channel phase difference (IPD) as a function of time-domain convolution with a special kernel. 3)...
Speaker-aware source separation methods are promising workarounds for major difficulties such as arbitrary source permutation and unknown number of sources. However, it remains challenging to achieve satisfying performance provided a very short available target speaker utterance (anchor). Here we present a novel "deep extractor network" which creates an extractor point for the target speaker in a canonical high dimensional embedding space, and pulls together the time-frequency bins corresponding to the target speaker. The proposed model is different from...
The crucial role of emotion regulation in learning has been well established, but its potential impact on the English as a foreign language (EFL) learning process remains uncertain. Examining the relationship between emotion regulation strategies and EFL learning engagement, as well as antecedent variables, has significant theoretical and practical value. This study aims to explore the mediating effects of emotion regulation strategies (cognitive reappraisal and suppression) in the associations between perceived teacher social support, peer support and EFL learning engagement among Chinese adolescents. The data were gathered...
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments. In this study, we address joint speech separation and dereverberation, which aims to separate target speech from background noise, interfering speech and room reverberation. In order to tackle this fundamentally difficult problem, we propose a novel multimodal network that exploits both audio and visual signals. The proposed network architecture adopts a two-stage strategy, where a separation module is employed to attenuate noise in the first stage and a dereverberation module to suppress reverberation in the second...
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2d convolution...
Transcatheter arterial chemoembolization (TACE) has proven effective in blocking tumor-supplied arteries and delivering localized chemotherapeutic treatment to combat tumors. However, traditional embolic TACE agents exhibit certain limitations, including insufficient drug-loading and sustained-release capabilities, non-biodegradability, susceptibility to aggregation, and unstable mechanical properties. This study introduces a novel approach to address these shortcomings by utilizing a complex coacervate as...
Background: Blood-labyrinth barrier (BLB) damage has been recognized as a key mechanism underlying cisplatin (CDDP)-induced hearing loss. Inflammation within the cochlea, triggered by CDDP, is a pathological response. However, the relationship between CDDP-induced inflammation and BLB dysfunction remains elusive. Materials and Methods: In vivo and in vitro models were used to explore the inflammatory mechanisms underlying CDDP ototoxicity. C57BL/6J mice were treated with CDDP, and IL-1β levels, BLB permeability, and hearing thresholds were assessed using...
In this paper, we present a joint training framework between the multi-channel beamformer and the acoustic model for noise robust automatic speech recognition (ASR). The complex ratio mask (CRM), demonstrated to be more effective than the ideal ratio mask (IRM), is proposed to estimate the covariance matrix for the beamformer. Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers are both investigated under the CRM-based architecture. We also propose a pooling strategy among multiple channels. A long...
Although the conventional mask-based minimum variance distortionless response (MVDR) could reduce the non-linear distortion, the residual noise level of the MVDR separated speech is still high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target speech separation. This new beamforming framework directly learns the beamforming weights from the estimated speech and noise spatial covariance matrices. Leveraging on the temporal modeling capability of RNNs, the RNN-BF could automatically accumulate statistics...
We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment. Our FAST-RIR takes rectangular room dimensions, listener and speaker positions, and reverberation time (T60) as inputs and generates specular and diffuse reflections. Our FAST-RIR is capable of generating RIRs for a given input T60 with an average error of 0.02s. We evaluate our generated RIRs in automatic speech...
In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting. Our proposed framework integrates speaker diarization based on end-to-end neural diarization (EEND) models, speaker counting with encoder-decoder attractors (EDA), and speech separation using Conv-TasNet. In addition, we propose a multiple $1 \times 1$ convolutional layer architecture for estimating the masks...
Purely neural network (NN) based speech separation and enhancement methods, although they can achieve good objective scores, inevitably cause nonlinear distortions that are harmful for automatic speech recognition (ASR). On the other hand, the minimum variance distortionless response (MVDR) beamformer with NN-predicted masks, although it can significantly reduce speech distortions, has limited noise reduction capability. In this paper, we propose a multi-tap MVDR beamformer with complex-valued masks for speech separation and enhancement. Compared to the state-of-the-art NN-mask...
Deep learning based speech separation approaches have received great interest, among which the recent speaker-aware speech enhancement methods are promising for solving difficulties such as arbitrary source permutation and unknown number of sources. In this paper, we propose a novel training framework which jointly learns a speaker-conditioned target speaker extraction model and its associated speaker embedding model. The resulting unified model directly learns the appropriate speaker embedding for improved speech enhancement. We demonstrate, on our large...
Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are adopted to remove nonlinear distortions; however, conventional mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically...
The recent exploration of deep learning for supervised speech separation has significantly accelerated the progress on the multi-talker speech separation problem. The multi-channel extension has attracted much research attention due to the benefit of spatial information in far-field acoustic environments. In this paper, we review the most recent models of multi-channel permutation invariant training (PIT), investigate spatial features formed by microphone pairs and their underlying impact and issue, and present a multi-band architecture for effective feature...
In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This framework integrates a neural network (NN) module into the closed-loop system during training, with signals generated recursively on the fly to closely mimic the streaming process of acoustic howling suppression (AHS). The proposed recursive training strategy bridges the gap between training and real-world inference scenarios, marking a departure from previous NN-based methods that typically approach...