Meng Yu

ORCID: 0000-0002-0031-9156
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Advanced Adaptive Filtering Techniques
  • Hearing Loss and Rehabilitation
  • Acoustic Wave Phenomena Research
  • X-ray Diffraction in Crystallography
  • Crystallization and Solubility Studies
  • Advanced Data Compression Techniques
  • Indoor and Outdoor Localization Technologies
  • Optical Wireless Communication Technologies
  • Distributed systems and fault tolerance
  • Data Management and Algorithms
  • Blind Source Separation Techniques
  • Time Series Analysis and Forecasting
  • Natural Language Processing Techniques
  • Ergonomics and Musculoskeletal Disorders
  • Software System Performance and Reliability
  • Mechanical Engineering and Vibrations Research
  • Flame retardant materials and properties
  • Advanced Data Storage Technologies
  • Power Systems and Renewable Energy
  • IoT and Edge/Fog Computing
  • Ultrasonics and Acoustic Wave Propagation
  • Music Technology and Sound Studies

Bellevue Hospital Center
2019-2025

Jiaxing University
2024-2025

Tencent (China)
2018-2024

Hokkaido University
2022-2024

Zhejiang Chinese Medical University
2024

Southern Medical University
2022-2024

State Grid Corporation of China (China)
2018-2024

Aerospace Information Research Institute
2024

Chinese Academy of Sciences
2024

Sun Yat-sen University
2024

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as recognition and enhancement. This paper introduces a new time-domain audio-visual architecture for target speaker extraction from monaural mixtures. The generalizes the previous TasNet (time-domain separation network) enable learning at meanwhile it extends classical frequency-domain time-domain. main components of proposed include an audio encoder, video encoder that extracts lip...

10.1109/asru46091.2019.9003983 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019-12-01

Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural network based systems cause nonlinear distortion that is harmful for automatic recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamformer can minimize distortion, but comes with high level of residual noise. Furthermore, matrix operations (e.g., inversion) involved in MVDR solution sometimes numerically...

10.1109/icassp39728.2021.9413594 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
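The conventional mask-based MVDR beamformer mentioned in the abstract above can be sketched for a single frequency bin as follows. This is a minimal NumPy illustration of the textbook reference-channel MVDR solution, not the method proposed in the paper; the function names (`masked_covariance`, `mvdr_weights`), the diagonal-loading constant `eps`, and the choice of reference channel are assumptions made for the example.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance at one frequency bin.

    Y:    (channels, frames) complex STFT coefficients.
    mask: (frames,) real-valued weights in [0, 1] (e.g. a network's output).
    """
    weighted = Y * mask  # the mask broadcasts over the channel axis
    return (weighted @ Y.conj().T) / max(mask.sum(), 1e-8)

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """Reference-channel MVDR: w = (Phi_n^-1 Phi_s / trace(Phi_n^-1 Phi_s)) u_ref.

    eps adds diagonal loading (an assumption of this sketch) so that the
    linear solve stays stable when Phi_n is close to singular.
    """
    m = phi_n.shape[0]
    phi_n = phi_n + eps * np.trace(phi_n).real / m * np.eye(m)
    num = np.linalg.solve(phi_n, phi_s)  # Phi_n^-1 Phi_s without an explicit inverse
    return num[:, ref] / np.trace(num)
```

The beamformed output for a frame `y` is `np.vdot(w, y)`, that is, w^H y. Using `np.linalg.solve` with diagonal loading instead of an explicit matrix inverse is one common way to limit the numerical problems the abstract alludes to.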

The end-to-end approach for single-channel speech separation has been studied recently and shown promising results. This paper extended the previous proposed a new model multi-channel separation. primary contributions of this work include 1) an integrated waveform-in waveform-out system in single neural network architecture. 2) We reformulate traditional short time Fourier transform (STFT) inter-channel phase difference (IPD) as function time-domain convolution with special kernel. 3)...

10.48550/arxiv.1905.06286 preprint EN other-oa arXiv (Cornell University) 2019-01-01
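The idea in the abstract above of expressing the STFT and the inter-channel phase difference (IPD) through time-domain convolution can be illustrated with fixed, windowed cosine and sine kernels applied with a stride. The sketch below is an assumption-laden illustration, not the kernel design from the paper: the helper names, the Hann window, and the frame and hop sizes are all chosen for the example.

```python
import numpy as np

def stft_kernels(n_fft, window):
    """Real and imaginary DFT basis rows, each multiplied by the analysis window."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    phase = 2 * np.pi * k * n / n_fft
    return window * np.cos(phase), -window * np.sin(phase)

def conv_stft(x, n_fft=8, hop=4):
    """STFT computed as a strided 1-D correlation of x with the fixed kernels."""
    window = np.hanning(n_fft)
    k_re, k_im = stft_kernels(n_fft, window)
    # Each row is one frame; taking every hop-th window is the convolution stride.
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    return frames @ k_re.T, frames @ k_im.T  # each has shape (frames, bins)

def ipd(x_a, x_b, n_fft=8, hop=4):
    """Phase difference between two channels (not wrapped to [-pi, pi])."""
    re_a, im_a = conv_stft(x_a, n_fft, hop)
    re_b, im_b = conv_stft(x_b, n_fft, hop)
    return np.arctan2(im_a, re_a) - np.arctan2(im_b, re_b)
```

Because the kernels are ordinary arrays, the same operation could be written as a convolution layer with fixed weights inside a network, which is what makes a waveform-in, waveform-out formulation possible.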

Speaker-aware source separation methods are promising workarounds for major difficulties such as arbitrary permutation and unknown number of sources. However, it remains challenging to achieve satisfying performance provided a very short available target speaker utterance (anchor). Here we present novel "deep extractor network" which creates an point the in canonical high dimensional embedding space, pulls together time-frequency bins corresponding speaker. The proposed model is different from...

10.21437/interspeech.2018-1205 preprint EN Interspeech 2022 2018-08-28

The crucial role of emotion regulation in learning has been well established, but its potential impact on the English as a foreign language (EFL) process remains uncertain. Examining relationship between strategies and EFL engagement, antecedent variables, significant theoretical practical value. This study aims to explore mediating effects (cognitive reappraisal suppression) associations perceived teacher social support, peer support engagement among Chinese adolescents. Data were gathered...

10.1177/13621688241266184 article EN Language Teaching Research 2024-07-27

Background noise, interfering speech and room reverberation frequently distort target in real listening environments. In this study, we address joint separation dereverberation, which aims to separate from background reverberation. order tackle fundamentally difficult problem, propose a novel multimodal network that exploits both audio visual signals. The proposed architecture adopts two-stage strategy, where module is employed attenuate noise the first stage dereverberation suppress second...

10.1109/jstsp.2020.2987209 article EN IEEE Journal of Selected Topics in Signal Processing 2020-03-01

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for directly from waveforms within architecture, time-domain filters spanning signal channels trained perform adaptive filtering. These implemented by 2d convolution...

10.1109/icassp40776.2020.9053092 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Transcatheter arterial chemoembolization (TACE) has proven effective in blocking tumor-supplied arteries and delivering localized chemotherapeutic treatment to combat tumors. However, traditional embolic TACE agents exhibit certain limitations, including insufficient drug-loading sustained-release capabilities, non-biodegradability, susceptibility aggregation, unstable mechanical properties. This study introduces a novel approach address these shortcomings by utilizing complex coacervate as...

10.1002/adhm.202304488 article EN Advanced Healthcare Materials 2024-04-08

Background: Blood-labyrinth barrier (BLB) damage has been recognized as a key mechanism underlying cisplatin (CDDP)-induced hearing loss. Inflammation within the cochlea, triggered by CDDP, is pathological response. However, relationship between CDDP-induced inflammation and BLB dysfunction remains elusive. Materials Methods: In vivo in vitro models were used to explore inflammatory mechanisms CDDP ototoxicity. C57BL/6J mice treated with IL-1β levels, permeability, thresholds assessed using...

10.2147/jir.s492292 article EN cc-by-nc Journal of Inflammation Research 2025-01-01

10.1109/icassp49660.2025.10890048 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

In this paper, we present a joint training framework between the multi-channel beamformer and acoustic model for noise robust automatic speech recognition (ASR). The complex ratio mask (CRM), demonstrated to be more effective than ideal (IRM), is proposed estimate covariance matrix beamformer. Minimum Variance Distortionless Response (MVDR) Generalized Eigenvalue (GEV) are both investigated under CRM-based architecture. We also propose pooling strategy among multiple channels. A long...

10.1109/icassp.2019.8682576 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

Although the conventional mask-based minimum variance distortionless response (MVDR) could reduce non-linear distortion, residual noise level of MVDR separated speech is still high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target separation. This new beamforming framework directly learns weights from estimated and spatial covariance matrices. Leveraging on temporal modeling capability RNNs, RNN-BF automatically accumulate statistics...

10.21437/interspeech.2021-430 article EN Interspeech 2022 2021-08-27

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating responses (RIRs) given acoustic environment. Our FAST-RIR takes rectangular dimensions, listener and speaker positions, reverberation time (T60) as inputs generates specular reflections is capable of RIRs input T with an average error 0.02s. evaluate our generated in automatic speech...

10.1109/icassp43922.2022.9747846 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and counting. Our proposed integrates diarization based on end-to-end neural (EEND) models, counting with encoder-decoder attractors (EDA), separation using Conv-TasNet. addition, propose multiple 1 × 1 convolutional layer architecture for estimating the masks...

10.1109/slt54892.2023.10022924 article EN 2022 IEEE Spoken Language Technology Workshop (SLT) 2023-01-09

Purely neural network (NN) based speech separation and enhancement methods, although can achieve good objective scores, inevitably cause nonlinear distortions that are harmful for the automatic recognition (ASR). On other hand, minimum variance distortionless response (MVDR) beamformer with NN-predicted masks, significantly reduce distortions, has limited noise reduction capability. In this paper, we propose a multi-tap MVDR complex-valued masks enhancement. Compared to state-of-the-art NN-mask...

10.21437/interspeech.2020-1458 article EN Interspeech 2022 2020-10-25

Deep learning based speech separation approaches have received great interest, among which the recent speaker-aware enhancement methods are promising for solving difficulties such as arbitrary source permutation and unknown number of sources. In this paper, we propose a novel training framework jointly learns speaker-conditioned target speaker extraction model its associated embedding model. The resulting unified directly appropriate improved enhancement. We demonstrate, on our large...

10.1109/icassp40776.2020.9054311 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful modern automatic recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters adopted remove distortions, however, conventional mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved solution is sometimes numerically...

10.1109/taslp.2021.3129335 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2021-01-01

The recent exploration of deep learning for supervised speech separation has significantly accelerated the progress on multi-talker problem. Multi-channel extension attracted much research attention due to benefit spatial information in far-field acoustic environments. In this paper, we review most models multi-channel permutation invariant training (PIT), investigate features formed by microphone pairs and their underlying impact issue, present a multi-band architecture effective feature...

10.1109/icassp.2019.8682470 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-16

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This integrates neural network (NN) module into closed-loop system during with signals generated recursively on fly closely mimic streaming process of suppression (AHS). The proposed recursive strategy bridges gap between and real-world inference scenarios, marking departure from previous NN-based methods that typically approach...

10.1109/icassp48485.2024.10447839 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18