Yanhua Long

ORCID: 0000-0003-0924-408X
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Phonetics and Phonology Research
  • Speech and Dialogue Systems
  • Advanced Data Compression Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Adaptive Filtering Techniques
  • Blind Source Separation Techniques
  • Topic Modeling
  • Smart Grid and Power Systems
  • Animal Vocal Communication and Behavior
  • Infant Health and Development
  • Direction-of-Arrival Estimation Techniques
  • Emotion and Mood Recognition
  • Indoor and Outdoor Localization Technologies
  • Particle Accelerators and Beam Dynamics
  • Magnetic Confinement Fusion Research
  • Energy Load and Power Forecasting
  • Fault Detection and Control Systems
  • Sentiment Analysis and Opinion Mining
  • Network Security and Intrusion Detection
  • Artificial Intelligence in Healthcare
  • Human Pose and Action Recognition

Shanghai Normal University
2016-2025

Xi'an Technological University
2024

Northwest Institute of Nuclear Technology
2024

Shandong University of Science and Technology
2024

University of Cambridge
2012-2013

University of Science and Technology of China
2008-2011

Microsoft Research Asia (China)
2011

Code-switching (CS) occurs when a speaker alternates between words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech therefore has to deal with more than one language at the same time. In this study, we propose a Transformer-based architecture with symmetric language-specific encoders to capture individual language attributes, which improve the acoustic representation of each language. These representations are combined using a multi-head attention mechanism in the decoder module. Each encoder and its...
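The combination step described above can be illustrated with a minimal, framework-free sketch: a decoder query attends over the concatenated outputs of two language-specific encoders, softly selecting per frame whichever language's representation matches best. The function names and the single-head simplification are my own assumptions, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value lists."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def fuse_language_encoders(query, enc_zh, enc_en):
    """Attend over the concatenated frame outputs of two language-specific
    encoders, so the decoder can softly pick either language per frame."""
    frames = enc_zh + enc_en
    return attend(query, frames, frames)
```

With equal keys the attention weights are uniform, so the fused vector is simply the average of the values; with a query closer to one encoder's frames, those frames dominate the mixture.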

10.21437/interspeech.2020-2488 article EN Interspeech 2020 2020-10-25

With the rapid development of intelligent speech technologies, automatic speaker verification (ASV) has become one of the most natural and convenient biometric recognition approaches. However, state-of-the-art ASV systems are vulnerable to spoofing attack techniques such as speech synthesis, voice conversion, and replayed speech. Due to the symmetric distribution characteristic between genuine (true) and spoofed (fake) pairs, detection is challenging. Many recent research works have been focusing on anti-spoofing solutions....

10.3390/sym14020274 article EN Symmetry 2022-01-29

Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target-speaker clues that guide the PSE model in isolating the desired speech. However, these approaches suffer from significant complexity and often underutilize enrollment information, limiting the potential performance of the model. To address these limitations, we propose a novel Speaker Encoder-Free network, termed SEF-PNet, which fully exploits the information present in both the noisy...

10.48550/arxiv.2501.11274 preprint EN arXiv (Cornell University) 2025-01-20

This article, empowered by ChatGPT and through retrieval of relevant historical literature, explores how the translator Sun Yat-sen flexibly employed strategies such as domestication and foreignization, as well as the methods of omission, addition, and modification, in his translation of Ambulance Lectures: First Aid to the Injured. Additionally, the research highlights the use of ChatGPT as a tool to assist the study. While ChatGPT is able to quickly provide comprehensive knowledge and proper translations, improvements are still needed in terms of image accuracy...

10.26689/jcer.v9i1.9317 article EN Journal of Contemporary Educational Research 2025-02-14

10.1109/icassp49660.2025.10890546 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10887858 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888360 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/taslpro.2025.3557248 article EN IEEE Transactions on Audio, Speech and Language Processing 2025-01-01

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative reductions...

10.1109/slt.2012.6424244 article EN 2012 IEEE Spoken Language Technology Workshop (SLT) 2012-12-01

This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses automatically derived decoding hypotheses obtained with a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of the standard hypotheses can be poor. To address this issue, word- and segment-level combination approaches are used between transcripts, which yield improved transcriptions. Experimental results show systems...

10.21437/interspeech.2013-516 article EN Interspeech 2013 2013-08-25

This study investigated large-scale semi-supervised training (SST) to improve acoustic models for automatic speech recognition. Conventional self-training, the recently proposed committee-based SST using heterogeneous neural networks, and lattice-based SST were examined and compared. SST was studied in deep neural network modeling with respect to transcription quality, the importance of data filtering, and the quantity and other attributes of a large multi-genre unsupervised live data set. We found that the behavior of SST on ASR tasks is very...
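The data-filtering step mentioned above can be illustrated with a toy confidence filter over automatically transcribed segments. The threshold values and the (words, confidences) segment format are hypothetical choices of mine for illustration, not the paper's recipe.

```python
def filter_segments(segments, min_avg_conf=0.9, min_words=3):
    """Keep automatically transcribed segments whose average per-word
    confidence clears a threshold; very short segments are dropped,
    since their confidence estimates tend to be unreliable."""
    kept = []
    for words, confs in segments:
        if len(words) >= min_words and sum(confs) / len(confs) >= min_avg_conf:
            kept.append((words, confs))
    return kept

# Hypothetical decoder output: (word list, per-word confidences) per segment.
segments = [
    (["the", "cat", "sat"], [0.95, 0.92, 0.97]),   # confident, long enough
    (["uh"], [0.99]),                               # too short
    (["noisy", "guess", "here"], [0.50, 0.60, 0.40]),  # low confidence
]
```

Only segments passing both checks would be added to the semi-supervised training pool; the surviving data quantity then trades off against transcription quality, which is exactly the axis the study examines.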

10.1109/access.2019.2940961 article EN cc-by IEEE Access 2019-01-01

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and to wide-domain-coverage tasks. In this paper, from a time-frequency perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve separation robustness under complicated conditions. Furthermore, we generalize DPCCN to target speech extraction (TSE) by integrating a specially designed speaker encoder. Moreover, we also investigate...

10.1109/icassp43922.2022.9747340 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

End-to-end approaches for single-channel target speech extraction have attracted widespread attention. However, studies on multi-channel target speech extraction are still relatively limited. In this work, we propose two methods for exploiting spatial information to extract the target speech. The first uses a target speaker adaptation layer in a parallel encoder architecture. The second designs a channel decorrelation mechanism with inter-channel differential information to enhance the inter-channel representation. We compare the proposed methods with strong state-of-the-art baselines....
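The inter-channel differential idea can be sketched with two standard spatial features computed from a pair of complex STFT frames: the inter-channel phase difference (IPD) and the inter-channel level difference (ILD). This is a generic illustration under my own naming, not the paper's exact decorrelation mechanism.

```python
import cmath
import math

def interchannel_features(spec_ref, spec_other, eps=1e-8):
    """Per-frequency-bin spatial features from two complex STFT frames:
    IPD = phase difference between channels (via the cross-spectrum),
    ILD = log magnitude ratio between channels."""
    feats = []
    for a, b in zip(spec_ref, spec_other):
        ipd = cmath.phase(a * b.conjugate())           # phase difference
        ild = math.log((abs(a) + eps) / (abs(b) + eps))  # level difference
        feats.append((ipd, ild))
    return feats
```

Features of this kind encode the direction-dependent differences between microphones that a single-channel model never sees, which is why concatenating them with spectral inputs can help a multi-channel extractor focus on the target speaker's position.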

10.1109/icassp39728.2021.9414244 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

In this paper, we propose a new continual learning framework for few-shot bioacoustic event detection (BED). First, we modify the recently proposed dynamic few-shot learning (DFSL) approach and generalize it to the BED task. Then, we introduce a weight alignment loss to enhance the generator of the modified DFSL for detecting novel events. Furthermore, to augment the few positive samples of each target event, an enhancement approach is used to select high-confidence pseudo-positives based on the cumulative distribution of the initial posterior probabilities. All experiments are...

10.1109/icassp49357.2023.10096307 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Multi-accent speech recognition is a key challenge in current ASR due to the pronunciation variations of different accents. In this study, we propose a Cross-modal Parallel Training (CPT) approach for improving the accent robustness of a state-of-the-art Conformer-Transducer (Conformer-T) ASR system. Specifically, in CPT, a novel cross-modal attention and fusion module is first designed as a frontend to align low-level acoustic representations with phonetic embeddings, thus normalizing them into a shared standard latent...

10.1109/icassp48485.2024.10447979 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

The state-of-the-art acoustic modeling for Keyword Spotting (KWS) systems is mainly based on a hybrid of Hidden Markov Models (HMM) and Neural Networks (NN). However, it is challenging to efficiently train such a system, given its dependence on an intermediate phonetic representation. Motivated by end-to-end speech recognition systems, we propose a Mandarin KWS system using an end-to-end method, which directly predicts the posteriors of the modeling units with a Connectionist Temporal Classification (CTC) loss and a Recurrent Neural Network (RNN). The main difference between...
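The CTC-based decoding that such an end-to-end KWS system builds on can be illustrated with the standard greedy collapse rule (merge consecutive repeats, then remove blanks). The keyword-matching helper below is my own simplification for illustration, not the paper's scoring method.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Standard CTC greedy decoding: merge consecutive repeated labels,
    then drop blank symbols."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

def keyword_detected(frame_labels, keyword_units, blank=0):
    """Fire when the keyword's unit sequence appears as a contiguous
    subsequence of the collapsed output."""
    decoded = ctc_greedy_collapse(frame_labels, blank)
    n = len(keyword_units)
    return any(decoded[i:i + n] == keyword_units
               for i in range(len(decoded) - n + 1))
```

Because CTC outputs a posterior per frame directly over the modeling units, no intermediate phonetic alignment is needed: the collapse rule alone turns framewise argmax labels into a unit sequence that can be matched against the keyword.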

10.1109/iscslp.2018.8706631 article EN 2018 International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018-11-01

10.1007/s10772-017-9399-z article EN International Journal of Speech Technology 2017-02-01

Target speech extraction has attracted widespread attention in recent years. In this work, we focus on investigating the dynamic interaction between different mixtures and the target speaker to exploit discriminative speaker clues. We propose a special attention mechanism, without introducing any additional parameters, in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, with a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit speaker clues in a more efficient way....

10.1109/asru51503.2021.9687903 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13