Yanhua Long

ORCID: 0000-0003-0924-408X
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Phonetics and Phonology Research
  • Speech and Dialogue Systems
  • Advanced Data Compression Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Adaptive Filtering Techniques
  • Blind Source Separation Techniques
  • Topic Modeling
  • Smart Grid and Power Systems
  • Animal Vocal Communication and Behavior
  • Infant Health and Development
  • Direction-of-Arrival Estimation Techniques
  • Emotion and Mood Recognition
  • Indoor and Outdoor Localization Technologies
  • Particle Accelerators and Beam Dynamics
  • Magnetic Confinement Fusion Research
  • Energy Load and Power Forecasting
  • Fault Detection and Control Systems
  • Sentiment Analysis and Opinion Mining
  • Network Security and Intrusion Detection
  • Artificial Intelligence in Healthcare
  • Human Pose and Action Recognition

Shanghai Normal University
2016-2025

Xi'an Technological University
2024

Northwest Institute of Nuclear Technology
2024

Shandong University of Science and Technology
2024

University of Cambridge
2012-2013

University of Science and Technology of China
2008-2011

Microsoft Research Asia (China)
2011

Code-switching (CS) occurs when a speaker alternates between words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech therefore has to deal with more than one language at the same time. In this study, we propose a Transformer-based architecture with symmetric language-specific encoders to capture individual language attributes, which improve the acoustic representation of each language. These representations are combined using a multi-head attention mechanism in the decoder module. Each encoder and its...
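The combination step described above can be illustrated with a minimal, framework-free sketch: a decoder query attends over the concatenated outputs of two language-specific encoders, softly selecting per frame whichever language's representation matches best. The function names and the single-head simplification are my own assumptions, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value lists."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def fuse_language_encoders(query, enc_zh, enc_en):
    """Attend over the concatenated frame outputs of two language-specific
    encoders, so the decoder can softly pick either language per frame."""
    frames = enc_zh + enc_en
    return attend(query, frames, frames)
```

With equal keys the attention weights are uniform, so the fused vector is simply the average of the values; with a query closer to one encoder's frames, those frames dominate the mixture.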

10.21437/interspeech.2020-2488 article EN Interspeech 2020 2020-10-25

With the rapid development of intelligent speech technologies, automatic speaker verification (ASV) has become one of the most natural and convenient biometric recognition approaches. However, state-of-the-art ASV systems are vulnerable to spoofing attack techniques such as speech synthesis, voice conversion, and replayed speech. Due to the symmetric distribution characteristic between genuine (true) and spoofed (fake) pairs, detection is challenging. Many recent research works have been focusing on anti-spoofing solutions....

10.3390/sym14020274 article EN Symmetry 2022-01-29

Personalized speech enhancement (PSE) methods typically rely on pre-trained speaker verification models or self-designed speaker encoders to extract target-speaker clues that guide the PSE model in isolating the desired speech. However, these approaches suffer from significant complexity and often underutilize enrollment information, limiting the potential performance of the model. To address these limitations, we propose a novel Speaker Encoder-Free network, termed SEF-PNet, which fully exploits the information present in both the noisy...

10.48550/arxiv.2501.11274 preprint EN arXiv (Cornell University) 2025-01-20

This article, empowered by ChatGPT and through retrieval of relevant historical literature, explores how the translator Sun Yat-sen flexibly employed strategies such as domestication and foreignization, as well as the methods of omission, addition, and modification, in his translation of Ambulance Lectures: First Aid to the Injured. Additionally, the research highlights the use of ChatGPT as a tool to assist the study. While ChatGPT is able to quickly provide comprehensive knowledge and proper translations, improvements are still needed in terms of image accuracy...

10.26689/jcer.v9i1.9317 article EN Journal of Contemporary Educational Research 2025-02-14

10.1109/icassp49660.2025.10890546 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10887858 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888360 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/taslpro.2025.3557248 article EN IEEE Transactions on Audio, Speech and Language Processing 2025-01-01

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative reductions...

10.1109/slt.2012.6424244 article EN 2012 IEEE Spoken Language Technology Workshop (SLT) 2012-12-01

This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses automatically derived decoding hypotheses obtained with a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of the standard hypotheses can be poor. To address this issue, word- and segment-level combination approaches are used between transcripts, which yield improved transcriptions. Experimental results show systems...

10.21437/interspeech.2013-516 article EN Interspeech 2013 2013-08-25

This study investigated large-scale semi-supervised training (SST) to improve acoustic models for automatic speech recognition. Conventional self-training, the recently proposed committee-based SST using heterogeneous neural networks, and lattice-based SST were examined and compared. SST was studied in deep neural network modeling with respect to transcription quality, the importance of data filtering, and the quantity and other attributes of a large multi-genre unsupervised live data set. We found that the behavior of SST on ASR tasks is very...
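The data-filtering step mentioned above can be illustrated with a toy confidence filter over automatically transcribed segments. The threshold values and the (words, confidences) segment format are hypothetical choices of mine for illustration, not the paper's recipe.

```python
def filter_segments(segments, min_avg_conf=0.9, min_words=3):
    """Keep automatically transcribed segments whose average per-word
    confidence clears a threshold; very short segments are dropped,
    since their confidence estimates tend to be unreliable."""
    kept = []
    for words, confs in segments:
        if len(words) >= min_words and sum(confs) / len(confs) >= min_avg_conf:
            kept.append((words, confs))
    return kept

# Hypothetical decoder output: (word list, per-word confidences) per segment.
segments = [
    (["the", "cat", "sat"], [0.95, 0.92, 0.97]),   # confident, long enough
    (["uh"], [0.99]),                               # too short
    (["noisy", "guess", "here"], [0.50, 0.60, 0.40]),  # low confidence
]
```

Only segments passing both checks would be added to the semi-supervised training pool; the surviving data quantity then trades off against transcription quality, which is exactly the axis the study examines.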

10.1109/access.2019.2940961 article EN cc-by IEEE Access 2019-01-01

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and to wide-domain-coverage tasks. In this paper, from a time-frequency perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve separation robustness under complicated conditions. Furthermore, we generalize DPCCN to target speech extraction (TSE) by integrating a specially designed speaker encoder. Moreover, we also investigate...

10.1109/icassp43922.2022.9747340 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

End-to-end approaches for single-channel target speech extraction have attracted widespread attention. However, studies on multi-channel target speech extraction are still relatively limited. In this work, we propose two methods for exploiting spatial information to extract the target speech. The first uses a target speaker adaptation layer in a parallel encoder architecture. The second designs a channel decorrelation mechanism with inter-channel differential information to enhance the inter-channel representation. We compare the proposed methods with strong state-of-the-art baselines....
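The inter-channel differential idea can be sketched with two standard spatial features computed from a pair of complex STFT frames: the inter-channel phase difference (IPD) and the inter-channel level difference (ILD). This is a generic illustration under my own naming, not the paper's exact decorrelation mechanism.

```python
import cmath
import math

def interchannel_features(spec_ref, spec_other, eps=1e-8):
    """Per-frequency-bin spatial features from two complex STFT frames:
    IPD = phase difference between channels (via the cross-spectrum),
    ILD = log magnitude ratio between channels."""
    feats = []
    for a, b in zip(spec_ref, spec_other):
        ipd = cmath.phase(a * b.conjugate())           # phase difference
        ild = math.log((abs(a) + eps) / (abs(b) + eps))  # level difference
        feats.append((ipd, ild))
    return feats
```

Features of this kind encode the direction-dependent differences between microphones that a single-channel model never sees, which is why concatenating them with spectral inputs can help a multi-channel extractor focus on the target speaker's position.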

10.1109/icassp39728.2021.9414244 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

In this paper, we propose a new continual learning framework for few-shot bioacoustic event detection (BED). First, we modify the recently proposed dynamic few-shot learning (DFSL) approach and generalize it to the BED task. Then, we introduce a weight alignment loss to enhance the generator of the modified DFSL for detecting novel events. Furthermore, to augment the few positive samples of each target event, an enhancement approach is used to select high-confidence pseudo-positives based on the cumulative distribution of the initial posterior probabilities. All experiments are...

10.1109/icassp49357.2023.10096307 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Multi-accent speech recognition is a key challenge in current ASR due to the pronunciation variations of different accents. In this study, we propose a Cross-modal Parallel Training (CPT) approach for improving the accent robustness of a state-of-the-art Conformer-Transducer (Conformer-T) ASR system. Specifically, in CPT, a novel cross-modal attention and fusion module is first designed as a frontend to align low-level acoustic representations with phonetic embeddings, thus normalizing them into a shared standard latent...

10.1109/icassp48485.2024.10447979 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

The state-of-the-art acoustic modeling for Keyword Spotting (KWS) systems is mainly based on a hybrid of Hidden Markov Models (HMM) and Neural Networks (NN). However, it is challenging to efficiently train such a system, given its dependence on an intermediate phonetic representation. Motivated by end-to-end speech recognition systems, we propose a Mandarin KWS system using an end-to-end method, which directly predicts the posteriors of the modeling units with a Connectionist Temporal Classification (CTC) loss and a Recurrent Neural Network (RNN). The main difference between...
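The CTC-based decoding that such an end-to-end KWS system builds on can be illustrated with the standard greedy collapse rule (merge consecutive repeats, then remove blanks). The keyword-matching helper below is my own simplification for illustration, not the paper's scoring method.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Standard CTC greedy decoding: merge consecutive repeated labels,
    then drop blank symbols."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

def keyword_detected(frame_labels, keyword_units, blank=0):
    """Fire when the keyword's unit sequence appears as a contiguous
    subsequence of the collapsed output."""
    decoded = ctc_greedy_collapse(frame_labels, blank)
    n = len(keyword_units)
    return any(decoded[i:i + n] == keyword_units
               for i in range(len(decoded) - n + 1))
```

Because CTC outputs a posterior per frame directly over the modeling units, no intermediate phonetic alignment is needed: the collapse rule alone turns framewise argmax labels into a unit sequence that can be matched against the keyword.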

10.1109/iscslp.2018.8706631 article EN 2018 International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018-11-01

10.1007/s10772-017-9399-z article EN International Journal of Speech Technology 2017-02-01

Target speech extraction has attracted widespread attention in recent years. In this work, we focus on investigating the dynamic interaction between different mixtures and the target speaker to exploit discriminative speaker clues. We propose a special attention mechanism, without introducing any additional parameters, in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, with a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit speaker clues in a more efficient way....

10.1109/asru51503.2021.9687903 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2021-12-13