- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Topic Modeling
- Speech and Dialogue Systems
- Speech and Audio Processing
- Music and Audio Processing
- Biomedical Text Mining and Ontologies
- Tensor Decomposition and Applications
- Radiomics and Machine Learning in Medical Imaging
- Hydrocarbon Exploration and Reservoir Analysis
- Hydraulic Fracturing and Reservoir Analysis
- Model Reduction and Neural Networks
- AI in Cancer Detection
- Soil Mechanics and Vehicle Dynamics
- Neural Networks and Applications
- Gaussian Processes and Bayesian Inference
- Artificial Intelligence in Healthcare and Education
- COVID-19 Diagnosis Using AI
- Machine Learning in Healthcare
- Vehicle Dynamics and Control Systems
- Geographic Information Systems Studies
- Numerical Methods for Differential Equations
- Intelligent Tutoring Systems and Adaptive Learning
- Control Systems in Engineering
- Data Management and Algorithms
Carnegie Mellon University
2022-2025
Chongqing University
2025
RS Dynamics (Czechia)
2025
State Key Laboratory of Coal Mine Disaster Dynamics and Control
2025
Cornell University
2022-2023
Weill Cornell Medicine
2022-2023
Hunan University of Science and Technology
2023
University of Pittsburgh
2023
The University of Texas at Austin
2023
Shanghai Jiao Tong University
2022
Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches for convolution and self-attention and merging the local and global context from each branch. In this paper, we propose...
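To make the "sequential convolution plus self-attention" idea concrete, here is a minimal PyTorch sketch of such a block. It is an illustration under assumptions, not the paper's implementation: a real Conformer block also includes macaron-style feed-forward modules and relative positional attention, and all sizes below are arbitrary.

```python
import torch
import torch.nn as nn

class SequentialConvAttnBlock(nn.Module):
    """Simplified Conformer-style block: self-attention (global context)
    followed by a depthwise convolution (local context), each with a
    residual connection. Illustrative only."""

    def __init__(self, d_model=256, n_heads=4, kernel_size=31):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)

    def forward(self, x):                      # x: (batch, time, d_model)
        # Global information via self-attention.
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h)
        x = x + h
        # Local information via depthwise convolution over time.
        h = self.conv_norm(x).transpose(1, 2)  # (batch, d_model, time)
        h = self.conv(h).transpose(1, 2)
        return x + h
```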
As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing interest in using the ASR output for downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open standard that gives researchers a faster start in SLU research. We present ESPnet-SLU, which is designed for the quick development of spoken language understanding in a single framework. The ESPnet-SLU project...
Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on the standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder architectures. This work aims to improve the performance and efficiency of OWSM without additional data. We present a series...
Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. In each layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module...
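In contrast to the sequential design above, the parallel-branch idea can be sketched as follows in PyTorch. This is an assumption-laden simplification: the local branch here is a plain MLP with a depthwise-convolution gate standing in for the cgMLP module, and the merge is a simple concatenation plus projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelBranchBlock(nn.Module):
    """Branchformer-style layer sketch: one self-attention branch for
    long-range dependencies, one gated-MLP branch for local ones,
    merged by concatenation + linear projection. Illustrative only."""

    def __init__(self, d_model=256, n_heads=4, kernel_size=31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_in = nn.Linear(d_model, d_model)
        self.dw_conv = nn.Conv1d(d_model, d_model, kernel_size,
                                 padding=kernel_size // 2, groups=d_model)
        self.merge = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                        # x: (batch, time, d_model)
        h = self.norm(x)
        # Branch 1: self-attention captures long-range dependencies.
        g, _ = self.attn(h, h, h)
        # Branch 2: MLP with a depthwise-conv gate captures local patterns
        # (a simplification of the cgMLP module named in the abstract).
        l = F.gelu(self.mlp_in(h))
        l = l * torch.sigmoid(self.dw_conv(l.transpose(1, 2)).transpose(1, 2))
        # Merge both branches with a residual connection.
        return x + self.merge(torch.cat([g, l], dim=-1))
```

Keeping the branches parallel rather than sequential is what makes the design inspectable: the merge weights (or branch ablations) reveal how much each layer relies on local versus global context.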
Multilingual Automatic Speech Recognition (ASR) models have extended the usability of speech technologies to a wide variety of languages. Given how many languages these models handle, however, a key to understanding their imbalanced performance across different languages is to examine whether a model actually knows which language it should transcribe. In this paper, we introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark, by conditioning the entire model on language identity (LID). We investigate techniques inspired from recent...
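One common way to condition a model on language identity is to reserve one token per language and prepend it to the target sequence; details vary by system, and the token IDs and helper below are hypothetical, not taken from the paper.

```python
# Hypothetical vocabulary: ordinary subword IDs plus one ID per language.
LID_TOKENS = {"en": 50001, "fr": 50002, "yo": 50003}  # illustrative IDs
SOS, EOS = 1, 2

def build_decoder_prompt(lid, target_ids):
    """Inject language identity by prepending the LID token, so every
    prediction is made knowing which language to transcribe."""
    return [SOS, LID_TOKENS[lid]] + target_ids + [EOS]

print(build_decoder_prompt("fr", [815, 42, 99]))
# [1, 50002, 815, 42, 99, 2]
```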
Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation without degradation in accuracy. Prior studies focus on the pruning of Transformers; however, speech SSL models not only utilize a stack of Transformer blocks, but also combine a frontend network based on multiple convolutional layers for low-level feature learning. This frontend has a small size but a heavy computational cost. In...
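As a rough illustration of structured pruning applied to such a convolutional frontend, the sketch below drops whole output channels by weight magnitude. This is a stand-in heuristic: the paper's method learns which units to prune jointly with a sparsity objective, which this example does not implement.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv1d, keep_ratio: float) -> nn.Conv1d:
    """Structured pruning sketch: keep the output channels of a conv
    layer with the largest L2 weight norms, removing the rest entirely
    (so compute actually shrinks, unlike unstructured masking)."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    norms = conv.weight.detach().flatten(1).norm(dim=1)   # one norm per channel
    keep = norms.topk(n_keep).indices.sort().values
    pruned = nn.Conv1d(conv.in_channels, n_keep, conv.kernel_size[0],
                       stride=conv.stride[0], padding=conv.padding[0],
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep]
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep]
    return pruned

frontend = nn.Conv1d(1, 512, kernel_size=10, stride=5)
print(prune_conv_channels(frontend, keep_ratio=0.5))  # 256 channels remain
```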
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve...
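For a sense of what instruction-conditioned speech evaluation looks like, here is a hypothetical sample in that style; the field names and task are illustrative, not the benchmark's actual schema.

```python
# Hypothetical instruction-tuning sample: the model receives audio plus a
# natural-language instruction and must produce the answer as text.
sample = {
    "audio": "utt_0001.wav",
    "instruction": "Identify the emotion of the speaker. "
                   "Answer with one of: happy, sad, angry, neutral.",
    "label": "neutral",
}
```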
We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates the text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech synthesis, with improvements in both speech intelligibility from 28.9 to 5.6 and objective quality from 2.68 to 3.90. VoxtLM also improves speech generation and speech recognition performance over the single-task counterpart. Further, VoxtLM is trained on publicly...
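The joint-vocabulary idea can be sketched as follows; the special tokens, offsets, and sizes are invented for illustration and are not VoxtLM's actual vocabulary.

```python
# Sketch of a joint token stream for a decoder-only speech+text LM.
TEXT_VOCAB_SIZE = 32_000
SPECIAL = {"<st>": 32_000, "<et>": 32_001,   # start/end of text
           "<ss>": 32_002, "<es>": 32_003}   # start/end of speech
SPEECH_OFFSET = 32_004                       # discrete SSL units live here

def asr_example(speech_units, text_ids):
    """Speech recognition as next-token prediction: speech tokens in,
    text tokens out, with special tokens marking each modality. Other
    tasks (TTS, continuation) just reorder the same building blocks."""
    return ([SPECIAL["<ss>"]] + [SPEECH_OFFSET + u for u in speech_units]
            + [SPECIAL["<es>"], SPECIAL["<st>"]] + text_ids
            + [SPECIAL["<et>"]])

print(asr_example([5, 17, 17, 3], [904, 121]))
```

Because every task is expressed in the same flat token space, a single decoder-only model can be trained on all four tasks with ordinary next-token cross-entropy.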
Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest such model, to the best of our knowledge. OWLS leverages up to 360K hours of public data across 150...
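A scaling law of the usual power-law form, L(N) = a * N^(-b), can be fitted by linear regression in log space. The numbers below are made up purely to show the procedure; they are not OWLS results.

```python
import numpy as np

# Toy power-law fit L(N) = a * N^(-b): linear regression in log space.
params = np.array([0.25e9, 0.5e9, 1e9, 2e9, 4e9])   # model sizes N
error  = np.array([12.0, 10.1, 8.6, 7.4, 6.3])      # e.g. dev-set WER (fabricated)

b, log_a = np.polyfit(np.log(params), np.log(error), 1)  # slope, intercept
a = np.exp(log_a)
print(f"L(N) ~ {a:.2f} * N^({b:.3f})")

# Such fits are typically used to extrapolate to larger models before
# training them (illustration only, not a claim about any real system):
print("predicted error at 18B params:", a * (18e9) ** b)
```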
Pre-training speech models on large volumes of data has achieved remarkable success. OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised speech data. It generalizes well to various speech recognition and translation benchmarks even in a zero-shot setup. However, the full pipeline for developing such models (from data collection to training) is not publicly accessible, which makes it difficult for researchers to further improve its performance and address training-related issues such as efficiency, robustness,...
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming. Recent studies achieved promising results by using pre-trained models in low-resource scenarios. Inspired by this, we aim to ask: which (if any) pre-training strategies can improve performance across SLU benchmarks? To answer this question, we employ four types of pre-trained models and their combinations for SLU. We leverage self-supervised speech and language models (LM) pre-trained on large quantities of un-paired data to extract strong speech and text...
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.
While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming. Previous studies on automatic speech quality assessment address the problem by predicting human evaluation scores with machine learning models. However, they rely on supervised learning and thus suffer from high annotation costs and domain-shift problems. We propose SpeechLMScore, an unsupervised metric to evaluate generated speech using a speech language model. SpeechLMScore computes the average log-probability of a speech signal by mapping...
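The core computation, the average log-probability (1/T) * sum_t log p(u_t | u_<t) over a sequence of discrete units, is easy to state in code. The sketch below assumes the speech signal has already been mapped to units (e.g. by an SSL tokenizer) and fed through a unit language model; it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def speech_lm_score(logits: torch.Tensor, units: torch.Tensor) -> float:
    """Average log-probability of a discrete-unit sequence under a speech
    LM. `logits` holds the LM's next-token predictions at each position
    (shape (T, vocab)); `units` holds the actual units (shape (T,))."""
    log_probs = F.log_softmax(logits, dim=-1)           # (T, vocab)
    token_lp = log_probs.gather(1, units.unsqueeze(1))  # log p(u_t | u_<t)
    return token_lp.mean().item()

# Toy usage with random "LM" outputs over a 50-unit vocabulary:
T, V = 8, 50
score = speech_lm_score(torch.randn(T, V), torch.randint(0, V, (T,)))
print(score)  # higher (closer to 0) = more natural under the LM
```

No human labels enter this computation, which is what makes the metric unsupervised: the speech LM itself, trained on unlabeled audio, supplies the notion of naturalness.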
Automatic radiology report summarization is a crucial clinical task, whose key challenge is to maintain factual accuracy between produced summaries and ground truth findings. Existing research adopts reinforcement learning to directly optimize factual consistency metrics such as the CheXBert or RadGraph score. However, their decoding method, using greedy search or beam search, considers no factual consistency when picking the optimal candidate, leading to limited improvement. To address it, we propose a novel second-stage summarizing approach...
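In the spirit of a second-stage approach, one can generate several candidate summaries and let a factuality metric pick among them, as sketched below. `generate_candidates` and `consistency_score` are placeholders for a summarization model and a factual-consistency metric (e.g. a RadGraph-based score), not real APIs, and this is not necessarily the paper's exact procedure.

```python
# Second-stage selection sketch: instead of trusting the single
# beam-search output, score each candidate for factual consistency
# against the source findings and return the best one.

def rerank(findings: str, generate_candidates, consistency_score, n=8):
    candidates = generate_candidates(findings, num_return_sequences=n)
    # Greedy/beam decoding ignores factuality when it picks a candidate;
    # explicit scoring lets factual consistency drive the final choice.
    return max(candidates, key=lambda s: consistency_score(findings, s))
```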
In mammography, calcifications are one of the most common signs of breast cancer. Detection of such lesions is an active area of research for computer-aided diagnosis and machine learning algorithms. Due to limited numbers of positive cases, many supervised detection models suffer from overfitting and fail to generalize. We present a one-class, semi-supervised framework using a deep convolutional autoencoder trained with over 50,000 images from 11,000 negative-only cases. Since the model learned only normal parenchymal...
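The one-class principle is that an autoencoder trained only on normal tissue reconstructs normal tissue well and fails on anything it has not seen, so high reconstruction error flags candidate lesions. The PyTorch sketch below illustrates this; the architecture and threshold are invented for illustration and are much smaller than the paper's model.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Minimal convolutional autoencoder for one-class detection."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_map(model, patch, threshold=0.05):
    """Per-pixel squared reconstruction error; regions the normal-tissue
    model cannot explain (high error) are candidate calcifications."""
    with torch.no_grad():
        err = (model(patch) - patch) ** 2
    return err > threshold

# Usage: train TinyAE on negative-only patches, then call anomaly_map.
flags = anomaly_map(TinyAE(), torch.rand(1, 1, 64, 64))
print(flags.shape)  # (1, 1, 64, 64) boolean mask
```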
Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe. Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). 2022.
End-to-end (E2E) automatic speech recognition (ASR) methods exhibit remarkable performance. However, since the performance of such methods is intrinsically linked to the context present in the training data, E2E-ASR methods do not perform as desired for unseen user contexts (e.g., technical terms, personal names, and playlists). Thus, E2E-ASR methods must be easily contextualizable by the user or developer. This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list (referred to as a bias list)....
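A minimal PyTorch sketch of attention over a bias list follows. It assumes each phrase has already been embedded into a single vector; a real system would also encode phrases from their subwords and handle the no-bias case, and all sizes here are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BiasListAttention(nn.Module):
    """Contextual biasing sketch: decoder states attend over embeddings
    of an editable phrase (bias) list, and the attended context is added
    back, nudging the model toward user-specified phrases."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_states, bias_embs):
        # dec_states: (batch, steps, d_model); bias_embs: (batch, phrases, d_model)
        ctx, weights = self.attn(dec_states, bias_embs, bias_embs)
        return dec_states + ctx, weights  # weights show which phrase fired

bias = torch.randn(1, 5, 256)   # embeddings of 5 user-edited phrases
dec = torch.randn(1, 7, 256)    # decoder states for 7 output steps
out, w = BiasListAttention()(dec, bias)
print(out.shape, w.shape)       # (1, 7, 256) (1, 7, 5)
```

Because the bias list enters only through this attention module, the user can edit the phrase list at inference time without retraining the underlying ASR model.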