NFDI4DS | UHH-SEMS - Publication Details

Chao-Han Huck Yang

ORCID: 0000-0003-2879-8811

About

Contact & Profiles

OpenAlex author ID: A5020376803

Research Areas

Speech Recognition and Synthesis
Speech and Audio Processing
Music and Audio Processing
Natural Language Processing Techniques
Quantum Computing Algorithms and Architecture
Adversarial Robustness in Machine Learning
Topic Modeling
Multimodal Machine Learning Applications
Anomaly Detection Techniques and Applications
Video Analysis and Summarization
Privacy-Preserving Technologies in Data
Reinforcement Learning in Robotics
Neural Networks and Applications
Advanced Image and Video Retrieval Techniques
Advancements in Semiconductor Devices and Circuit Design
Speech and dialogue systems
Domain Adaptation and Few-Shot Learning
Quantum and electron transport phenomena
Quantum Information and Cryptography
Image Enhancement Techniques
Tensor decomposition and applications
Human Pose and Action Recognition
Advanced Image Processing Techniques
Geophysical Methods and Applications
Stochastic Gradient Optimization Techniques

Nvidia (United States)
2024-2025

Nvidia (United Kingdom)
2023-2025

Georgia Institute of Technology
2019-2024

Amazon (United States)
2021-2024

Google (United States)
2023

National Yang Ming Chiao Tung University
2023

King Abdullah University of Science and Technology
2021-2023

Universidad San Pablo CEU
2022

China University of Mining and Technology
2020

Variational Quantum Circuits for Deep Reinforcement Learning

OPENALEX - Publications

Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin‐Yu Chen, Xiaoli Ma, and 1 more

The state-of-the-art machine learning approaches are based on classical von Neumann computing architectures and have been widely used in many industrial and academic domains. With the recent development of quantum computing, researchers and tech-giants have attempted new quantum circuits for machine learning tasks. However, existing quantum computing platforms are hard to simulate classical deep learning models or problems because of the intractability of deep quantum circuits. Thus, it is necessary to design feasible quantum algorithms for noisy intermediate scale quantum (NISQ) devices. This work explores...

10.1109/access.2020.3010470 article EN cc-by IEEE Access 2020-01-01

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin‐Yu Chen, Sabato Marco Siniscalchi, and 2 more

We propose a novel decentralized feature extraction approach in federated learning to address privacy-preservation issues for speech recognition. It is built upon a quantum convolutional neural network (QCNN) composed of a quantum circuit encoder for feature extraction, and a recurrent neural network (RNN) based end-to-end acoustic model (AM). To enhance model parameter protection in a decentralized architecture, an input speech is first up-streamed to a quantum computing server to extract the Mel-spectrogram, and the corresponding convolutional features are encoded using a quantum circuit algorithm with random...

10.1109/icassp39728.2021.9413453 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Y-Net: Multi-Scale Feature Aggregation Network With Wavelet Structure Similarity Loss Function For Single Image Dehazing

Hao-Hsiang Yang, Chao-Han Huck Yang, Yichang Tsai

Single image dehazing is an ill-posed two-dimensional signal reconstruction problem. Recently, deep convolutional neural networks (CNN) have been successfully used in many computer vision problems. In this paper, we propose a Y-net that is named for its structure. This network reconstructs clear images by aggregating multi-scale feature maps. Additionally, we propose a Wavelet Structure SIMilarity (W-SSIM) loss function in the training step. In the proposed loss function, discrete wavelet transforms are applied repeatedly...

10.1109/icassp40776.2020.9053920 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

Theoretical error performance analysis for variational quantum circuit based functional regression

Jun Qi, Chao-Han Huck Yang, Pin‐Yu Chen, Min-Hsiu Hsieh

The noisy intermediate-scale quantum devices enable the implementation of the variational quantum circuit (VQC) for quantum neural networks (QNN). Although the VQC-based QNN has succeeded in many machine learning tasks, the representation and generalization powers of VQC still require further investigation, particularly when the dimensionality of classical inputs is concerned. In this work, we first put forth an end-to-end QNN, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional...

10.1038/s41534-022-00672-7 article EN cc-by npj Quantum Information 2023-01-09

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Chao-Han Huck Yang, Jun Qi, Pin‐Yu Chen, Xiaoli Ma, Chin‐Hui Lee

Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals. Specifically, we evaluate the model performance by interpretable metrics and discuss augmented adversarial training. Our experiments show that our proposed U-Net$_{At}$ improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, short-term objective...

10.1109/icassp40776.2020.9053288 preprint EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation

Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Tian Meng, Yi-Chieh Liu, and 7 more

In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural networks-based (DNN-based) module, including a retinal disease identifier and a clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset. Also, as ground truth, we provide a retinal image dataset manually labeled by ophthalmologists to qualitatively...

10.1109/wacv48630.2021.00249 article EN 2021-01-01

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing

Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin‐Yu Chen

The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework...

10.1109/icassp43922.2022.9746412 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, and 2 more

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, empowers model reprogramming on ASR. Specifically, we investigate how to select trainable components (i.e., encoder) of a conformer-based RNN-Transducer, as...

10.1109/icassp49357.2023.10094903 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming

Yun-Ning Hung, Chao-Han Huck Yang, Pin‐Yu Chen, Alexander Lerch

Transfer learning (TL) approaches have shown promising results when handling tasks with limited training data. However, considerable memory and computational resources are often required for fine-tuning pre-trained neural networks with target domain data. In this work, we introduce a novel method for leveraging pre-trained speech models for low-resource music classification based on the concept of Neural Model Reprogramming (NMR). NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen cross-modal...

10.1109/icassp49357.2023.10096568 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, and 11 more

In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging upon ad-hoc...

10.48550/arxiv.2007.08389 preprint EN cc-by arXiv (Cornell University) 2020-01-01

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, and 11 more

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination of two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme...

10.1109/icassp39728.2021.9414835 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, and 2 more

10.18653/v1/2024.acl-long.5 article EN 2024-01-01

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, and 4 more

Neural speech editing advancements have raised concerns about their misuse in spoofing attacks. Traditional partially edited speech corpora primarily focus on cut-and-paste edits, which, while maintaining speaker consistency, often introduce detectable discontinuities. Recent methods, like A³T and Voicebox, improve transitions by leveraging contextual information. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detailed the process...

10.48550/arxiv.2501.03805 preprint EN arXiv (Cornell University) 2025-01-07

Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer

Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin‐Hui Lee

10.1109/taslpro.2025.3530321 article EN IEEE Transactions on Audio Speech and Language Processing 2025-01-01

Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer

Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin‐Hui Lee

In this work, we propose a novel variational Bayesian adaptive learning approach for cross-domain knowledge transfer to address acoustic mismatches between training and testing conditions, such as recording devices and environmental noise. Different from the traditional Bayesian approaches that impose uncertainties on model parameters, risking the curse of dimensionality due to the huge number of parameters, we focus on estimating a manageable number of latent variables in deep neural models. Knowledge learned from a source domain is thus...

10.48550/arxiv.2501.15496 preprint EN arXiv (Cornell University) 2025-01-26

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, and 3 more

An ideal multimodal agent should be aware of the quality of its input modalities. Recent advances have enabled large language models (LLMs) to incorporate auditory systems for handling various speech-related tasks. However, most audio LLMs remain unaware of the quality of the speech they process. This limitation arises because speech quality evaluation is typically excluded from multi-task training due to the lack of suitable datasets. To address this, we introduce the first natural language-based speech evaluation corpus, generated from authentic human ratings. In...

10.48550/arxiv.2501.17202 preprint EN arXiv (Cornell University) 2025-01-27

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, and 1 more

Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. OWLS leverages up to 360K hours of public speech data across 150...

10.48550/arxiv.2502.10373 preprint EN arXiv (Cornell University) 2025-02-14

Chain-of-Thought Prompting for Speech Translation

Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, and 3 more

10.1109/icassp49660.2025.10890560 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Ke-Han Lu, Zhehuai Chen, Szu‐Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, and 3 more

10.1109/icassp49660.2025.10889444 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction

Yuanchao Li, Yuan Gong, Chao-Han Huck Yang, Peter Bell, Catherine Lai

10.1109/icassp49660.2025.10888591 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Projection Valued-based Quantum Machine Learning Adapting to Differential Privacy Algorithm for Word-level Lipreading

Hang Chen, Chang Wang, Jun Du, Chao-Han Huck Yang, Jun Qi

10.1109/icassp49660.2025.10890305 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Research on Plant Feature Extraction and Intelligent Recognition System Based on Computer Vision

Chao-Han Huck Yang, Mei Yang, Shu-Fang Guo

10.1109/citsc64390.2025.00159 article EN 2025-01-10

Wavelet Channel Attention Module With A Fusion Network For Single Image Deraining

Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

Single image deraining is a crucial problem because rain severely degenerates the visibility of images and affects the performance of computer vision tasks like outdoor surveillance systems and intelligent vehicles. In this paper, we propose a new convolutional neural network (CNN) called the wavelet channel attention module with a fusion network. The wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so that feature maps from the wavelet transform and convolutions contain different frequencies and scales. Furthermore,...

10.1109/icip40778.2020.9190720 article EN 2020 IEEE International Conference on Image Processing (ICIP) 2020-09-30

QTN-VQC: an end-to-end learning framework for quantum neural networks

Jun Qi, Chao-Han Huck Yang, Pin‐Yu Chen

This work focuses on investigating an end-to-end learning approach for quantum neural networks (QNN) on noisy intermediate-scale quantum devices. The proposed model combines a quantum tensor network (QTN) with a variational quantum circuit (VQC), resulting in a QTN-VQC architecture. This architecture integrates a QTN with a horizontal or vertical structure related to the implementation of quantum circuits for a tensor-train network. The study provides theoretical insights into the quantum advantages of the end-to-end learning pipeline based on QTN-VQC from two perspectives. The first perspective...

10.1088/1402-4896/ad14d6 article EN cc-by Physica Scripta 2023-12-12

Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, and 11 more

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized...

10.1109/asru57964.2023.10389632 article EN 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023-12-16
