Chao-Han Huck Yang

ORCID: 0000-0003-2879-8811

Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Quantum Computing Algorithms and Architecture
  • Adversarial Robustness in Machine Learning
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Anomaly Detection Techniques and Applications
  • Video Analysis and Summarization
  • Privacy-Preserving Technologies in Data
  • Reinforcement Learning in Robotics
  • Neural Networks and Applications
  • Advanced Image and Video Retrieval Techniques
  • Advancements in Semiconductor Devices and Circuit Design
  • Speech and dialogue systems
  • Domain Adaptation and Few-Shot Learning
  • Quantum and electron transport phenomena
  • Quantum Information and Cryptography
  • Image Enhancement Techniques
  • Tensor decomposition and applications
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Geophysical Methods and Applications
  • Stochastic Gradient Optimization Techniques

Nvidia (United States)
2024-2025

Nvidia (United Kingdom)
2023-2025

Georgia Institute of Technology
2019-2024

Amazon (United States)
2021-2024

Google (United States)
2023

National Yang Ming Chiao Tung University
2023

King Abdullah University of Science and Technology
2021-2023

Universidad San Pablo CEU
2022

China University of Mining and Technology
2020

The state-of-the-art machine learning approaches are based on classical von Neumann computing architectures and have been widely used in many industrial and academic domains. With the recent development of quantum computing, researchers and tech giants have attempted new quantum circuits for machine learning tasks. However, the existing quantum computing platforms are hard to simulate classical deep learning models or problems because of the intractability of deep quantum circuits. Thus, it is necessary to design feasible quantum algorithms for noisy intermediate-scale quantum (NISQ) devices. This work explores...

10.1109/access.2020.3010470 article EN cc-by IEEE Access 2020-01-01

We propose a novel decentralized feature extraction approach in federated learning to address privacy-preservation issues for speech recognition. It is built upon a quantum convolutional neural network (QCNN), composed of a quantum circuit encoder for feature extraction and a recurrent neural network (RNN) based end-to-end acoustic model (AM). To enhance model parameter protection in a decentralized architecture, an input speech signal is first up-streamed to a quantum computing server to extract the Mel-spectrogram, and the corresponding convolutional features are encoded using a quantum circuit algorithm with random...
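
To make the decentralized pipeline concrete, here is a minimal sketch assuming a split in which a client-side encoder with frozen random parameters feeds a server-side RNN acoustic model; the random linear projection is only a classical stand-in for the QCNN circuit encoder, and all class names are illustrative.

# Minimal sketch (not the paper's code): a split pipeline where a client-side
# encoder with fixed random parameters produces features that a server-side
# RNN acoustic model consumes. The frozen random projection stands in for the
# quantum circuit encoder described in the abstract.
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Client-side feature encoder with frozen, randomly initialized weights."""
    def __init__(self, n_mels=80, out_dim=32, seed=0):
        super().__init__()
        torch.manual_seed(seed)              # fixed random parameters
        self.proj = nn.Linear(n_mels, out_dim, bias=False)
        for p in self.parameters():
            p.requires_grad_(False)          # nothing is updated on the client side

    def forward(self, mel):                  # mel: (batch, time, n_mels)
        return torch.tanh(self.proj(mel))

class RNNAcousticModel(nn.Module):
    """Server-side end-to-end acoustic model operating on encoded features."""
    def __init__(self, in_dim=32, hidden=128, n_tokens=29):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, feats):
        h, _ = self.rnn(feats)
        return self.out(h)                   # per-frame token logits

mel = torch.randn(2, 200, 80)                # toy Mel-spectrogram batch
logits = RNNAcousticModel()(RandomEncoder()(mel))
print(logits.shape)                          # torch.Size([2, 200, 29])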

10.1109/icassp39728.2021.9413453 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Single image dehazing is an ill-posed two-dimensional signal reconstruction problem. Recently, deep convolutional neural networks (CNNs) have been successfully used in many computer vision problems. In this paper, we propose a Y-net, named for its structure, which reconstructs clear images by aggregating multi-scale feature maps. Additionally, we propose a Wavelet Structure SIMilarity (W-SSIM) loss function for the training step. In the proposed loss function, discrete wavelet transforms are applied repeatedly...
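
A rough sketch of the W-SSIM idea under stated assumptions: a Haar DWT is applied repeatedly and a simplified (global, unwindowed) SSIM term is accumulated per sub-band with decaying weights. The exact weighting scheme and the windowed SSIM used in the paper are not reproduced here.

# Rough sketch of a wavelet-weighted SSIM-style loss (details assumed):
# repeatedly apply a Haar DWT and accumulate a similarity term over sub-bands,
# weighting lower frequencies more. A simplified global SSIM stands in for the
# windowed SSIM used in practice.
import torch

def haar_dwt(x):
    """One-level 2-D Haar DWT; x is (B, C, H, W) with even H and W."""
    a, b = x[..., ::2, :], x[..., 1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2
    ll, lh = (lo[..., ::2] + lo[..., 1::2]) / 2, (lo[..., ::2] - lo[..., 1::2]) / 2
    hl, hh = (hi[..., ::2] + hi[..., 1::2]) / 2, (hi[..., ::2] - hi[..., 1::2]) / 2
    return ll, (lh, hl, hh)

def simple_ssim(x, y, c1=1e-4, c2=9e-4):
    """Global (non-windowed) SSIM between two tensors, as a rough proxy."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def w_ssim_loss(pred, target, levels=3, decay=0.5):
    loss, weight = 0.0, 1.0
    for _ in range(levels):
        (p_ll, p_hf), (t_ll, t_hf) = haar_dwt(pred), haar_dwt(target)
        for p, t in zip(p_hf, t_hf):                    # high-frequency bands
            loss = loss + weight * decay * (1 - simple_ssim(p, t))
        pred, target, weight = p_ll, t_ll, weight * decay
    return loss + weight * (1 - simple_ssim(pred, target))  # final low-pass band

loss = w_ssim_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(float(loss))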

10.1109/icassp40776.2020.9053920 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

The noisy intermediate-scale quantum (NISQ) devices enable the implementation of the variational quantum circuit (VQC) for quantum neural networks (QNN). Although VQC-based QNNs have succeeded in many machine learning tasks, the representation and generalization powers of the VQC still require further investigation, particularly when the dimensionality of classical inputs is concerned. In this work, we first put forth an end-to-end QNN, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional...
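
A toy, self-contained sketch of the TTN-VQC pattern, not the paper's implementation: a low-rank linear map stands in for the tensor-train dimensionality reduction, and a small simulated statevector circuit (RY rotations plus a CNOT chain) stands in for the VQC.

# Toy statevector sketch (assumptions, not the paper's code): a low-rank
# linear map plays the role of the tensor-train (TTN) dimensionality
# reduction, and a tiny simulated variational circuit plays the role of the
# VQC that follows it.
import numpy as np

N_QUBITS = 4
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], float)

def op_on(gate, start, n=N_QUBITS):
    """Embed a gate acting on qubits [start, start+k) into the full register."""
    k = int(np.log2(gate.shape[0]))
    return np.kron(np.kron(np.eye(2**start), gate), np.eye(2**(n - start - k)))

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def ttn_vqc(x, W1, W2, weights):
    """x -> low-rank reduction -> angle encoding -> entangling circuit -> <Z_i>."""
    angles = np.tanh(W2 @ (W1 @ x)) * np.pi          # low-rank map to 4 angles
    psi = np.zeros(2**N_QUBITS)
    psi[0] = 1.0                                     # start in |0000>
    for i, a in enumerate(angles):                   # angle encoding
        psi = op_on(ry(a), i) @ psi
    for layer in weights:                            # variational layers
        for i, a in enumerate(layer):
            psi = op_on(ry(a), i) @ psi
        for i in range(N_QUBITS - 1):                # entangle neighbours
            psi = op_on(CNOT, i) @ psi
    return np.array([psi @ op_on(Z, i) @ psi for i in range(N_QUBITS)])

rng = np.random.default_rng(0)
x = rng.normal(size=64)                              # classical input
W1, W2 = rng.normal(size=(8, 64)) * 0.1, rng.normal(size=(4, 8)) * 0.1
out = ttn_vqc(x, W1, W2, weights=rng.normal(size=(2, 4)))
print(out.round(3))                                  # 4 Pauli-Z expectations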

10.1038/s41534-022-00672-7 article EN cc-by npj Quantum Information 2023-01-09

Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net_At, to enhance adversarial speech signals. Specifically, we evaluate the model performance with interpretable metrics and discuss adversarially augmented training. Our experiments show that our proposed U-Net_At improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, and the short-term objective...

10.1109/icassp40776.2020.9053288 preprint EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural network based (DNN-based) module, including a retinal disease identifier and a clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset. Also, as ground truth, we provide a dataset manually labeled by ophthalmologists to qualitatively...

10.1109/wacv48630.2021.00249 article EN 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) 2021-01-01

The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework...
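
A hedged sketch of the vertical split, with all module names illustrative: one party holds a frozen text encoder and only shares hidden activations, while the other applies a fixed random temporal convolution (a purely classical stand-in for the QTC block) followed by a trainable classifier head.

# Hedged sketch of a vertical split (stand-ins, not the paper's modules):
# party A holds a frozen BERT-style text encoder, party B holds a temporal
# convolution with fixed random weights plus the trainable classifier.
import torch
import torch.nn as nn

class PartyA(nn.Module):
    """Frozen pre-trained text encoder; only hidden activations are shared."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, tokens):                       # tokens: (batch, seq)
        return self.enc(self.emb(tokens))            # (batch, seq, dim)

class PartyB(nn.Module):
    """Random temporal convolution (fixed) + trainable classifier head."""
    def __init__(self, dim=64, n_classes=2):
        super().__init__()
        self.qtc = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        for p in self.qtc.parameters():
            p.requires_grad_(False)                  # random, non-trainable filter
        self.head = nn.Linear(dim, n_classes)

    def forward(self, hidden):                       # hidden received from party A
        h = self.qtc(hidden.transpose(1, 2)).transpose(1, 2)
        return self.head(h.mean(dim=1))

tokens = torch.randint(0, 1000, (8, 16))
logits = PartyB()(PartyA()(tokens))
print(logits.shape)                                  # torch.Size([8, 2])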

10.1109/icassp43922.2022.9746412 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, empowers model reprogramming for ASR. Specifically, we investigate how to select trainable components (i.e., the encoder) of a conformer-based RNN-Transducer, as...
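
A minimal sketch of input-level reprogramming, assuming a hypothetical additive enhancement module placed in front of a frozen encoder; both the module and the GRU encoder here are illustrative stand-ins, not the conformer-based RNN-Transducer used in the paper.

# Hedged sketch of input-level reprogramming (names are illustrative): a small
# trainable module perturbs the acoustic features while the pre-trained
# encoder stays frozen, so only the reprogramming parameters are updated.
import torch
import torch.nn as nn

class Reprogrammer(nn.Module):
    """Learnable additive feature enhancement applied before a frozen encoder."""
    def __init__(self, n_feats=80):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1, 1, n_feats))   # trainable offset
        self.mix = nn.Conv1d(n_feats, n_feats, kernel_size=3, padding=1)

    def forward(self, feats):                 # feats: (batch, time, n_feats)
        enhanced = self.mix(feats.transpose(1, 2)).transpose(1, 2)
        return feats + enhanced + self.delta

frozen_encoder = nn.GRU(80, 256, batch_first=True)   # stand-in for a pre-trained ASR encoder
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

reprog = Reprogrammer()
feats = torch.randn(4, 120, 80)
hidden, _ = frozen_encoder(reprog(feats))
trainable = sum(p.numel() for p in reprog.parameters())
print(hidden.shape, trainable)                # only the reprogrammer is trainable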

10.1109/icassp49357.2023.10094903 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Transfer learning (TL) approaches have shown promising results when handling tasks with limited training data. However, considerable memory and computational resources are often required for fine-tuning pre-trained neural networks with target domain data. In this work, we introduce a novel method for leveraging pre-trained speech models for low-resource music classification based on the concept of Neural Model Reprogramming (NMR). NMR aims at re-purposing a model from a source domain to a target domain by modifying the input of a frozen pre-trained model in a cross-modal...

10.1109/icassp49357.2023.10096568 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten fine-grained classes, and (ii) Task 1b concerns classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging upon ad-hoc...

10.48550/arxiv.2007.08389 preprint EN cc-by arXiv (Cornell University) 2020-01-01

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our system leverages an ad-hoc score combination of two CNN classifiers: (i) the first classifies acoustic inputs into one of three broad classes, and (ii) the second classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two classifiers, and a frequency sub-sampling scheme...
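
An illustrative sketch of one possible ad-hoc score combination, assuming the three broad classes group the ten DCASE scene labels into indoor, outdoor, and transportation; the fusion rule shown here (multiplying each fine-class probability by its parent broad-class probability) is an assumption, not necessarily the rule used in the paper.

# Illustrative score-fusion sketch (assumed fusion rule): combine a 3-way
# "broad class" classifier with a 10-way "fine class" classifier by weighting
# each fine-class probability with its parent broad-class probability.
import numpy as np

FINE_TO_BROAD = {                      # 0: indoor, 1: outdoor, 2: transportation
    "airport": 0, "shopping_mall": 0, "metro_station": 0,
    "street_pedestrian": 1, "public_square": 1, "street_traffic": 1, "park": 1,
    "tram": 2, "bus": 2, "metro": 2,
}
FINE_CLASSES = list(FINE_TO_BROAD)

def fuse(broad_probs, fine_probs):
    """broad_probs: (3,), fine_probs: (10,) -> fused scores over 10 classes."""
    scores = np.array([fine_probs[i] * broad_probs[FINE_TO_BROAD[c]]
                       for i, c in enumerate(FINE_CLASSES)])
    return scores / scores.sum()

rng = np.random.default_rng(1)
broad = rng.dirichlet(np.ones(3))      # stand-ins for the two CNNs' outputs
fine = rng.dirichlet(np.ones(10))
fused = fuse(broad, fine)
print(FINE_CLASSES[int(fused.argmax())])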

10.1109/icassp39728.2021.9414835 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

Neural speech editing advancements have raised concerns about their misuse in spoofing attacks. Traditional partially edited corpora primarily focus on cut-and-paste edits, which, while maintaining speaker consistency, often introduce detectable discontinuities. Recent methods, like A³T and Voicebox, improve transitions by leveraging contextual information. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detail the creation process...

10.48550/arxiv.2501.03805 preprint EN arXiv (Cornell University) 2025-01-07

In this work, we propose a novel variational Bayesian adaptive learning approach for cross-domain knowledge transfer to address acoustic mismatches between training and testing conditions, such as recording devices and environmental noise. Different from traditional approaches that impose uncertainties on model parameters, risking the curse of dimensionality due to the huge number of parameters, we focus on estimating a manageable number of latent variables in deep neural models. Knowledge learned from the source domain is thus...
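
A conceptual sketch, not the paper's algorithm: a low-dimensional Gaussian latent variable is adapted on target-domain data while a KL term keeps its posterior close to a prior carried over from the source domain.

# Conceptual sketch (assumptions, not the paper's method): adapt a small
# Gaussian latent variable to the target domain while a KL regularizer keeps
# its posterior close to a source-domain prior.
import torch
import torch.nn as nn

class LatentAdapter(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(dim))        # variational posterior mean
        self.log_var = nn.Parameter(torch.zeros(dim))   # variational posterior log-variance

    def sample(self):
        eps = torch.randn_like(self.mu)                 # reparameterization trick
        return self.mu + eps * torch.exp(0.5 * self.log_var)

    def kl_to_prior(self, prior_mu, prior_log_var):
        var, pvar = self.log_var.exp(), prior_log_var.exp()
        return 0.5 * torch.sum(prior_log_var - self.log_var
                               + (var + (self.mu - prior_mu) ** 2) / pvar - 1)

adapter = LatentAdapter()
prior_mu, prior_log_var = torch.zeros(16), torch.zeros(16)   # carried over from source domain
z = adapter.sample()                                          # latent fed to the (omitted) model
task_loss = z.pow(2).mean()                                   # placeholder for the task loss
loss = task_loss + 0.1 * adapter.kl_to_prior(prior_mu, prior_log_var)
loss.backward()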

10.48550/arxiv.2501.15496 preprint EN arXiv (Cornell University) 2025-01-26

An ideal multimodal agent should be aware of the quality of its input modalities. Recent advances have enabled large language models (LLMs) to incorporate auditory systems for handling various speech-related tasks. However, most audio LLMs remain unaware of the quality of the speech they process. This limitation arises because speech quality evaluation is typically excluded from multi-task training due to the lack of suitable datasets. To address this, we introduce the first natural language-based speech quality evaluation corpus, generated from authentic human ratings. In...

10.48550/arxiv.2501.17202 preprint EN arXiv (Cornell University) 2025-01-27

Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the largest version being, to the best of our knowledge, the largest such model. OWLS leverages up to 360K hours of public speech data across 150...
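
A toy illustration of how scaling-law exponents are typically estimated, using synthetic numbers rather than OWLS results: fit a power law in log-log space and extrapolate to a larger model size.

# Toy scaling-law fit on synthetic data (not OWLS results): assume
# error ~ a * params^b with b < 0, recover (a, b) by linear regression in
# log-log space, then extrapolate.
import numpy as np

rng = np.random.default_rng(0)
params = np.array([0.25e9, 0.5e9, 1e9, 2e9, 4e9, 9e9, 18e9])          # model sizes
error = 5e3 * params ** -0.3 * rng.normal(1.0, 0.02, size=params.size)  # synthetic curve, true b = -0.3
slope, intercept = np.polyfit(np.log(params), np.log(error), deg=1)     # fit in log-log space
print(f"fitted exponent: {slope:.3f} (true value -0.3)")
predicted = np.exp(intercept) * (36e9 ** slope)                         # extrapolate to 36B params
print(f"extrapolated error at 36B params: {predicted:.2f}")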

10.48550/arxiv.2502.10373 preprint EN arXiv (Cornell University) 2025-02-14

10.1109/icassp49660.2025.10890560 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10889444 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888591 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10890305 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Single image deraining is a crucial problem because rain severely degenerates the visibility of images and affects the performance of computer vision tasks like outdoor surveillance systems and intelligent vehicles. In this paper, we propose a new convolutional neural network (CNN) called the wavelet channel attention module with a fusion network. The wavelet transform and its inverse are substituted for down-sampling and up-sampling, so feature maps from the wavelet transform and convolutions contain different frequencies and scales. Furthermore,...
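
A minimal sketch of the two ingredients named above, with the details assumed: a Haar DWT used in place of strided down-sampling, followed by a squeeze-and-excitation style channel attention over the stacked sub-bands.

# Minimal sketch (assumed details): a Haar DWT replaces strided down-sampling
# and a squeeze-and-excitation style channel attention re-weights the stacked
# wavelet sub-bands.
import torch
import torch.nn as nn

def haar_dwt(x):
    """2-D Haar DWT; (B, C, H, W) -> (B, 4C, H/2, W/2) with LL/LH/HL/HH stacked."""
    a, b = x[..., ::2, :], x[..., 1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2
    ll, lh = (lo[..., ::2] + lo[..., 1::2]) / 2, (lo[..., ::2] - lo[..., 1::2]) / 2
    hl, hh = (hi[..., ::2] + hi[..., 1::2]) / 2, (hi[..., ::2] - hi[..., 1::2]) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # squeeze spatial dimensions
        return x * w[:, :, None, None]          # re-weight sub-band channels

x = torch.randn(1, 16, 64, 64)
bands = haar_dwt(x)                             # (1, 64, 32, 32)
out = ChannelAttention(64)(bands)
print(out.shape)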

10.1109/icip40778.2020.9190720 article EN 2020 IEEE International Conference on Image Processing (ICIP) 2020-09-30

This work focuses on investigating an end-to-end learning approach for quantum neural networks (QNN) on noisy intermediate-scale quantum devices. The proposed model combines a quantum tensor network (QTN) with a variational quantum circuit (VQC), resulting in the QTN-VQC architecture. The architecture integrates a QTN with a horizontal or vertical structure related to the implementation of quantum circuits for a tensor-train network. The study provides theoretical insights into the advantages of the QTN-VQC based pipeline from two perspectives. The first perspective...

10.1088/1402-4896/ad14d6 article EN cc-by Physica Scripta 2023-12-12

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized...
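
A hedged sketch of the LoRA mechanism referenced above: a frozen linear layer is augmented with a trainable low-rank update so that only a small fraction of parameters is optimized; the layer sizes and rank here are illustrative, not taken from the rescoring model.

# Hedged sketch of a LoRA layer: the frozen base weight W is augmented with a
# trainable low-rank update (alpha/r) * B @ A, so only A and B are optimized.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=4, alpha=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768))                   # e.g. one BERT-sized projection
x = torch.randn(2, 10, 768)
y = layer(x)
lora_params = sum(p.numel() for p in [layer.A, layer.B])
total = lora_params + sum(p.numel() for p in layer.base.parameters())
print(y.shape, f"{lora_params / total:.2%} of parameters are trainable")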

10.1109/asru57964.2023.10389632 article EN 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023-12-16