- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Quantum Computing Algorithms and Architecture
- Adversarial Robustness in Machine Learning
- Topic Modeling
- Multimodal Machine Learning Applications
- Anomaly Detection Techniques and Applications
- Video Analysis and Summarization
- Privacy-Preserving Technologies in Data
- Reinforcement Learning in Robotics
- Neural Networks and Applications
- Advanced Image and Video Retrieval Techniques
- Advancements in Semiconductor Devices and Circuit Design
- Speech and dialogue systems
- Domain Adaptation and Few-Shot Learning
- Quantum and electron transport phenomena
- Quantum Information and Cryptography
- Image Enhancement Techniques
- Tensor decomposition and applications
- Human Pose and Action Recognition
- Advanced Image Processing Techniques
- Geophysical Methods and Applications
- Stochastic Gradient Optimization Techniques
Nvidia (United States)
2024-2025
Nvidia (United Kingdom)
2023-2025
Georgia Institute of Technology
2019-2024
Amazon (United States)
2021-2024
Google (United States)
2023
National Yang Ming Chiao Tung University
2023
King Abdullah University of Science and Technology
2021-2023
Universidad San Pablo CEU
2022
China University of Mining and Technology
2020
The state-of-the-art machine learning approaches are based on classical von Neumann computing architectures and have been widely used in many industrial and academic domains. With the recent development of quantum computing, researchers and tech-giants have attempted new quantum circuits for machine learning tasks. However, the existing quantum computing platforms are hard to simulate classical deep learning models or problems because of the intractability of deep quantum circuits. Thus, it is necessary to design feasible quantum algorithms for quantum machine learning for noisy intermediate scale quantum (NISQ) devices. This work explores...
We propose a novel decentralized feature extraction approach in federated learning to address privacy-preservation issues for speech recognition. It is built upon a quantum convolutional neural network (QCNN) composed of a quantum circuit encoder for feature extraction, and a recurrent neural network (RNN) based end-to-end acoustic model (AM). To enhance model parameter protection in a decentralized architecture, an input speech is first up-streamed to a quantum computing server to extract the Mel-spectrogram, and the corresponding convolutional features are encoded using a quantum circuit algorithm with random...
Single image dehazing is an ill-posed two-dimensional signal reconstruction problem. Recently, deep convolutional neural networks (CNN) have been successfully used in many computer vision problems. In this paper, we propose a Y-net that is named for its structure. This network reconstructs clear images by aggregating multi-scale feature maps. Additionally, we propose a Wavelet Structure SIMilarity (W-SSIM) loss function in the training step. In the proposed loss function, discrete wavelet transforms are applied repeatedly...
The noisy intermediate-scale quantum devices enable the implementation of the variational quantum circuit (VQC) for quantum neural networks (QNN). Although the VQC-based QNN has succeeded in many machine learning tasks, the representation and generalization powers of VQC still require further investigation, particularly when the dimensionality of classical inputs is concerned. In this work, we first put forth an end-to-end QNN, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional...
Recent studies have highlighted adversarial examples as ubiquitous threats to the deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals. Specifically, we evaluate the model performance by interpretable speech recognition metrics and discuss the model performance by the augmented adversarial training. Our experiments show that our proposed U-Net$_{At}$ improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, speech transmission index (STI) from 0.65 to 0.75, short-term objective...
In this work, we propose an AI-based method that intends to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural networks-based (DNN-based) module, including a retinal disease identifier and a clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset. Also, as ground truth, we provide a retinal image dataset manually labeled by ophthalmologists to qualitatively...
The rapid development of quantum computing has demonstrated many unique characteristics of quantum advantages, such as richer feature representation and more secured protection on model parameters. This work proposes a vertical federated learning architecture based on variational quantum circuits to demonstrate the competitive performance of a quantum-enhanced pre-trained BERT model for text classification. In particular, our proposed hybrid classical-quantum model consists of a novel random quantum temporal convolution (QTC) learning framework...
In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, empowers model reprogramming on ASR. Specifically, we investigate how to select trainable components (i.e., encoder) of a conformer-based RNN-Transducer, as...
Transfer learning (TL) approaches have shown promising results when handling tasks with limited training data. However, considerable memory and computational resources are often required for fine-tuning pre-trained neural networks with target domain data. In this work, we introduce a novel method for leveraging pre-trained speech models for low-resource music classification based on the concept of Neural Model Reprogramming (NMR). NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model ... cross-modal...
In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging upon ad-hoc...
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination of two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme...
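The two-stage idea in the snippet above can be illustrated with a small sketch. The snippet is truncated before it describes the actual "ad-hoc score combination", so the rule used here (weighting each ten-class probability by the probability of its parent broad class, then renormalizing) is only an assumption, as are the scene names, the `BROAD_OF` grouping, and the function names.

```python
# Hypothetical grouping of ten fine-grained scenes into three broad classes.
# The class names and the grouping are assumptions made for illustration.
BROAD_OF = {
    "airport": "indoor", "shopping_mall": "indoor", "metro_station": "indoor",
    "street_pedestrian": "outdoor", "public_square": "outdoor",
    "street_traffic": "outdoor", "park": "outdoor",
    "tram": "transportation", "bus": "transportation", "metro": "transportation",
}


def combine_scores(broad_probs, fine_probs):
    """Weight each fine-class probability by its parent broad-class probability.

    This product rule is an assumed stand-in for the combination that the
    truncated abstract does not spell out.
    """
    joint = {
        scene: prob * broad_probs[BROAD_OF[scene]]
        for scene, prob in fine_probs.items()
    }
    total = sum(joint.values())
    # Renormalize so the combined scores sum to one.
    return {scene: score / total for scene, score in joint.items()}


def predict(broad_probs, fine_probs):
    """Return the scene with the highest combined score."""
    combined = combine_scores(broad_probs, fine_probs)
    return max(combined, key=combined.get)


# Example: the ten-class model alone slightly prefers "airport", but the
# three-class model is confident the clip is a transportation scene.
broad = {"indoor": 0.1, "outdoor": 0.2, "transportation": 0.7}
fine = {scene: 0.08 for scene in BROAD_OF}
fine["airport"] = 0.20
fine["bus"] = 0.16
print(predict(broad, fine))
```

In this toy example the broad classifier overrides a weak fine-grained preference, which is one plausible way a two-stage system could gain robustness.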
Neural speech editing advancements have raised concerns about their misuse in spoofing attacks. Traditional partially edited speech corpora primarily focus on cut-and-paste edits, which, while maintaining speaker consistency, often introduce detectable discontinuities. Recent methods, like A³T and Voicebox, improve transitions by leveraging contextual information. To foster spoofing detection research, we introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. We detailed the process...
In this work, we propose a novel variational Bayesian adaptive learning approach for cross-domain knowledge transfer to address acoustic mismatches between training and testing conditions, such as recording devices and environmental noise. Different from the traditional Bayesian approaches that impose uncertainties on model parameters, risking the curse of dimensionality due to the huge number of parameters, we focus on estimating a manageable number of latent variables in deep neural models. Knowledge learned from a source domain is thus...
An ideal multimodal agent should be aware of the quality of its input modalities. Recent advances have enabled large language models (LLMs) to incorporate auditory systems for handling various speech-related tasks. However, most audio LLMs remain unaware of the quality of the speech they process. This limitation arises because speech quality evaluation is typically excluded from multi-task training due to the lack of suitable datasets. To address this, we introduce the first natural language-based speech evaluation corpus, generated from authentic human ratings. In...
Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. OWLS leverages up to 360K hours of public speech data across 150...
Single image deraining is a crucial problem because rain severely degenerates the visibility of images and affects the performance of computer vision tasks like outdoor surveillance systems and intelligent vehicles. In this paper, we propose a new convolutional neural network (CNN) called the wavelet channel attention module with a fusion network. The wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling so that feature maps from the wavelet transform and convolutions contain different frequencies and scales. Furthermore,...
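To make the "wavelet transform in place of down-sampling" idea concrete, here is a minimal sketch using a one-level 2D Haar transform in plain Python. The snippet does not say which wavelet the network uses, so the Haar choice, the function names `haar_dwt2` and `haar_idwt2`, and the sub-band labels are assumptions; sub-band naming conventions also differ between libraries.

```python
def haar_dwt2(image):
    """One-level orthonormal 2D Haar transform of an even-sized grid.

    `image` is a list of rows. Returns four half-resolution sub-bands:
    ll (local average), lh (left/right difference),
    hl (top/bottom difference), hh (diagonal difference).
    """
    rows, cols = len(image), len(image[0])
    assert rows % 2 == 0 and cols % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)
            lh_row.append((a - b + d - e) / 2)
            hl_row.append((a + b - d - e) / 2)
            hh_row.append((a - b - d + e) / 2)
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution grid (up-sampling)."""
    image = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top.extend([(s + x + y + z) / 2, (s - x + y - z) / 2])
            bottom.extend([(s + x - y - z) / 2, (s - x - y + z) / 2])
        image.append(top)
        image.append(bottom)
    return image


# Example: a 4x4 grid becomes four 2x2 sub-bands and is recovered exactly.
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(grid)
restored = haar_idwt2(*bands)
```

Unlike strided pooling, the four sub-bands together retain all the information in the input, which is why the inverse transform can serve as a lossless up-sampling step.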
This work focuses on investigating an end-to-end learning approach for quantum neural networks (QNN) on noisy intermediate-scale quantum devices. The proposed model combines a quantum tensor network (QTN) with a variational quantum circuit (VQC), resulting in a QTN-VQC architecture. This architecture integrates a QTN with a horizontal or vertical structure related to the implementation of quantum circuits for a tensor-train network. The study provides theoretical insights into the quantum advantages of the end-to-end learning pipeline based on QTN-VQC from two perspectives. The first perspective...
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized...
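The low-rank decomposition mentioned above can be sketched for a single linear layer. This is a generic LoRA-style illustration in plain Python, not the rescoring system itself: the class name `LoRALinear`, the `alpha` scaling, and the initialization are assumptions, and the per-layer fraction it reports is not meant to reproduce the 0.08% figure, which refers to the whole pretrained model.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A (rank x d_in) starts small and random; B (d_out x rank) starts at
        # zero, so the adapted layer initially matches the pretrained one.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_fraction(self):
        """Inserted parameters divided by frozen parameters for this layer."""
        d_out, d_in = len(self.weight), len(self.weight[0])
        rank = len(self.a)
        return rank * (d_in + d_out) / (d_out * d_in)


# Example: a rank-1 adapter on a 2x2 identity layer.
layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
print(layer.forward([3.0, 4.0]))  # unchanged output while B is still zero
```

Only the `a` and `b` matrices would be updated during adaptation; for a layer with large `d_in` and `d_out` and a small `rank`, `trainable_fraction()` becomes a small number, which is the source of the parameter savings.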