- Speech Recognition and Synthesis
- Music and Audio Processing
- Topic Modeling
- Adversarial Robustness in Machine Learning
- Speech and Audio Processing
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Embedded Systems and FPGA Design
- Neural Networks and Applications
- Advanced Data Compression Techniques
- Sensor Technology and Measurement Systems
- Privacy-Preserving Technologies in Data
- Semiconductor Materials and Devices
Google (United States)
2020-2024
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a TensorFlow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimum effort. We benchmark multiple KWS models in both modes on mobile phones and demonstrate different tradeoffs between...
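The conversion this abstract describes can be pictured with a toy example: a 1-D convolution applied once to a whole spectrogram versus applied frame by frame with an explicit state buffer. This is a minimal sketch of the general idea, not the library's actual API; the layer and input shapes are made up for illustration.

```python
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu")

# Non-streaming call: the whole 98-frame spectrogram in, all outputs out.
whole_input = np.random.randn(1, 98, 40).astype(np.float32)
full_out = conv(whole_input)  # shape [1, 96, 64] with 'valid' padding

# Streaming call: frames arrive one at a time; a ring buffer holding the
# last kernel_size - 1 frames makes incremental outputs match the full pass.
state = np.zeros((1, 2, 40), dtype=np.float32)
stream_outs = []
for t in range(98):
    frame = whole_input[:, t:t + 1, :]
    window = np.concatenate([state, frame], axis=1)  # [1, 3, 40]
    state = window[:, 1:, :]                         # shift the buffer
    stream_outs.append(conv(window))                 # one output frame
streamed = np.concatenate([o.numpy() for o in stream_outs], axis=1)
# After the 2-frame warm-up, streamed[:, 2:, :] matches full_out.
```

Automating exactly this bookkeeping (state buffers, shifted windows, warm-up handling) for every stateful layer is what makes manual rewriting tedious and a conversion library attractive.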
In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling rate from 8kHz to 16kHz while restoring the high-frequency content to a level almost indistinguishable from the ground truth. The architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant that can be deployed on-device in streaming mode,...
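The "feature losses plus adversarial losses" combination mentioned here can be sketched as follows. The `generator`, the discriminator-returns-activations interface, the hinge form of the adversarial term, and the loss weight are all assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf

def generator_loss(generator, discriminator, narrowband, wideband,
                   feature_weight=100.0):
    """Adversarial + feature-matching loss for a wave-to-wave generator."""
    fake = generator(narrowband)          # bandwidth-extended waveform
    real_feats = discriminator(wideband)  # assumed: list of activations
    fake_feats = discriminator(fake)

    # Feature loss: L1 distance between intermediate discriminator
    # activations on real vs. generated audio (a perceptual-style term).
    feat_loss = tf.add_n([
        tf.reduce_mean(tf.abs(r - f))
        for r, f in zip(real_feats, fake_feats)
    ]) / len(real_feats)

    # Adversarial term on the final discriminator output (hinge-style).
    adv_loss = -tf.reduce_mean(fake_feats[-1])
    return adv_loss + feature_weight * feat_loss
```

The feature term stabilizes training and preserves content, while the adversarial term pushes the output toward realistic high-frequency texture.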
Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference cost-quality tradeoff curves. Our results suggest that for each bfloat16 model, there are quantized...
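A back-of-the-envelope calculation makes the tradeoff framing concrete: weight memory depends on both parameter count and bit width, so a larger low-bit model can be cheaper than a smaller bfloat16 one. The parameter counts below are rough public figures for standard ResNets, used only to put precision and size on the same axis.

```python
# Weight-memory cost of ResNet variants at different bit widths.
PARAMS = {"ResNet-50": 25.6e6, "ResNet-101": 44.5e6, "ResNet-152": 60.2e6}

def weight_megabytes(num_params, bits):
    return num_params * bits / 8 / 1e6

for name, n in PARAMS.items():
    print(f"{name}: bfloat16 {weight_megabytes(n, 16):6.1f} MB | "
          f"int4 {weight_megabytes(n, 4):5.1f} MB")

# e.g. a 4-bit ResNet-152 (~30 MB) costs less than a bfloat16
# ResNet-50 (~51 MB), so a bigger quantized model can land at a
# lower-cost point on the cost-quality curve.
```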
Federated learning (FL) has been widely used to train neural networks with a decentralized training procedure where data is only accessed on clients' devices for privacy preservation. However, the limited computation resources of those devices prevent FL of large models. To overcome this constraint, one possible method is to reduce memory usage with quantized training, such as quantization aware training designed for a centralized server. However, directly applying such methods does not reduce memory consumption, because the full-precision model is still used in the forward propagation computation...
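The memory problem noted here is easy to demonstrate: standard "fake" quantization-aware training simulates quantization in float, so both the master weights and the forward-pass copy stay full precision. A minimal illustration, with an arbitrary layer size:

```python
import numpy as np

def fake_quant(w, bits=8):
    # Simulate quantization in float: a quantize-dequantize round trip.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale   # still a float32 array

w_fp32 = np.random.randn(1024, 1024).astype(np.float32)
w_q = fake_quant(w_fp32)

# Both the master weights and the quantize-dequantized copy are float32:
print(w_fp32.nbytes / 2**20, "MiB master weights")   # 4.0 MiB
print(w_q.nbytes / 2**20, "MiB forward-pass copy")   # 4.0 MiB: no savings
```

Reducing on-device memory therefore requires actually storing and computing with low-bit tensors, not merely simulating their rounding effects.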
With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in the quantization aware training (QAT) technique, most papers present evaluations that are focused on computer vision tasks, which have different dynamics compared to sequence tasks. In this paper, we first benchmark the impact of popular techniques such as straight...
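The truncated sentence above begins to name what is presumably the straight-through estimator (STE), the standard QAT gradient trick: the forward pass rounds, while the backward pass pretends rounding is the identity so gradients can flow. A minimal sketch:

```python
import tensorflow as tf

@tf.custom_gradient
def ste_round(x):
    def grad(dy):
        return dy           # identity gradient through the rounding op
    return tf.round(x), grad

x = tf.Variable([0.3, 1.7, -2.2])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(ste_round(x) ** 2)
g = tape.gradient(y, x)     # gradients flow, although tf.round alone
print(g.numpy())            # has zero gradient almost everywhere
```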
End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic for fitting USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with low-bit quantization and N:M structured sparsity aware...
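The N:M structured sparsity constraint mentioned here means that in every group of M consecutive weights, only the N largest-magnitude entries are kept. A sketch of computing such a mask (for the common 2:4 pattern; this is the generic pattern, not the paper's specific training recipe):

```python
import numpy as np

def n_m_mask(w, n=2, m=4):
    """Binary mask keeping the n largest-magnitude weights per group of m."""
    flat = w.reshape(-1, m)                 # consecutive groups of m weights
    idx = np.argsort(np.abs(flat), axis=1)  # ascending by magnitude
    mask = np.ones_like(flat)
    np.put_along_axis(mask, idx[:, : m - n], 0.0, axis=1)  # zero the smallest
    return mask.reshape(w.shape)

w = np.random.randn(4, 8).astype(np.float32)
sparse_w = w * n_m_mask(w)   # exactly 2 nonzeros in every group of 4
```

The regularity of the pattern is what lets hardware (and sparse-aware kernels) exploit it, unlike unstructured pruning.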
Keyword spotting (KWS) on edge devices requires low power consumption and real-time response. In this work, a ferroelectric field-effect transistor (FeFET)-based compute-in-memory (CIM) architecture is proposed for streaming KWS processing. Compared with the conventional sequential processing scheme, inference latency is reduced by 7.7× to 17.6× without energy efficiency loss. To make models robust to hardware non-idealities such as analog-to-digital converter (ADC) offset, an offset-aware...
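Offset-aware training, as the truncated sentence begins to describe, typically means injecting the hardware non-ideality into the training-time forward pass so the learned weights tolerate it. A hedged sketch under that assumption; the ADC range and offset magnitude below are placeholders, not values from the paper:

```python
import numpy as np

def cim_matvec(x, w, adc_levels=32, offset_std=0.5, training=True):
    """Idealized CIM matrix-vector product with a coarse ADC model."""
    y = x @ w                                            # analog MAC
    y_q = np.round(np.clip(y, -adc_levels, adc_levels))  # coarse ADC readout
    if training:
        # Inject a random per-column offset, mimicking ADC offset on
        # hardware, so training sees (and adapts to) the non-ideality.
        y_q = y_q + np.random.normal(0.0, offset_std, size=y_q.shape[-1])
    return y_q
```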