- Speech Recognition and Synthesis
- Music and Audio Processing
- Topic Modeling
- Adversarial Robustness in Machine Learning
- Speech and Audio Processing
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Embedded Systems and FPGA Design
- Neural Networks and Applications
- Advanced Data Compression Techniques
- Sensor Technology and Measurement Systems
- Privacy-Preserving Technologies in Data
- Semiconductor Materials and Devices
Google (United States)
2020-2024
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a TensorFlow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimum effort. We benchmark multiple KWS models in both modes on mobile phones and demonstrate different tradeoffs between...
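The conversion this abstract describes can be pictured with a toy example: a 1-D convolution applied once to a whole spectrogram versus applied frame by frame with an explicit state buffer. This is a minimal sketch of the general idea, not the library's actual API; the layer and input shapes are made up for illustration.

```python
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu")

# Non-streaming call: the whole 98-frame spectrogram in, all outputs out.
whole_input = np.random.randn(1, 98, 40).astype(np.float32)
full_out = conv(whole_input)  # shape [1, 96, 64] with 'valid' padding

# Streaming call: frames arrive one at a time; a ring buffer holding the
# last kernel_size - 1 frames makes incremental outputs match the full pass.
state = np.zeros((1, 2, 40), dtype=np.float32)
stream_outs = []
for t in range(98):
    frame = whole_input[:, t:t + 1, :]
    window = np.concatenate([state, frame], axis=1)  # [1, 3, 40]
    state = window[:, 1:, :]                         # shift the buffer
    stream_outs.append(conv(window))                 # one output frame
streamed = np.concatenate([o.numpy() for o in stream_outs], axis=1)
# After the 2-frame warm-up, streamed[:, 2:, :] matches full_out.
```

Automating exactly this bookkeeping (state buffers, shifted windows, warm-up handling) for every stateful layer is what makes manual rewriting tedious and a conversion library attractive.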
In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling rate from 8kHz to 16kHz while restoring the high-frequency content to a level almost indistinguishable from the ground truth. The architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant that can be deployed on-device in streaming mode,...
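The "feature losses plus adversarial losses" combination mentioned here can be sketched as follows. The `generator`, the discriminator-returns-activations interface, the hinge form of the adversarial term, and the loss weight are all assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf

def generator_loss(generator, discriminator, narrowband, wideband,
                   feature_weight=100.0):
    """Adversarial + feature-matching loss for a wave-to-wave generator."""
    fake = generator(narrowband)          # bandwidth-extended waveform
    real_feats = discriminator(wideband)  # assumed: list of activations
    fake_feats = discriminator(fake)

    # Feature loss: L1 distance between intermediate discriminator
    # activations on real vs. generated audio (a perceptual-style term).
    feat_loss = tf.add_n([
        tf.reduce_mean(tf.abs(r - f))
        for r, f in zip(real_feats, fake_feats)
    ]) / len(real_feats)

    # Adversarial term on the final discriminator output (hinge-style).
    adv_loss = -tf.reduce_mean(fake_feats[-1])
    return adv_loss + feature_weight * feat_loss
```

The feature term stabilizes training and preserves content, while the adversarial term pushes the output toward realistic high-frequency texture.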
Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference cost-quality tradeoff curves. Our results suggest that for each bfloat16 model, there are quantized...
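A back-of-the-envelope calculation makes the tradeoff framing concrete: weight memory depends on both parameter count and bit width, so a larger low-bit model can be cheaper than a smaller bfloat16 one. The parameter counts below are rough public figures for standard ResNets, used only to put precision and size on the same axis.

```python
# Weight-memory cost of ResNet variants at different bit widths.
PARAMS = {"ResNet-50": 25.6e6, "ResNet-101": 44.5e6, "ResNet-152": 60.2e6}

def weight_megabytes(num_params, bits):
    return num_params * bits / 8 / 1e6

for name, n in PARAMS.items():
    print(f"{name}: bfloat16 {weight_megabytes(n, 16):6.1f} MB | "
          f"int4 {weight_megabytes(n, 4):5.1f} MB")

# e.g. a 4-bit ResNet-152 (~30 MB) costs less than a bfloat16
# ResNet-50 (~51 MB), so a bigger quantized model can land at a
# lower-cost point on the cost-quality curve.
```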
Federated learning (FL) has been widely used to train neural networks with a decentralized training procedure where data is only accessed on clients' devices for privacy preservation. However, the limited computation resources of those devices prevent FL of large models. To overcome this constraint, one possible method is to reduce memory usage with quantized training, such as quantization aware training designed for a centralized server. However, directly applying such methods does not reduce memory consumption, because the full-precision model is still used in the forward propagation computation...
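The memory problem noted here is easy to demonstrate: standard "fake" quantization-aware training simulates quantization in float, so both the master weights and the forward-pass copy stay full precision. A minimal illustration, with an arbitrary layer size:

```python
import numpy as np

def fake_quant(w, bits=8):
    # Simulate quantization in float: a quantize-dequantize round trip.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale   # still a float32 array

w_fp32 = np.random.randn(1024, 1024).astype(np.float32)
w_q = fake_quant(w_fp32)

# Both the master weights and the quantize-dequantized copy are float32:
print(w_fp32.nbytes / 2**20, "MiB master weights")   # 4.0 MiB
print(w_q.nbytes / 2**20, "MiB forward-pass copy")   # 4.0 MiB: no savings
```

Reducing on-device memory therefore requires actually storing and computing with low-bit tensors, not merely simulating their rounding effects.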
With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in the quantization aware training (QAT) technique, most papers present evaluations that are focused on computer vision tasks, which have different dynamics compared to sequence tasks. In this paper, we first benchmark the impact of popular techniques such as straight...
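The truncated sentence above begins to name what is presumably the straight-through estimator (STE), the standard QAT gradient trick: the forward pass rounds, while the backward pass pretends rounding is the identity so gradients can flow. A minimal sketch:

```python
import tensorflow as tf

@tf.custom_gradient
def ste_round(x):
    def grad(dy):
        return dy           # identity gradient through the rounding op
    return tf.round(x), grad

x = tf.Variable([0.3, 1.7, -2.2])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(ste_round(x) ** 2)
g = tape.gradient(y, x)     # gradients flow, although tf.round alone
print(g.numpy())            # has zero gradient almost everywhere
```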
End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic for fitting USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with low-bit quantization and N:M structured sparsity aware...
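The N:M structured sparsity constraint mentioned here means that in every group of M consecutive weights, only the N largest-magnitude entries are kept. A sketch of computing such a mask (for the common 2:4 pattern; this is the generic pattern, not the paper's specific training recipe):

```python
import numpy as np

def n_m_mask(w, n=2, m=4):
    """Binary mask keeping the n largest-magnitude weights per group of m."""
    flat = w.reshape(-1, m)                 # consecutive groups of m weights
    idx = np.argsort(np.abs(flat), axis=1)  # ascending by magnitude
    mask = np.ones_like(flat)
    np.put_along_axis(mask, idx[:, : m - n], 0.0, axis=1)  # zero the smallest
    return mask.reshape(w.shape)

w = np.random.randn(4, 8).astype(np.float32)
sparse_w = w * n_m_mask(w)   # exactly 2 nonzeros in every group of 4
```

The regularity of the pattern is what lets hardware (and sparse-aware kernels) exploit it, unlike unstructured pruning.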
Keyword spotting (KWS) on edge devices requires low power consumption and real-time response. In this work, a ferroelectric field-effect transistor (FeFET)-based compute-in-memory (CIM) architecture is proposed for streaming KWS processing. Compared with the conventional sequential processing scheme, inference latency is reduced by 7.7× to 17.6× without energy efficiency loss. To make models robust to hardware non-idealities such as analog-to-digital converter (ADC) offset, an offset-aware...
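Offset-aware training, as the truncated sentence begins to describe, typically means injecting the hardware non-ideality into the training-time forward pass so the learned weights tolerate it. A hedged sketch under that assumption; the ADC range and offset magnitude below are placeholders, not values from the paper:

```python
import numpy as np

def cim_matvec(x, w, adc_levels=32, offset_std=0.5, training=True):
    """Idealized CIM matrix-vector product with a coarse ADC model."""
    y = x @ w                                            # analog MAC
    y_q = np.round(np.clip(y, -adc_levels, adc_levels))  # coarse ADC readout
    if training:
        # Inject a random per-column offset, mimicking ADC offset on
        # hardware, so training sees (and adapts to) the non-ideality.
        y_q = y_q + np.random.normal(0.0, offset_std, size=y_q.shape[-1])
    return y_q
```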