Oleg Rybakov

ORCID: 0009-0007-0021-047X
Research Areas
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Topic Modeling
  • Adversarial Robustness in Machine Learning
  • Speech and Audio Processing
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Embedded Systems and FPGA Design
  • Neural Networks and Applications
  • Advanced Data Compression Techniques
  • Sensor Technology and Measurement Systems
  • Privacy-Preserving Technologies in Data
  • Semiconductor materials and devices

Affiliations
Google (United States), 2020-2024

Publications

In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. Converting a NN model from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a Tensorflow/Keras based library which allows automatic conversion of non-streaming models to streaming ones with minimum effort. We benchmark multiple KWS models in both modes on mobile phones and demonstrate different tradeoffs between...

10.21437/interspeech.2020-1003 article EN Interspeech 2020 2020-10-25
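
The automatic streaming conversion described above hinges on giving each layer an internal state so it can consume audio frame by frame instead of re-reading the whole sequence. A minimal sketch of that idea, assuming a hypothetical StreamingConv1D wrapper (illustrative only, not the paper's actual library API):

    import tensorflow as tf

    class StreamingConv1D(tf.keras.layers.Layer):
        """Hypothetical wrapper: keeps the last (kernel_size - 1) frames as
        state so each call only needs the newly arrived frames."""
        def __init__(self, filters, kernel_size, **kwargs):
            super().__init__(**kwargs)
            self.kernel_size = kernel_size
            self.conv = tf.keras.layers.Conv1D(filters, kernel_size, padding="valid")

        def build(self, input_shape):
            # Ring buffer holding the trailing (kernel_size - 1) input frames.
            self.state = self.add_weight(
                name="stream_state",
                shape=(1, self.kernel_size - 1, input_shape[-1]),
                initializer="zeros", trainable=False)

        def call(self, new_frames):
            # Prepend buffered history, convolve, then save the trailing
            # frames as state for the next incremental call.
            full = tf.concat([self.state, new_frames], axis=1)
            self.state.assign(full[:, -(self.kernel_size - 1):, :])
            return self.conv(full)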

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high-frequency content to a level almost indistinguishable from the ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant that can be deployed on-device in streaming mode,...

10.1109/icassp39728.2021.9413439 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
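
A compact sketch of the wave-to-wave setup the abstract describes; layer sizes and the loss form below are illustrative assumptions (the real SEANet is deeper and pairs feature losses with an adversarial discriminator loss):

    import tensorflow as tf

    def make_generator():
        # Fully convolutional: 8 kHz waveform in, 16 kHz waveform out (2x upsample).
        return tf.keras.Sequential([
            tf.keras.layers.Conv1D(32, 7, padding="same", activation="elu"),
            tf.keras.layers.UpSampling1D(2),  # 8 kHz -> 16 kHz time resolution
            tf.keras.layers.Conv1D(32, 7, padding="same", activation="elu"),
            tf.keras.layers.Conv1D(1, 7, padding="same"),
        ])

    def feature_loss(disc_feats_real, disc_feats_fake):
        # "Feature loss": L1 distance between discriminator activations for the
        # real 16 kHz signal and the reconstruction, summed over layers.
        return tf.add_n([tf.reduce_mean(tf.abs(r - f))
                         for r, f in zip(disc_feats_real, disc_feats_fake)])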

Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work focuses on studying quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off with model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference cost-quality tradeoff curves. Our results suggest that for each bfloat16 model, there are quantized...

10.1109/cvprw53098.2021.00345 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021-06-01
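
The tradeoff studied above can be illustrated with a back-of-envelope budget calculation: under a fixed memory budget, lowering the bit width frees room for more parameters. The params-times-bits cost model here is a simplifying assumption, not the paper's exact cost metric:

    def model_size_bits(num_params: int, bits_per_weight: int) -> int:
        # Naive cost model: total weight storage in bits.
        return num_params * bits_per_weight

    budget = model_size_bits(25_000_000, 16)  # e.g. a bfloat16 ResNet-50-sized model
    for bits in (8, 4, 2):
        params_affordable = budget // bits
        print(f"{bits}-bit: ~{params_affordable / 25_000_000:.0f}x the parameters "
              f"fit in the same memory budget")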

Federated learning (FL) has been widely used to train neural networks with a decentralized training procedure where data is only accessed on clients' devices for privacy preservation. However, limited computation resources on clients' devices prevent FL of large models. To overcome this constraint, one possible method is to reduce memory usage with quantized training, such as quantization aware training on a centralized server. However, directly applying such methods does not reduce memory consumption because the full-precision model is still used in the forward propagation computation....

10.1109/icassp48485.2024.10445824 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18
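
A sketch of the memory point made above: fake quantization alone keeps a full-precision copy of every weight, so client memory does not shrink, whereas storing int8 weights and dequantizing one layer at a time does reduce the resident model size. The function names and per-tensor scaling scheme are illustrative assumptions, not the paper's method:

    import numpy as np

    def quantize_int8(w):
        # Symmetric per-tensor quantization: one float scale per layer.
        scale = float(np.abs(w).max()) / 127.0
        return (w / scale).round().astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    # The client holds only int8 weights plus one scale per layer ...
    layers = [quantize_int8(np.random.randn(256, 256).astype(np.float32))
              for _ in range(4)]
    x = np.random.randn(1, 256).astype(np.float32)
    for q, scale in layers:
        x = np.maximum(x @ dequantize(q, scale), 0.0)  # ReLU MLP forward pass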

With the rapid increase in the size of neural networks, model compression has become an important area of research. Quantization is an effective technique at decreasing the model size, memory access, and compute load of large models. Despite recent advances in quantization aware training (QAT) techniques, most papers present evaluations that are focused on computer vision tasks, which have different training dynamics compared to sequence tasks. In this paper, we first benchmark the impact of popular techniques such as straight...

10.48550/arxiv.2305.15536 preprint EN cc-by arXiv (Cornell University) 2023-01-01
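
For context, a minimal sketch of the straight-through estimator (STE) that such QAT benchmarks start from: quantize on the forward pass, treat round() as identity on the backward pass. The symmetric per-tensor scheme and bit width below are illustrative assumptions:

    import tensorflow as tf

    NUM_BITS = 8  # illustrative bit width

    @tf.custom_gradient
    def fake_quant_ste(w):
        # Forward: symmetric per-tensor quantization to NUM_BITS.
        qmax = 2.0 ** (NUM_BITS - 1) - 1.0
        scale = tf.reduce_max(tf.abs(w)) / qmax
        w_q = tf.round(w / scale) * scale
        def grad(dy):
            # Backward: straight-through, gradient of round() treated as identity.
            return dy
        return w_q, grad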

End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with low-bit quantization and N:M structured sparsity aware...

10.1109/icassp48485.2024.10448217 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18
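
A short sketch of the N:M structured sparsity mentioned above, using the common 2:4 pattern (in every group of 4 consecutive weights, keep the 2 largest in magnitude). This illustrates the constraint itself, not the paper's fine-tuning recipe:

    import numpy as np

    def nm_sparsity_mask(w, n=2, m=4):
        # Split weights into groups of m and keep the n largest per group.
        flat = w.reshape(-1, m)
        keep = np.argsort(np.abs(flat), axis=1)[:, -n:]  # indices of n largest
        mask = np.zeros_like(flat)
        np.put_along_axis(mask, keep, 1.0, axis=1)
        return mask.reshape(w.shape)

    w = np.random.randn(8, 8).astype(np.float32)
    w_sparse = w * nm_sparsity_mask(w)  # exactly 2 nonzeros per group of 4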

10.1109/slt61566.2024.10832205 article EN 2024 IEEE Spoken Language Technology Workshop (SLT) 2024-12-02

Keyword spotting (KWS) on edge devices requires low power consumption and real-time response. In this work, a ferroelectric field-effect transistor (FeFET)-based compute-in-memory (CIM) architecture is proposed for streaming KWS processing. Compared with the conventional sequential processing scheme, the inference latency is reduced by 7.7×∼17.6× without energy efficiency loss. To make models robust to hardware non-idealities such as analog-to-digital converter (ADC) offset, an offset-aware...

10.1109/tetc.2023.3345346 article EN IEEE Transactions on Emerging Topics in Computing 2023-12-28
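
The offset-aware training mentioned in the abstract can be approximated by perturbing activations with random ADC-like offsets during training, so the model learns to tolerate them at inference. The layer below is a hedged sketch with an assumed offset range, not the paper's exact hardware model:

    import tensorflow as tf

    class ADCOffsetNoise(tf.keras.layers.Layer):
        """Adds a random per-channel offset (simulating ADC offset) at train time."""
        def __init__(self, max_offset=0.05, **kwargs):
            super().__init__(**kwargs)
            self.max_offset = max_offset

        def call(self, x, training=False):
            if not training:
                return x  # no perturbation at inference
            offset = tf.random.uniform(
                tf.shape(x)[-1:], -self.max_offset, self.max_offset)
            return x + offset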

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high-frequency content to a level almost indistinguishable from the ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant that can be deployed on-device in streaming mode,...

10.48550/arxiv.2010.10677 preprint EN other-oa arXiv (Cornell University) 2020-01-01