Prashanth Gurunath Shivakumar

ORCID: 0000-0003-1632-3309
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Topic Modeling
  • Natural Language Processing Techniques
  • Music and Audio Processing
  • Speech and dialogue systems
  • Sentiment Analysis and Opinion Mining
  • African Botany and Ecology Studies
  • Cleft Lip and Palate Research
  • Allelopathy and phytotoxic interactions
  • Emotion and Mood Recognition
  • Multimodal Machine Learning Applications
  • Phonetics and Phonology Research
  • Phytochemicals and Antioxidant Activities

Amazon (United States)
2023-2025

University of Southern California
2014-2022

Southern California University for Professional Studies
2014-2021

University of California, San Francisco
2021

University of California, Los Angeles
2021

Signal Processing (United States)
2018-2019

Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal depression classification system as a part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We investigate a number of audio and video features along with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On...

10.1145/2988257.2988261 article EN 2016-10-12
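The Teager energy operator underlying the TECC features mentioned above can be sketched as follows; the filterbank and cepstral stages of the full feature pipeline are omitted, and the input signal is illustrative:

```python
import numpy as np

# Sketch of the discrete Teager energy operator used by TECC features:
#   psi(x[n]) = x[n]^2 - x[n-1] * x[n+1]
# For a pure sinusoid A*sin(w*n), psi is the constant A^2 * sin^2(w),
# tracking the signal's instantaneous energy.
def teager_energy(x):
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

For a constant signal the operator returns zero, and for a sinusoid it returns a constant proportional to both amplitude and frequency, which is what makes it useful as a speech energy feature.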

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use. Here we present a method based on low-rank decomposition to train a rescoring model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized...

10.1109/asru57964.2023.10389632 article EN 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023-12-16
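A minimal sketch of the low-rank adaptation idea described above, assuming a single frozen weight matrix; the dimensions, scaling, and initialization here are illustrative, not the paper's:

```python
import numpy as np

# LoRA sketch: keep the pretrained weight W frozen and learn only a
# low-rank update B @ A, so the effective weight is W + (alpha/r) * B @ A.
d, r = 768, 4                           # hidden size, low-rank dimension
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized
                                        # so training starts from the frozen model

def adapted_forward(x, alpha=8.0):
    """Forward pass with the low-rank update: W x + (alpha/r) * B (A x)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: 2*d*r parameters versus d*d frozen ones.
trainable = A.size + B.size
fraction = trainable / W.size
```

With d = 768 and r = 4 the trainable matrices hold roughly 1% of the frozen matrix's parameters; the 0.08% figure in the abstract refers to the full model, not this toy single-matrix setting.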

The Bignoniaceae family, comprising over 100 genera and 800 species, is a rich source of ornamental plants with medicinal properties. This review focuses on the pharmacological uses, bioactive compounds, and future aspects of selected genera, including Pyrostegia venusta, Jacaranda mimosifolia, Tabebuia spp., and others. These plants have been traditionally used to treat various ailments, and recent studies have confirmed their antimicrobial, anti-inflammatory, and antioxidant activities. The compounds responsible for these properties include...

10.9734/ajee/2025/v24i1652 article EN Asian Journal of Environment & Ecology 2025-01-16

10.1109/icassp49660.2025.10890616 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Decoding the speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions, as in real-life scenarios, makes the task more challenging. In this paper, we address intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. The confusion2vec, motivated from human speech production and perception, models...

10.21437/interspeech.2019-2226 preprint EN Interspeech 2022 2019-09-13

Word vector representations are a crucial part of natural language processing (NLP) and human-computer interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated from the human speech production and perception that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information, and our proposed model incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by humans...

10.7717/peerj-cs.195 article EN cc-by PeerJ Computer Science 2019-06-10
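The core idea, word vectors that encode acoustic confusability alongside contextual similarity, can be illustrated with a toy two-subspace embedding. All vectors below are hand-made for illustration; the paper learns such representations from ASR output, not by construction:

```python
import numpy as np

# Toy sketch of the Confusion2Vec idea: each word vector concatenates a
# "semantic" subspace (contextual similarity) and an "acoustic" subspace
# (confusability). Homophones are close acoustically; synonyms semantically.
def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical embeddings: [semantic subspace (2 dims) | acoustic subspace (2 dims)]
emb = {
    "see":   np.concatenate([unit([1.0, 0.0]), unit([0.9, 0.1])]),
    "sea":   np.concatenate([unit([0.0, 1.0]), unit([0.9, 0.1])]),  # sounds like "see"
    "ocean": np.concatenate([unit([0.1, 1.0]), unit([0.0, 1.0])]),  # means like "sea"
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_sim(w1, w2):
    return cosine(emb[w1][:2], emb[w2][:2])

def acoustic_sim(w1, w2):
    return cosine(emb[w1][2:], emb[w2][2:])
```

In this toy space "see"/"sea" score high acoustic similarity but low semantic similarity, while "sea"/"ocean" show the reverse, mirroring the two cue sources the abstract describes.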

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words due to acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate of all the independent modules constituting the ASR and attempt to invert those errors. The proposed system can exploit long-term context using a neural...

10.1017/atsip.2018.31 article EN cc-by-nc APSIPA Transactions on Signal and Information Processing 2019-01-01
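The noisy-channel view of ASR error correction can be sketched with a toy decoder that picks the candidate w maximizing P(w) * P(observed | w). The unigram table and edit-distance channel below are illustrative stand-ins for the learned phrase-based models of the paper:

```python
import math

# Toy noisy-channel correction: score each candidate phrase by
# log P(w) + log P(observed | w), with the channel modeled as
# exp(-edit_distance). The LM table is hand-made illustration data.
LM = {"the cat sat": 0.6, "the cap sat": 0.1, "a cat sat": 0.3}

def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct(observed):
    """Return the candidate maximizing the noisy-channel score."""
    def score(w):
        return math.log(LM[w]) - edit_distance(observed, w)
    return max(LM, key=score)
```

Given the ASR-style output "the kat sat", the decoder recovers "the cat sat": the one-character channel cost is outweighed by the language-model preference.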

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automatic speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities...

10.1109/icassp48485.2024.10447938 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech...

10.1109/icassp48485.2024.10446933 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

We propose a simplified and supervised i-vector modeling scheme for the speaker age regression task. The supervised i-vector is obtained by concatenating the label vector and the linear regression matrix at the end of the mean super-vector and the factor loading matrix, respectively. Different label vector designs are proposed to increase the robustness of the models. Finally, Support Vector Regression (SVR) is deployed to estimate the age of speakers. The proposed method outperforms the conventional baseline age estimation. A relative 2.4% decrease in Mean Absolute Error and a 3.33% increase in correlation coefficient are achieved...

10.1109/icassp.2014.6854520 article EN 2014-05-01
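The two figures of merit quoted in the abstract, Mean Absolute Error and the correlation coefficient between predicted and true speaker ages, can be computed as in this sketch (the regression model itself is omitted, and any ages fed in are illustration data):

```python
import numpy as np

# Evaluation metrics for age regression: Mean Absolute Error (years)
# and the Pearson correlation between predicted and reference ages.
def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def pearson(y_true, y_pred):
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

MAE rewards small absolute deviations while the correlation rewards preserving the ordering of speakers by age, which is why the paper reports both.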

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use. Here we present a method based on low-rank decomposition to train a rescoring model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized...

10.48550/arxiv.2309.15223 preprint EN cc-by arXiv (Cornell University) 2023-01-01

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automatic speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities...

10.48550/arxiv.2401.02921 preprint EN cc-by-sa arXiv (Cornell University) 2024-01-01

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based...

10.48550/arxiv.2401.10447 preprint EN cc-by arXiv (Cornell University) 2024-01-01

Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend pure text-based retrieval methods to incorporate other modalities in retrieval as well, for applications across the wide spectrum of machine learning tasks and data types. In this work, we propose multi-modal retrieval with two approaches: kNN-LM and cross-attention techniques. We demonstrate the effectiveness of our approaches empirically by applying them to automatic...

10.48550/arxiv.2406.09618 preprint EN arXiv (Cornell University) 2024-06-13
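The kNN-LM approach mentioned above can be sketched as interpolating the parametric model's next-token distribution with one induced by nearest neighbors in an external datastore. The vocabulary, datastore contents, and interpolation weight below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# kNN-LM sketch: p(w) = (1 - lam) * p_model(w) + lam * p_knn(w), where
# p_knn is built from the k nearest (context embedding -> next token)
# entries in a datastore.
VOCAB = ["cat", "dog", "fish"]

# Datastore: context embeddings (keys) paired with observed next-token ids.
keys = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
values = np.array([0, 0, 2])  # "cat", "cat", "fish"

def knn_distribution(query, k=2, temperature=1.0):
    d = np.linalg.norm(keys - query, axis=1)   # L2 distance to every key
    nn = np.argsort(d)[:k]                     # indices of k nearest neighbors
    w = np.exp(-d[nn] / temperature)           # softmax over negative distances
    w /= w.sum()
    p = np.zeros(len(VOCAB))
    for weight, tok in zip(w, values[nn]):     # aggregate weight per token
        p[tok] += weight
    return p

def interpolate(p_model, query, lam=0.25):
    return (1 - lam) * np.asarray(p_model) + lam * knn_distribution(query)
```

With a uniform model distribution and a query near the "cat" entries, the retrieved neighbors shift probability mass toward "cat" without any parameter updates, which is the appeal of retrieval augmentation.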