- Speech Recognition and Synthesis
- Speech and Audio Processing
- Topic Modeling
- Natural Language Processing Techniques
- Music and Audio Processing
- Speech and Dialogue Systems
- Sentiment Analysis and Opinion Mining
- African Botany and Ecology Studies
- Cleft Lip and Palate Research
- Allelopathy and Phytotoxic Interactions
- Emotion and Mood Recognition
- Multimodal Machine Learning Applications
- Phonetics and Phonology Research
- Phytochemicals and Antioxidant Activities
Amazon (United States)
2023-2025
University of Southern California
2014-2022
Southern California University for Professional Studies
2014-2021
University of California, San Francisco
2021
University of California, Los Angeles
2021
Signal Processing (United States)
2018-2019
Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal system developed as part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We investigate a number of audio and video features with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On...
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the model to specific domains limits their practical use. Here we present a low-rank decomposition method to train a rescoring model and adapt it to new domains using only a fraction (0.08%) of the parameters. These inserted matrices are optimized...
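The low-rank decomposition described above can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the generic LoRA idea under the usual assumptions: a frozen weight matrix W is augmented with a trainable product A@B of rank r much smaller than the hidden size, and B starts at zero so the adapted model initially matches the frozen one. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                         # hidden size and low rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01  # trainable low-rank factor
B = np.zeros((r, d))                # zero init: adapter starts as a no-op

def lora_forward(x):
    # frozen path plus low-rank update; only A and B receive gradients
    return x @ W + x @ A @ B

x = rng.normal(size=(1, d))
# with B at zero, the adapted output equals the frozen model's output
assert np.allclose(lora_forward(x), x @ W)

# fraction of parameters that are trainable: 2*d*r / d^2
frac = (A.size + B.size) / W.size   # 0.5 for this toy size; tiny when r << d
```

In a real rescoring LM, d is in the hundreds or thousands and r is single-digit, which is how fractions on the order of 0.1% of the parameters become sufficient.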
The Bignoniaceae family, comprising over 100 genera and 800 species, is a rich source of ornamental plants with medicinal properties. This review focuses on the pharmacological uses, bioactive compounds, and future aspects of selected species, including Pyrostegia venusta, Jacaranda mimosifolia, Tabebuia spp., and others. These plants have been traditionally used to treat various ailments, and recent studies have confirmed their antimicrobial, anti-inflammatory, and antioxidant activities. The compounds responsible for these properties include...
Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions, common in real-life scenarios, makes the task more challenging. In this paper, we address intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for errors made by ASR and to increase the robustness of the SLU system. The confusion2vec representation, motivated by human speech production and perception, models...
Word vector representations are a crucial part of natural language processing (NLP) and human computer interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated by human speech production and perception, that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information, and the proposed model incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by humans...
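The idea of combining contextual similarity with acoustic confusability can be sketched in a few lines. This is not the Confusion2Vec training procedure (which learns from ASR confusion networks); it is only an illustration, with hypothetical toy vectors and a hypothetical `confusion_aware` helper, of blending a word's contextual embedding with the embeddings of its acoustically confusable neighbors.

```python
# toy contextual embeddings and ASR confusion neighbors (all illustrative)
emb = {"flour": [1.0, 0.0], "flower": [0.9, 0.1], "bread": [0.0, 1.0]}
confusions = {"flour": ["flower"], "flower": ["flour"], "bread": []}

def confusion_aware(word, alpha=0.5):
    # blend the word's contextual vector with the mean vector of its
    # acoustically confusable words; alpha weights context vs. acoustics
    base = emb[word]
    neigh = [emb[w] for w in confusions[word]]
    if not neigh:
        return base
    mean = [sum(c) / len(neigh) for c in zip(*neigh)]
    return [alpha * b + (1 - alpha) * m for b, m in zip(base, mean)]

v = confusion_aware("flour")  # pulled toward "flower", its homophone
```

In the blended space, acoustically confusable words end up close together, which is what lets a downstream SLU model tolerate substitutions the ASR is likely to make.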
Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words based on acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those errors. The proposed system exploits neural...
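The noisy-channel framing can be made concrete with a deliberately tiny sketch. The paper uses phrase-based, neural modeling; the toy below only shows the underlying inversion idea at the word level, learned from hypothetical (ASR hypothesis, reference) pairs. The data and the `correct` helper are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

# hypothetical parallel data: (ASR hypothesis, reference transcript)
pairs = [("i scream", "ice cream"), ("i scream daily", "ice cream daily")]

# count how often each hypothesis word aligns to each reference word
counts = defaultdict(Counter)
for hyp, ref in pairs:
    for h, r in zip(hyp.split(), ref.split()):
        counts[h][r] += 1

def correct(sentence):
    # invert the channel: map each word to its most likely reference word
    return " ".join(
        counts[w].most_common(1)[0][0] if w in counts else w
        for w in sentence.split()
    )
```

A real system operates on phrases rather than isolated words and uses a neural model instead of counts, precisely so it can repair errors that no single ASR module could recover on its own.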
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities...
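One simple way to expose transcription ambiguity to an LLM, shown here with an n-best list as a stand-in for the richer lattice the abstract describes, is to serialize the competing hypotheses into the prompt. The prompt format and `build_prompt` helper are hypothetical, purely to illustrate the contrast with feeding only the 1-best transcript.

```python
def build_prompt(nbest):
    # list competing ASR hypotheses so the LLM can reason over
    # transcription ambiguity instead of trusting a single 1-best
    lines = [f"hypothesis {i + 1}: {h}" for i, h in enumerate(nbest)]
    return "Classify the speaker's intent.\n" + "\n".join(lines)

prompt = build_prompt(["play some jazz", "play sum jas"])
```

A lattice-based method goes further than an n-best list, since the lattice compactly encodes exponentially many alternatives along with their scores.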
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech...
We propose a simplified and supervised i-vector modeling scheme for the speaker age regression task. The supervised i-vector is obtained by concatenating the label vector and a linear regression matrix at the end of the mean super-vector and the factor loading matrix, respectively. Different label designs are proposed to increase the robustness of the models. Finally, Support Vector Regression (SVR) is deployed to estimate the age of speakers. The proposed method outperforms the conventional baseline age estimation. A relative 2.4% decrease in Mean Absolute Error and a relative 3.33% improvement in correlation coefficient are achieved...
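The concatenation described above is easy to show with shapes alone. The sketch below assumes illustrative dimensions (supervector dimension D, i-vector rank R, a one-dimensional age label) and only demonstrates the augmentation of the mean supervector and factor loading matrix; it is not the full i-vector estimation or the SVR stage.

```python
import numpy as np

rng = np.random.default_rng(1)

D, R, L = 6, 3, 1               # supervector dim, i-vector rank, label dim (age)
m = rng.normal(size=(D,))       # UBM mean supervector
T = rng.normal(size=(D, R))     # factor loading matrix
y = np.array([35.0])            # age label for one training utterance
G = rng.normal(size=(L, R))     # linear regression block mapping i-vector to label

# supervised model: append the label to the supervector and the
# regression block to the factor loading matrix, keeping rank R
m_sup = np.concatenate([m, y])  # shape (D + L,)
T_sup = np.vstack([T, G])       # shape (D + L, R)
```

Because the label rows are part of the factor analysis, the resulting i-vectors are driven to be predictive of age, which is then exploited by the SVR regressor.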
The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based...
Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend purely text-based retrieval methods to incorporate other modalities in retrieval as well, with applications across a wide spectrum of machine learning tasks and data types. In this work, we propose multimodal retrieval with two approaches: kNN-LM and cross-attention techniques. We demonstrate the effectiveness of our approaches empirically by applying them to automatic...
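Of the two approaches named above, the kNN-LM side is compact enough to sketch. This toy, with a hypothetical three-entry datastore, shows the standard kNN-LM recipe: build a next-token distribution from the nearest stored contexts and interpolate it with the parametric LM's distribution; it is not the paper's multimodal system.

```python
import math
from collections import Counter

# hypothetical datastore of (context representation, next token) pairs
datastore = [((1.0, 0.0), "cat"), ((0.9, 0.1), "cat"), ((0.0, 1.0), "dog")]

def knn_probs(query, k=2):
    # softmax over negative squared distances to the k nearest contexts
    dists = sorted(
        (sum((q - c) ** 2 for q, c in zip(query, ctx)), tok)
        for ctx, tok in datastore
    )[:k]
    weights = [(math.exp(-d), tok) for d, tok in dists]
    z = sum(w for w, _ in weights)
    probs = Counter()
    for w, tok in weights:
        probs[tok] += w / z
    return probs

def interpolate(p_lm, p_knn, lam=0.5):
    # kNN-LM: mix the parametric LM distribution with the retrieval one
    vocab = set(p_lm) | set(p_knn)
    return {t: (1 - lam) * p_lm.get(t, 0.0) + lam * p_knn.get(t, 0.0)
            for t in vocab}

p = interpolate({"cat": 0.3, "dog": 0.7}, knn_probs((1.0, 0.0)))
```

In a multimodal setting, the stored context representations would come from audio or image encoders rather than a text LM, but the retrieval and interpolation machinery is unchanged.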