- Speech Recognition and Synthesis
- Speech and Audio Processing
- Topic Modeling
- Natural Language Processing Techniques
- Music and Audio Processing
- Speech and Dialogue Systems
- Sentiment Analysis and Opinion Mining
- African Botany and Ecology Studies
- Cleft Lip and Palate Research
- Allelopathy and Phytotoxic Interactions
- Emotion and Mood Recognition
- Multimodal Machine Learning Applications
- Phonetics and Phonology Research
- Phytochemicals and Antioxidant Activities
Amazon (United States)
2023-2025
University of Southern California
2014-2022
Southern California University for Professional Studies
2014-2021
University of California, San Francisco
2021
University of California, Los Angeles
2021
Signal Processing (United States)
2018-2019
Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal system developed as part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We investigate a number of audio and video features with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On...
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the model to specific domains limits their practical use. Here we present a low-rank decomposition method to train a rescoring model and adapt it to new domains using only a fraction (0.08%) of the parameters. These inserted matrices are optimized...
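The low-rank decomposition described above can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the generic LoRA idea under the usual assumptions: a frozen weight matrix W is augmented with a trainable product A@B of rank r much smaller than the hidden size, and B starts at zero so the adapted model initially matches the frozen one. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                         # hidden size and low rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01  # trainable low-rank factor
B = np.zeros((r, d))                # zero init: adapter starts as a no-op

def lora_forward(x):
    # frozen path plus low-rank update; only A and B receive gradients
    return x @ W + x @ A @ B

x = rng.normal(size=(1, d))
# with B at zero, the adapted output equals the frozen model's output
assert np.allclose(lora_forward(x), x @ W)

# fraction of parameters that are trainable: 2*d*r / d^2
frac = (A.size + B.size) / W.size   # 0.5 for this toy size; tiny when r << d
```

In a real rescoring LM, d is in the hundreds or thousands and r is single-digit, which is how fractions on the order of 0.1% of the parameters become sufficient.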
The Bignoniaceae family, comprising over 100 genera and 800 species, is a rich source of ornamental plants with medicinal properties. This review focuses on the pharmacological uses, bioactive compounds, and future aspects of selected species, including Pyrostegia venusta, Jacaranda mimosifolia, Tabebuia spp., and others. These plants have been traditionally used to treat various ailments, and recent studies have confirmed their antimicrobial, anti-inflammatory, and antioxidant activities. The compounds responsible for these properties include...
Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions, common in real-life scenarios, makes the task more challenging. In this paper, we address intent detection under noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for errors made by ASR and to increase the robustness of the SLU system. The confusion2vec representation, motivated by human speech production and perception, models...
Word vector representations are a crucial part of natural language processing (NLP) and human computer interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated by human speech production and perception, that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information, and the proposed model incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by humans...
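The idea of combining contextual similarity with acoustic confusability can be sketched in a few lines. This is not the Confusion2Vec training procedure (which learns from ASR confusion networks); it is only an illustration, with hypothetical toy vectors and a hypothetical `confusion_aware` helper, of blending a word's contextual embedding with the embeddings of its acoustically confusable neighbors.

```python
# toy contextual embeddings and ASR confusion neighbors (all illustrative)
emb = {"flour": [1.0, 0.0], "flower": [0.9, 0.1], "bread": [0.0, 1.0]}
confusions = {"flour": ["flower"], "flower": ["flour"], "bread": []}

def confusion_aware(word, alpha=0.5):
    # blend the word's contextual vector with the mean vector of its
    # acoustically confusable words; alpha weights context vs. acoustics
    base = emb[word]
    neigh = [emb[w] for w in confusions[word]]
    if not neigh:
        return base
    mean = [sum(c) / len(neigh) for c in zip(*neigh)]
    return [alpha * b + (1 - alpha) * m for b, m in zip(base, mean)]

v = confusion_aware("flour")  # pulled toward "flower", its homophone
```

In the blended space, acoustically confusable words end up close together, which is what lets a downstream SLU model tolerate substitutions the ASR is likely to make.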
Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words based on acoustics using short-term context, prior to rescoring with long-term context based on linguistics. In this work we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert those errors. The proposed system exploits neural...
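The noisy-channel framing can be made concrete with a deliberately tiny sketch. The paper uses phrase-based, neural modeling; the toy below only shows the underlying inversion idea at the word level, learned from hypothetical (ASR hypothesis, reference) pairs. The data and the `correct` helper are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

# hypothetical parallel data: (ASR hypothesis, reference transcript)
pairs = [("i scream", "ice cream"), ("i scream daily", "ice cream daily")]

# count how often each hypothesis word aligns to each reference word
counts = defaultdict(Counter)
for hyp, ref in pairs:
    for h, r in zip(hyp.split(), ref.split()):
        counts[h][r] += 1

def correct(sentence):
    # invert the channel: map each word to its most likely reference word
    return " ".join(
        counts[w].most_common(1)[0][0] if w in counts else w
        for w in sentence.split()
    )
```

A real system operates on phrases rather than isolated words and uses a neural model instead of counts, precisely so it can repair errors that no single ASR module could recover on its own.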
In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities...
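One simple way to expose transcription ambiguity to an LLM, shown here with an n-best list as a stand-in for the richer lattice the abstract describes, is to serialize the competing hypotheses into the prompt. The prompt format and `build_prompt` helper are hypothetical, purely to illustrate the contrast with feeding only the 1-best transcript.

```python
def build_prompt(nbest):
    # list competing ASR hypotheses so the LLM can reason over
    # transcription ambiguity instead of trusting a single 1-best
    lines = [f"hypothesis {i + 1}: {h}" for i, h in enumerate(nbest)]
    return "Classify the speaker's intent.\n" + "\n".join(lines)

prompt = build_prompt(["play some jazz", "play sum jas"])
```

A lattice-based method goes further than an n-best list, since the lattice compactly encodes exponentially many alternatives along with their scores.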
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech...
We propose a simplified and supervised i-vector modeling scheme for the speaker age regression task. The supervised i-vector is obtained by concatenating the label vector and a linear regression matrix at the end of the mean super-vector and the factor loading matrix, respectively. Different label designs are proposed to increase the robustness of the models. Finally, Support Vector Regression (SVR) is deployed to estimate the age of speakers. The proposed method outperforms the conventional baseline age estimation. A relative 2.4% decrease in Mean Absolute Error and a relative 3.33% improvement in correlation coefficient are achieved...
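The concatenation described above is easy to show with shapes alone. The sketch below assumes illustrative dimensions (supervector dimension D, i-vector rank R, a one-dimensional age label) and only demonstrates the augmentation of the mean supervector and factor loading matrix; it is not the full i-vector estimation or the SVR stage.

```python
import numpy as np

rng = np.random.default_rng(1)

D, R, L = 6, 3, 1               # supervector dim, i-vector rank, label dim (age)
m = rng.normal(size=(D,))       # UBM mean supervector
T = rng.normal(size=(D, R))     # factor loading matrix
y = np.array([35.0])            # age label for one training utterance
G = rng.normal(size=(L, R))     # linear regression block mapping i-vector to label

# supervised model: append the label to the supervector and the
# regression block to the factor loading matrix, keeping rank R
m_sup = np.concatenate([m, y])  # shape (D + L,)
T_sup = np.vstack([T, G])       # shape (D + L, R)
```

Because the label rows are part of the factor analysis, the resulting i-vectors are driven to be predictive of age, which is then exploited by the SVR regressor.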
The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based...
Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend purely text-based retrieval methods to incorporate other modalities in retrieval as well, with applications across a wide spectrum of machine learning tasks and data types. In this work, we propose multimodal retrieval with two approaches: kNN-LM and cross-attention techniques. We demonstrate the effectiveness of our approaches empirically by applying them to automatic...
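Of the two approaches named above, the kNN-LM side is compact enough to sketch. This toy, with a hypothetical three-entry datastore, shows the standard kNN-LM recipe: build a next-token distribution from the nearest stored contexts and interpolate it with the parametric LM's distribution; it is not the paper's multimodal system.

```python
import math
from collections import Counter

# hypothetical datastore of (context representation, next token) pairs
datastore = [((1.0, 0.0), "cat"), ((0.9, 0.1), "cat"), ((0.0, 1.0), "dog")]

def knn_probs(query, k=2):
    # softmax over negative squared distances to the k nearest contexts
    dists = sorted(
        (sum((q - c) ** 2 for q, c in zip(query, ctx)), tok)
        for ctx, tok in datastore
    )[:k]
    weights = [(math.exp(-d), tok) for d, tok in dists]
    z = sum(w for w, _ in weights)
    probs = Counter()
    for w, tok in weights:
        probs[tok] += w / z
    return probs

def interpolate(p_lm, p_knn, lam=0.5):
    # kNN-LM: mix the parametric LM distribution with the retrieval one
    vocab = set(p_lm) | set(p_knn)
    return {t: (1 - lam) * p_lm.get(t, 0.0) + lam * p_knn.get(t, 0.0)
            for t in vocab}

p = interpolate({"cat": 0.3, "dog": 0.7}, knn_probs((1.0, 0.0)))
```

In a multimodal setting, the stored context representations would come from audio or image encoders rather than a text LM, but the retrieval and interpolation machinery is unchanged.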