- Music and Audio Processing
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Video Analysis and Summarization
- Generative Adversarial Networks and Image Synthesis
- Attention Deficit Hyperactivity Disorder
- Sentiment Analysis and Opinion Mining
- Multimodal Machine Learning Applications
- Music Technology and Sound Studies
- Hate Speech and Cyberbullying Detection
- Face Recognition and Analysis
- Emotion and Mood Recognition
- Video Surveillance and Tracking Methods
- Functional Brain Connectivity Studies
- Media Influence and Health
- Image Retrieval and Classification Techniques
- Voice and Speech Disorders
- Human Pose and Action Recognition
- Autism Spectrum Disorder Research
- Crime, Deviance, and Social Control
- Face Recognition and Perception
- Evolutionary Psychology and Human Behavior
- Adversarial Robustness in Machine Learning
- Privacy-Preserving Technologies in Data
- Aesthetic Perception and Analysis
University of Southern California
2016-2024
Google (United States)
2021-2024
LAC+USC Medical Center
2020-2022
Southern California University for Professional Studies
2019-2020
New York University
2015-2019
NYU Langone Health
2017
To date, only one study has examined the test–retest reliability of resting state fMRI (R-fMRI) in children, and none in clinical developing groups. Here, we assessed short-term test–retest reliability in a sample of 46 children (ages 11–17.9 years) with attention-deficit/hyperactivity disorder (ADHD) and 57 typically developing children (TDC). Our primary measure was the intraclass correlation coefficient (ICC), quantified for a range of R-fMRI metrics. We aimed to (1) survey reliability within and across diagnostic groups, and (2) compare voxel-wise ICC between groups. We found...
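As a rough illustration of the reliability measure used in this study, the sketch below computes a single-measure consistency ICC for one metric observed in two sessions; the specific ICC variant, and everything else about the code, is an assumption rather than detail taken from the paper.

```python
# Minimal sketch of test-retest reliability via ICC(3,1), assuming two
# sessions per subject; the paper does not specify this exact variant here.
import numpy as np

def icc_3_1(session1, session2):
    """Two-way mixed-effects, consistency, single-measure ICC."""
    x = np.column_stack([session1, session2])  # subjects x sessions
    n, k = x.shape
    subj_means = x.mean(axis=1)
    sess_means = x.mean(axis=0)
    grand = x.mean()
    # Mean squares from a two-way ANOVA decomposition
    ms_subjects = k * np.sum((subj_means - grand) ** 2) / (n - 1)
    ss_total = np.sum((x - grand) ** 2)
    ss_sessions = n * np.sum((sess_means - grand) ** 2)
    ss_error = ss_total - k * np.sum((subj_means - grand) ** 2) - ss_sessions
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

# Example: one metric measured for 5 subjects at two time points
print(icc_3_1(np.array([1.0, 2.1, 3.0, 3.9, 5.2]),
              np.array([1.1, 2.0, 3.2, 4.0, 5.0])))
```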
Media is created by humans, for humans, to tell stories. There exists a natural and imminent need for creating human-centered media analytics to illuminate the stories being told and to understand their impact on individuals and society at large. An objective understanding of media content has numerous applications for different stakeholders, from creators to decision-/policy-makers to consumers. Advances in multimodal signal processing and machine learning (ML) can enable a detailed and nuanced characterization (of who, what, how, where, and why)...
Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes. Domain-specific challenges associated with visual scenes in movies include transitions, person coverage, and a wide array of real-life and fictional scenarios. Existing visual scene datasets in movies have limited taxonomies and don't consider the visual scene transition within movie clips. In this work, we address the problem of visual scene recognition in movies by first automatically curating a new and extensive movie-centric taxonomy of 179 scene labels derived from...
Core to understanding emotion are subjective experiences and their expression in facial behavior. Past studies have largely focused on six emotions and prototypical facial poses, reflecting limitations of scale and narrow assumptions about the variety of patterns of emotional expression. We examine 45,231 facial reactions to 2,185 evocative videos, collected largely in North America, Europe, and Japan, with participants’ self-reported experiences in English or Japanese and manual and automated annotations of facial movement. Guided by Semantic Space Theory, we uncover 21 dimensions...
Violent content in movies can influence viewers’ perception of society. For example, frequent depictions of certain demographics as perpetrators or victims of abuse can shape stereotyped attitudes. In this work, we propose to characterize aspects of violent content solely from the language used in movie scripts. This makes our method applicable to a movie in the earlier stages of content creation, even before it is produced. It is also complementary to previous works, which rely on audio or video signals available post production. Our approach is based on a broad range of features...
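To make the script-only idea concrete, here is a minimal stand-in pipeline, assuming a toy corpus and plain TF-IDF n-gram features with a logistic regression classifier; the paper's actual feature set and models are richer than this.

```python
# Minimal stand-in for script-based violence classification on a toy corpus;
# the paper's broad feature set is replaced here by simple TF-IDF n-grams.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

scripts = ["he slams the door and draws a knife",
           "they share coffee and talk about the trip",
           "a fight breaks out in the alley",
           "she reads quietly by the window"]
labels = [1, 0, 1, 0]  # 1 = violent scene description, 0 = non-violent

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(scripts, labels)
print(model.predict(["he pulls a knife in the alley"]))
```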
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive...
In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial invariance architecture to train a network that maps speaker embeddings extracted using a pretrained model onto two lower-dimensional embedding spaces. The embedding spaces are learnt to disentangle speaker-discriminative information from all other information present in the audio recordings, without any supervision about the acoustic conditions....
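A minimal sketch of the two-branch split described above, assuming arbitrary input and bottleneck sizes; the full method additionally trains adversarial disentanglement losses between the two spaces, which are omitted here for brevity.

```python
# Sketch of mapping pretrained speaker embeddings onto two lower-dimensional
# spaces, one speaker-discriminative and one for everything else; the
# adversarial invariance objectives from the paper are not shown.
import torch
import torch.nn as nn

class TwoBranchEmbedding(nn.Module):
    def __init__(self, in_dim=512, h1_dim=128, h2_dim=128):
        super().__init__()
        self.speaker_head = nn.Linear(in_dim, h1_dim)   # speaker-discriminative space
        self.nuisance_head = nn.Linear(in_dim, h2_dim)  # nuisance factors (noise, channel)
        self.decoder = nn.Linear(h1_dim + h2_dim, in_dim)  # reconstruction ties both

    def forward(self, x):
        h1, h2 = self.speaker_head(x), self.nuisance_head(x)
        return h1, h2, self.decoder(torch.cat([h1, h2], dim=-1))

x = torch.randn(4, 512)                  # embeddings from a pretrained model
h1, h2, recon = TwoBranchEmbedding()(x)
loss = nn.functional.mse_loss(recon, x)  # one of several training losses
```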
Static anatomical and real-time dynamic magnetic resonance imaging (RT-MRI) of the upper airway is a valuable method for studying speech production in research and clinical settings. The test–retest repeatability of quantitative imaging biomarkers is an important parameter, since it limits the effect sizes and intragroup differences that can be studied. Therefore, this study aims to present a framework for determining test–retest repeatability of quantitative biomarkers from static MRI and RT-MRI, and to apply it to healthy volunteers. Subjects (n = 8, 4 females, 4 males) are imaged in two scans...
Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects the emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show...
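A hedged sketch of the taxonomy-constrained entailment step, assuming an off-the-shelf NLI model served through the Hugging Face zero-shot pipeline; LanSER's exact models and prompt templates may differ.

```python
# Weak labeling of an ASR transcript via textual entailment, constrained to a
# fixed emotion taxonomy; model and hypothesis template are assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
taxonomy = ["anger", "joy", "sadness", "fear", "surprise", "neutral"]

transcript = "I can't believe you did this without even asking me!"  # from ASR
result = classifier(transcript, candidate_labels=taxonomy,
                    hypothesis_template="This person feels {}.")
weak_label = result["labels"][0]  # highest-entailment emotion in the taxonomy
print(weak_label, result["scores"][0])
```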
The prevalent audio-based Voice Activity Detection (VAD) systems are challenged by the presence of ambient noise and are sensitive to variations in the type of noise. The use of information from the visual modality, when available, can help overcome some of the problems of audio-based VAD. Existing visual-VAD systems however do not operate directly on the whole image but require intermediate face detection, landmark detection, and subsequent facial feature extraction from the lip region. In this work we present an end-to-end trainable Hierarchical Context...
Large scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labelled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling from the entire pool of available training data to exploit the relationship between labelled and unlabeled samples. During inference,...
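A minimal sketch of the subgraph idea, assuming simple kNN connectivity over a pooled set of labelled and unlabelled embeddings; the paper's sampling scheme and self-supervision tasks are more involved than this.

```python
# Constructing one training subgraph over pooled labelled + unlabelled audio
# embeddings; kNN connectivity and all sizes here are illustrative choices.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
labelled = rng.normal(size=(16, 128))    # e.g., pretrained audio embeddings
unlabelled = rng.normal(size=(64, 128))

pool = np.vstack([labelled, unlabelled])
idx = rng.choice(len(pool), size=32, replace=False)  # sample one subgraph
subgraph_adj = kneighbors_graph(pool[idx], n_neighbors=4, mode="connectivity")
print(subgraph_adj.shape)  # (32, 32) adjacency used as one training subgraph
```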
Automatic content analysis of animation movies can enable an objective understanding of character (actor) representations and their portrayals. It can also help illuminate potential markers of unconscious biases and their impact. However, multimedia analysis of movie content has predominantly focused on live-action features. A dearth of research in this field is because of the complexity and heterogeneity in the design of animated characters -- an extremely challenging problem to be generalized by a single method or model. In this paper, we address...
Arousal and valence have been widely used to represent emotions dimensionally and to measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end,...
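To illustrate the linear dynamical system view, here is a minimal one-dimensional Kalman filter over a noisy affect-rating sequence; the process and observation noise values are illustrative guesses, not the challenge entry's fitted parameters.

```python
# Tracking one affective dimension (arousal or valence) with a 1-D Kalman
# filter; noise parameters q and r are placeholder values.
import numpy as np

def kalman_track(observations, q=0.01, r=0.1):
    """1-D constant-state Kalman filter over a sequence of noisy ratings."""
    x, p = 0.0, 1.0                 # state estimate and its variance
    smoothed = []
    for z in observations:
        p += q                      # predict: state carries over, noise q
        k = p / (p + r)             # Kalman gain
        x += k * (z - x)            # update with observation z
        p *= (1 - k)
        smoothed.append(x)
    return np.array(smoothed)

noisy_valence = np.sin(np.linspace(0, 3, 50)) + 0.3 * np.random.randn(50)
print(kalman_track(noisy_valence)[:5])
```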
Speech activity detection in highly variable acoustic conditions is a challenging task. Many approaches to detecting speech in such conditions involve an inherent knowledge of the noise types involved. Movie audio can offer an excellent research test-bed for developing such models. A robust movie speech activity detection system is also a crucial step for subsequent content analyses such as speaker diarization. Obtaining labels for supervision on such data can be very expensive, and may not be scalable. In this paper, we employ a simple, yet effective approach to obtain labels by coarsely aligning...
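A minimal sketch of the weak-labelling idea, assuming subtitle time spans are coarsely treated as speech regions at a fixed frame rate; the timestamps below are made up for illustration.

```python
# Deriving weak speech-activity labels by marking audio frames that fall
# inside subtitle time spans; spans, duration, and frame rate are placeholders.
import numpy as np

subtitle_spans = [(1.2, 3.8), (5.0, 7.4), (9.1, 10.0)]  # (start_s, end_s)
frame_rate, duration_s = 100, 12.0                      # 10 ms frames

labels = np.zeros(int(duration_s * frame_rate), dtype=int)
for start, end in subtitle_spans:
    labels[int(start * frame_rate):int(end * frame_rate)] = 1  # speech = 1

print(labels.mean())  # fraction of frames weakly labelled as speech
```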
The process of human affect understanding involves the ability to infer person-specific emotional states from various sources including images, speech, and language. Affect perception from images has predominantly focused on expressions extracted from salient face crops. However, emotions perceived by humans rely on multiple contextual cues, including social settings, foreground interactions, and ambient visual scenes. In this work, we leverage pretrained vision-language (VLN) models to extract descriptions of foreground context...
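One plausible way to wire this up is sketched below with common open checkpoints (a BLIP captioner feeding an NLI-based zero-shot classifier); both models and the image path are assumptions, not the paper's actual components.

```python
# Pairing a scene description from a vision-language model with a text
# classifier for context-aware affect; "party_scene.jpg" is a placeholder.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
emotion = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

caption = captioner("party_scene.jpg")[0]["generated_text"]  # ambient scene cue
print(emotion(caption, candidate_labels=["joy", "anger", "sadness", "fear"]))
```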
Autism spectrum disorder (ASD) is exceptionally heterogeneous in both clinical and physiopathological presentations. The clinical variability applies to both ASD-specific symptoms and frequent comorbid psychopathology such as emotional lability (EL). To date, the underpinnings of the co-occurrence of EL and ASD are unknown. As a first step, we examined within-ASD inter-individual variability in EL and its neuronal correlates using resting-state functional magnetic resonance imaging (R-fMRI). We analyzed R-fMRI data from 58 children...
Core to understanding emotion are subjective experiences and their embodiment in facial behavior. Past studies have focused on six emotions and prototypical poses, reflecting limitations of scale and narrow assumptions about emotion. We examine 45,231 facial reactions to 2,185 evocative videos, collected largely in North America, Europe, and Japan, with participants’ self-reported experiences in English or Japanese and manual/automated annotations of facial movement. We uncover 21 dimensions underlying the experiences reported across languages. Facial expressions...
The ability to robustly cluster faces in movies is a necessary step in understanding media content representations of people along dimensions such as gender and age. Building upon the successes of sparse subspace clustering (SSC) in uncovering the underlying structure of data, in this paper we propose an algorithm called Constraint Propagation Sparse Subspace Clustering (CP-SSC) for applications to clustering faces in videos, where pairwise sample constraints (must-link and cannot-link pairs) are available from the processing pipeline since...
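A minimal sketch of how pairwise constraints can be injected into an affinity matrix before spectral clustering; the actual CP-SSC algorithm couples constraint propagation with the sparse self-expression step itself, which is not shown here.

```python
# Constrained clustering on synthetic "face embeddings": must-link pairs get
# maximal affinity, cannot-link pairs get zero, then spectral clustering runs
# on the modified affinity. All data and constraints are illustrative.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
faces = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(4, 1, (20, 64))])
affinity = rbf_kernel(faces)

must_link = [(0, 1), (20, 21)]  # e.g., same face track -> same identity
cannot_link = [(0, 20)]         # co-occurring faces -> different identities
for i, j in must_link:
    affinity[i, j] = affinity[j, i] = 1.0
for i, j in cannot_link:
    affinity[i, j] = affinity[j, i] = 0.0

clusters = SpectralClustering(n_clusters=2, affinity="precomputed",
                              random_state=0).fit_predict(affinity)
print(clusters)
```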
Audio event detection is a widely studied field, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as AudioSet have propelled research in this field. However, many efforts typically involve manual annotation and verification, which is expensive to perform at scale. Movies depict various real-life and fictional scenarios, which makes them a rich resource for mining a wide range of audio events. In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S)....
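A hedged sketch of mining candidate sound events from captions, assuming non-speech sounds appear in brackets (a common subtitling convention); SAM-S's actual curation pipeline is more elaborate than this.

```python
# Extracting bracketed sound descriptions from timed captions; the caption
# data below is made up to illustrate the convention.
import re

captions = [
    ("00:01:02", "[DOOR SLAMS]"),
    ("00:01:10", "I told you to wait outside."),
    ("00:02:45", "[THUNDER RUMBLING]"),
]

sound_events = [(t, m.group(1).lower())
                for t, text in captions
                for m in [re.match(r"\[(.+)\]", text)] if m]
print(sound_events)  # [('00:01:02', 'door slams'), ('00:02:45', 'thunder rumbling')]
```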