Tanaya Guha

ORCID: 0000-0003-2167-4891
Research Areas
  • Music and Audio Processing
  • Speech and Audio Processing
  • Video Surveillance and Tracking Methods
  • Emotion and Mood Recognition
  • Video Analysis and Summarization
  • Human Pose and Action Recognition
  • Image and Video Quality Assessment
  • Speech Recognition and Synthesis
  • Visual Attention and Saliency Detection
  • Face Recognition and Analysis
  • Advanced Image and Video Retrieval Techniques
  • Anomaly Detection Techniques and Applications
  • Multimodal Machine Learning Applications
  • Image Retrieval and Classification Techniques
  • Autonomous Vehicle Technology and Safety
  • Face and Expression Recognition
  • Autism Spectrum Disorder Research
  • Music Technology and Sound Studies
  • Neuroscience and Music Perception
  • Sentiment Analysis and Opinion Mining
  • Mental Health via Writing
  • Mental Health Research Topics
  • Advanced Image Fusion Techniques
  • Advanced Graph Neural Networks
  • Generative Adversarial Networks and Image Synthesis

University of Glasgow
2022-2025

University of Warwick
2019-2022

Indian Institute of Technology Kanpur
2016-2018

University of Southern California
2014-2016

University of British Columbia
2010-2014

This paper explores the effectiveness of sparse representations obtained by learning a set of overcomplete basis (dictionary) in the context of action recognition in videos. Although this work concentrates on recognizing human movements - physical actions as well as facial expressions - the proposed approach is fairly general and can be used to address other classification problems. In order to model actions, three dictionary learning frameworks are investigated. An overcomplete dictionary is constructed using spatio-temporal descriptors (extracted...

10.1109/tpami.2011.253 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2011-12-29
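
As a rough illustration of the dictionary-learning idea described above, the sketch below learns one overcomplete dictionary per class and classifies a descriptor by its smallest sparse-reconstruction error. It is not the paper's pipeline; the toy descriptors, atom count and OMP sparsity level are assumptions.

```python
# Minimal sketch: per-class overcomplete dictionaries plus classification by
# reconstruction error. Random vectors stand in for spatio-temporal descriptors.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
n_dim, n_atoms, n_nonzero = 64, 128, 5            # 128 atoms > 64 dims => overcomplete

def learn_dictionary(descriptors):
    """Learn an overcomplete dictionary from one class's descriptors."""
    dl = DictionaryLearning(n_components=n_atoms, transform_algorithm="omp",
                            transform_n_nonzero_coefs=n_nonzero, random_state=0)
    return dl.fit(descriptors).components_        # shape: (n_atoms, n_dim)

def reconstruction_error(x, dictionary):
    """Sparse-code x with OMP and return the residual norm."""
    code = sparse_encode(x[None, :], dictionary, algorithm="omp",
                         n_nonzero_coefs=n_nonzero)
    return np.linalg.norm(x - code @ dictionary)

# Toy data: two 'action classes' with different statistics.
class_data = {c: rng.normal(loc=c, size=(200, n_dim)) for c in (0, 1)}
dictionaries = {c: learn_dictionary(X) for c, X in class_data.items()}

test = rng.normal(loc=1, size=n_dim)              # sample drawn from class 1
pred = min(dictionaries, key=lambda c: reconstruction_error(test, dictionaries[c]))
print("predicted class:", pred)
```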

Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another challenge which stands to benefit from such cues. The Audio/Visual Emotion Challenge (AVEC) aims toward understanding the two phenomena and their correlation with observable...

10.1145/2661806.2661810 article EN 2014-11-03

Several studies have established that facial expressions of children with autism are often perceived as atypical, awkward or less engaging by typical adult observers. Despite this clear deficit in the quality of expression production, very little is understood about its underlying mechanisms and characteristics. This paper takes a computational approach to studying the details of facial expressions of children with high functioning autism (HFA). The objective is to uncover those characteristics of expressions, notably distinct from typically developing...

10.1109/taffc.2016.2578316 article EN publisher-specific-oa IEEE Transactions on Affective Computing 2016-06-08

The mainstream image captioning models rely on Convolutional Neural Network (CNN) features to generate captions via recurrent models. Recently, scene graphs have been used to augment captioning models so as to leverage their structural semantics, such as object entities, relationships and attributes. Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms performance, and that graph-based models incur the overhead of explicit image features to generate decent captions. Addressing these challenges, we propose SG2Caps, a framework that utilizes only...

10.1109/iccv48922.2021.00144 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
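
To make the "captioning from scene-graph labels only" idea concrete, the toy sketch below flattens a scene graph (objects, attributes, relationships) into a token sequence that a standard caption decoder could consume. The example graph and the serialisation scheme are assumptions for illustration, not the paper's encoding.

```python
# Toy scene graph flattened into decoder-input tokens (illustrative only).
scene_graph = {
    "objects":    ["man", "horse", "field"],
    "attributes": {"horse": ["brown"], "field": ["grassy"]},
    "relations":  [("man", "riding", "horse"), ("horse", "standing in", "field")],
}

def serialize(graph):
    """Turn scene-graph labels into a flat token list for a sequence decoder."""
    tokens = []
    for obj in graph["objects"]:
        for attr in graph["attributes"].get(obj, []):
            tokens += [attr, obj]                 # attribute-object pairs
    for subj, rel, obj in graph["relations"]:
        tokens += [subj, rel, obj]                # relationship triples
    return tokens

print(serialize(scene_graph))
# ['brown', 'horse', 'grassy', 'field', 'man', 'riding', 'horse', 'horse', 'standing in', 'field']
```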

State-of-the-art crowd counting models follow an encoder-decoder approach. Images are first processed by the encoder to extract features. Then, to account for perspective distortion, the highest-level feature map is fed to extra components to extract multiscale features, which are the input to the decoder to generate crowd densities. However, in these methods, the features extracted at earlier stages during encoding are underutilised, and the multiscale modules can only capture a limited range of receptive fields, albeit with considerable computational cost....

10.1109/icip46576.2022.9897322 article EN 2022 IEEE International Conference on Image Processing (ICIP) 2022-10-16
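
The generic encoder-decoder counting pipeline the abstract refers to can be sketched as below: an encoder extracts features, a decoder produces a density map, and the count is the sum over that map. This is a minimal stand-in, not the architecture proposed in the paper.

```python
# Minimal encoder-decoder density-map sketch (PyTorch).
import torch
import torch.nn as nn

class TinyCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # downsample, extract features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                 # upsample back to a density map
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        density = self.decoder(self.encoder(x))       # (B, 1, H, W) density map
        return density, density.sum(dim=(1, 2, 3))    # per-image count estimate

model = TinyCounter()
density, counts = model(torch.randn(2, 3, 128, 128))
print(density.shape, counts.shape)                    # torch.Size([2, 1, 128, 128]) torch.Size([2])
```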

We propose a deep graph approach to address the task of speech emotion recognition. A compact, efficient and scalable way to represent data is in the form of graphs. Following the theory of graph signal processing, we model speech as a cycle graph or a line graph. Such a graph structure enables us to construct a Graph Convolution Network (GCN)-based architecture that can perform an accurate graph convolution, in contrast to the approximate convolution used in standard GCNs. We evaluated the performance of our model for speech emotion recognition on the popular IEMOCAP and MSP-IMPROV databases. Our model outperforms standard GCN...

10.1109/icassp39728.2021.9413876 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13
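
The cycle-graph construction and the exact spectral convolution it enables can be illustrated as follows. The sketch builds a cycle-graph adjacency over utterance frames and filters node features in the eigenbasis of the graph Laplacian, rather than using the first-order approximation of standard GCNs; feature extraction and the full trained model are omitted, and the filter here is random for illustration.

```python
import numpy as np

T, F = 8, 4                                     # frames (nodes) and feature dim
rng = np.random.default_rng(0)
X = rng.normal(size=(T, F))                     # per-frame acoustic features

# Cycle-graph adjacency: frame t connected to t-1 and t+1 (wrapping around).
A = np.zeros((T, T))
for t in range(T):
    A[t, (t - 1) % T] = A[t, (t + 1) % T] = 1.0

L = np.diag(A.sum(1)) - A                       # combinatorial graph Laplacian
eigvals, U = np.linalg.eigh(L)                  # exact spectral decomposition

h = rng.normal(size=T)                          # a (learnable) spectral filter
X_conv = U @ np.diag(h) @ U.T @ X               # exact graph convolution of X
print(X_conv.shape)                             # (8, 4)
```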

Human emotion is expressed, perceived and captured using a variety of dynamic data modalities, such as speech (verbal), videos (facial expressions) and motion sensors (body gestures). We propose a generalized approach to emotion recognition that can adapt across modalities by modeling dynamic data as structured graphs. The motivation behind the graph approach is to build compact models without compromising on performance. To alleviate the problem of optimal graph construction, we cast this as a joint graph learning and classification task. To this end, we present the Learnable...

10.1109/tmm.2021.3059169 article EN IEEE Transactions on Multimedia 2021-02-17
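
A minimal sketch of the joint graph-learning-and-classification idea, under assumed simplifications: the adjacency matrix is a trainable parameter optimised together with a one-layer graph convolution and a classifier. This is not the paper's architecture.

```python
import torch
import torch.nn as nn

class LearnableGraphClassifier(nn.Module):
    def __init__(self, n_nodes, in_dim, n_classes):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))  # learned graph
        self.proj = nn.Linear(in_dim, 32)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                        # x: (batch, n_nodes, in_dim)
        A = torch.sigmoid(self.adj_logits)       # soft adjacency in [0, 1]
        h = torch.relu(A @ self.proj(x))         # one graph-convolution step
        return self.head(h.mean(dim=1))          # pool nodes, then classify

model = LearnableGraphClassifier(n_nodes=10, in_dim=20, n_classes=5)
x, y = torch.randn(4, 10, 20), torch.randint(0, 5, (4,))
loss = nn.functional.cross_entropy(model(x), y)  # graph and classifier trained jointly
loss.backward()
print(loss.item())
```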

Media is created by humans, for humans, to tell stories. There exists a natural and imminent need for creating human-centered media analytics to illuminate the stories being told and to understand their impact on individuals and society at large. An objective understanding of media content has numerous applications for different stakeholders, from content creators and decision-/policy-makers to consumers. Advances in multimodal signal processing and machine learning (ML) can enable a detailed and nuanced characterization (of who, what, how, where, why)...

10.1109/jproc.2020.3047978 article EN publisher-specific-oa Proceedings of the IEEE 2021-01-13

We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits. We utilize elementary head-motion units named kinemes, atomic facial movements termed action units, and speech features to estimate these human-centered traits. Empirical results confirm that kinemes and action units enable the discovery of multiple trait-specific behaviors while also enabling explainability in support of the predictions. For fusing cues, we explore decision and feature-level fusion, and an additive attention-based fusion...

10.1371/journal.pone.0313883 article EN cc-by PLoS ONE 2025-01-17
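
A hedged sketch of an additive attention-style fusion over the three behavioural cue streams mentioned above (kineme, facial action unit and speech features). Feature dimensions and the trait-regression head are illustrative assumptions, not the paper's exact fusion module.

```python
import torch
import torch.nn as nn

class AdditiveAttentionFusion(nn.Module):
    def __init__(self, dim, n_traits):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))
        self.head = nn.Linear(dim, n_traits)

    def forward(self, modality_feats):           # list of (batch, dim) tensors
        stacked = torch.stack(modality_feats, dim=1)         # (batch, n_mod, dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # attention over modalities
        fused = (weights * stacked).sum(dim=1)               # additive weighted sum
        return self.head(fused), weights.squeeze(-1)

fusion = AdditiveAttentionFusion(dim=64, n_traits=5)
kineme, action_unit, speech = (torch.randn(8, 64) for _ in range(3))
traits, attn = fusion([kineme, action_unit, speech])
print(traits.shape, attn.shape)                  # torch.Size([8, 5]) torch.Size([8, 3])
```

The attention weights returned per modality are one way such a model can expose which cue drove a given prediction, in line with the explainability goal stated in the abstract.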

10.1109/icassp49660.2025.10889429 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

A new line of research uses compression methods to measure the similarity between signals. Two signals are considered similar if one can be compressed significantly when the information of the other is known. The existing compression-based methods, although successful in the discrete one-dimensional domain, do not work well in the context of images. This paper proposes a sparse representation-based approach to encode the content of an image using information from the other image, and the compactness (sparsity) of the representation as a measure of its compressibility (how...

10.1109/tmm.2014.2306175 article EN IEEE Transactions on Multimedia 2014-02-13
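
A simplified sketch of the cross-coding idea: patches of one image are coded with a dictionary learned from another image, and how well they can be represented acts as a similarity score. Using the mean OMP residual at a fixed sparsity is an assumption here, not the paper's exact compressibility measure.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.feature_extraction.image import extract_patches_2d

def patches(img, size=8, n=500, seed=0):
    p = extract_patches_2d(img, (size, size), max_patches=n, random_state=seed)
    return p.reshape(len(p), -1)

def cross_coding_error(img_a, img_b, n_atoms=100, n_nonzero=5):
    """Mean residual when patches of img_a are coded with img_b's dictionary."""
    D = MiniBatchDictionaryLearning(n_components=n_atoms, random_state=0).fit(
        patches(img_b)).components_
    Xa = patches(img_a)
    code = sparse_encode(Xa, D, algorithm="omp", n_nonzero_coefs=n_nonzero)
    return np.mean(np.linalg.norm(Xa - code @ D, axis=1))

rng = np.random.default_rng(0)
img1 = rng.random((64, 64))
img2 = img1 + 0.05 * rng.random((64, 64))          # a near-duplicate of img1
img3 = rng.random((64, 64))                        # an unrelated image
# Lower error -> more "compressible" given img1 -> more similar to img1.
print(cross_coding_error(img2, img1), cross_coding_error(img3, img1))
```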

This paper introduces the problem of multiple object forecasting (MOF), in which the goal is to predict future bounding boxes of tracked objects. In contrast to existing works on trajectory forecasting, which primarily consider the problem from a birds-eye perspective, we formulate the problem from an object-level perspective and call for the prediction of full bounding boxes, rather than trajectories alone. Towards solving this task, we introduce the Citywalks dataset, which consists of over 200k high-resolution video frames. It comprises footage recorded in 21 cities in 10 European...

10.1109/wacv45572.2020.9093446 article EN 2020-03-01
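
A toy sketch of object-level forecasting: given a short history of bounding boxes (x, y, w, h) for a tracked object, a GRU rolls out the boxes for the next few frames. This is a generic baseline sketch under assumed dimensions, not the model proposed in the paper.

```python
import torch
import torch.nn as nn

class BoxForecaster(nn.Module):
    def __init__(self, hidden=64, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(input_size=4, hidden_size=hidden, batch_first=True)
        self.decoder = nn.GRUCell(4, hidden)
        self.out = nn.Linear(hidden, 4)

    def forward(self, past_boxes):                # (batch, T_past, 4)
        _, h = self.encoder(past_boxes)           # summarise the observed track
        h, box = h.squeeze(0), past_boxes[:, -1]  # start decoding from the last box
        preds = []
        for _ in range(self.horizon):             # roll out future boxes step by step
            h = self.decoder(box, h)
            box = self.out(h)
            preds.append(box)
        return torch.stack(preds, dim=1)          # (batch, horizon, 4)

future = BoxForecaster()(torch.randn(2, 10, 4))
print(future.shape)                               # torch.Size([2, 5, 4])
```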

We present an audio-visual multimodal approach for the task of zero-shot learning (ZSL) for classification and retrieval of videos. ZSL has been studied extensively in the recent past but has primarily been limited to the visual modality and to images. We demonstrate that both audio and visual modalities are important for this task. Since a dataset to study the task is currently not available, we also construct an appropriate dataset with 33 classes containing 156,416 videos, from an existing large scale event dataset. We empirically show that performance improves by adding the audio modality for both tasks...

10.1109/wacv45572.2020.9093438 article EN 2020-03-01
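
A hedged sketch of the audio-visual zero-shot idea: audio and video features are projected into a common space shared with class-label embeddings, and an unseen-class clip is labelled by its nearest class embedding. Dimensions, the fusion by addition and the projection design are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_proj = nn.Linear(128, 64)                   # audio feature -> shared space
video_proj = nn.Linear(512, 64)                   # video feature -> shared space
class_embed = F.normalize(torch.randn(33, 64), dim=1)  # 33 class-label embeddings

def embed_clip(audio_feat, video_feat):
    """Fuse the two modality embeddings into one clip embedding."""
    return F.normalize(audio_proj(audio_feat) + video_proj(video_feat), dim=-1)

def zero_shot_classify(audio_feat, video_feat):
    z = embed_clip(audio_feat, video_feat)        # (batch, 64)
    scores = z @ class_embed.T                    # similarity to every class embedding
    return scores.argmax(dim=-1)

print(zero_shot_classify(torch.randn(4, 128), torch.randn(4, 512)))  # class index per clip
```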

Children with Autism Spectrum Disorder (ASD) are known to have difficulty in producing and perceiving emotional facial expressions. Their expressions are often perceived as atypical by adult observers. This paper focuses on data driven ways to analyze and quantify the atypicality of expressions of children with ASD. Our objective is to uncover those characteristics of facial gestures that induce the sense of atypicality. Using a carefully collected motion capture database, children with and without ASD are compared within six basic emotion categories employing methods from...

10.1109/icassp.2015.7178080 article EN 2015-04-01

This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. The rich movie content is inherently multimodal, where emotion is evoked through both audio (music, speech) and video modalities. To capture such affective information, we put forth a set of features that includes several novel features such as Video Compressibility and Histogram of Facial Area (HFA). We propose a Mixture of Experts (MoE)-based fusion model that dynamically combines information from the two modalities for predicting emotion in movies. A learning...

10.1109/icassp.2016.7472192 article EN 2016-03-01
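
A sketch of a Mixture of Experts style fusion under assumed simplifications: separate audio and video experts each predict emotion (e.g. arousal/valence), and a gating network dynamically weights their outputs per sample. Feature dimensions and the two-dimensional output are illustrative.

```python
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, audio_dim=40, video_dim=60, out_dim=2):
        super().__init__()
        self.audio_expert = nn.Linear(audio_dim, out_dim)
        self.video_expert = nn.Linear(video_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(audio_dim + video_dim, 2), nn.Softmax(dim=-1))

    def forward(self, audio, video):              # (batch, audio_dim), (batch, video_dim)
        w = self.gate(torch.cat([audio, video], dim=-1))         # per-sample expert weights
        preds = torch.stack([self.audio_expert(audio),
                             self.video_expert(video)], dim=1)   # (batch, 2, out_dim)
        return (w.unsqueeze(-1) * preds).sum(dim=1)              # dynamically fused emotion

emotion = MoEFusion()(torch.randn(8, 40), torch.randn(8, 60))
print(emotion.shape)                              # torch.Size([8, 2])
```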

We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to estimate this crossmodal, emotion-centric similarity, we propose a deep neural network architecture that learns to project data from the two modalities to a common representation space, and performs a binary classification task of predicting the correspondence (true or false). To facilitate the current study,...

10.1109/icassp.2019.8683133 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17
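
A hedged sketch of the correspondence setup: a music branch and an image branch map their inputs into a common space, and a small classifier predicts whether the pair shares the same emotion (true/false correspondence). Input dimensions and layer sizes are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CorrespondenceNet(nn.Module):
    def __init__(self, music_dim=128, image_dim=2048, shared=64):
        super().__init__()
        self.music_branch = nn.Sequential(nn.Linear(music_dim, shared), nn.ReLU())
        self.image_branch = nn.Sequential(nn.Linear(image_dim, shared), nn.ReLU())
        self.classifier = nn.Linear(2 * shared, 1)     # true/false correspondence logit

    def forward(self, music_feat, image_feat):
        m = self.music_branch(music_feat)
        v = self.image_branch(image_feat)
        return self.classifier(torch.cat([m, v], dim=-1)).squeeze(-1)

net = CorrespondenceNet()
logit = net(torch.randn(4, 128), torch.randn(4, 2048))
labels = torch.tensor([1., 0., 1., 0.])               # same-emotion pair or not
loss = nn.functional.binary_cross_entropy_with_logits(logit, labels)
print(loss.item())
```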

10.1016/j.image.2014.09.010 article EN Signal Processing Image Communication 2014-10-16

The CLIP (Contrastive Language-Image Pretraining) model has exhibited outstanding performance in recognition problems, such as zero-shot image classification and object detection. However, its ability to count remains understudied due to the inherent challenges of transforming counting--a regression task--into a recognition task. In this paper, we investigate CLIP's potential for counting, focusing specifically on estimating crowd sizes. Existing classification-based crowd-counting methods have encountered...

10.48550/arxiv.2403.09281 preprint EN arXiv (Cornell University) 2024-03-14
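
An illustrative zero-shot sketch, not the paper's method: counting is treated as classification by comparing an image against text prompts describing count ranges with an off-the-shelf CLIP model (via Hugging Face transformers). The prompt wording, count bins and image path are assumptions.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

bins = ["0-10", "10-50", "50-200", "200-1000", "more than 1000"]
prompts = [f"a photo of a crowd of {b} people" for b in bins]

image = Image.open("crowd.jpg")                   # any crowd image on disk
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

print(dict(zip(bins, probs.squeeze(0).tolist()))) # probability of each count range
```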

In general, popular films and screenplays follow a well defined storytelling paradigm that comprises three essential segments or acts: exposition (act I), conflict (act II) and resolution (act III). Deconstructing a movie into its narrative units can enrich the semantic understanding of movies, and help in summarization, navigation and detection of the key events. A multimodal framework for detecting such an act structure is developed in this paper. Various low-level features are designed and extracted from the video, audio and text...

10.1109/icassp.2015.7178374 article EN 2015-04-01

This work proposes a trajectory clustering-based approach for segmenting flow patterns in high density crowd videos. The goal is to produce a pixel-wise segmentation of a video sequence (static camera), where each segment corresponds to a different motion pattern. Unlike previous studies that use only motion vectors, we extract full trajectories so as to capture the complete temporal evolution of each region (block) in the sequence. The extracted trajectories are dense, complex and often overlapping. A novel clustering algorithm is developed...

10.1109/icip.2016.7532548 article EN 2016 IEEE International Conference on Image Processing (ICIP) 2016-08-17
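
A hedged stand-in for the flow-segmentation idea: each spatial block contributes a trajectory, trajectories are clustered by their per-step displacements, and the cluster label of a block gives its motion-pattern segment. Standard agglomerative clustering on synthetic trajectories is used here in place of the paper's own clustering algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
grid, T = 16, 20                                   # 16x16 blocks tracked over 20 frames

# Synthetic block trajectories: left half of the frame drifts right, right half drifts up.
trajs = np.zeros((grid * grid, T, 2))
for i, (r, c) in enumerate(np.ndindex(grid, grid)):
    drift = np.array([1.0, 0.0]) if c < grid // 2 else np.array([0.0, 1.0])
    steps = drift + 0.1 * rng.normal(size=(T, 2))
    trajs[i] = np.array([r, c]) + np.cumsum(steps, axis=0)

# Cluster per-step displacements so blocks sharing a motion pattern group together.
velocities = np.diff(trajs, axis=1).reshape(grid * grid, -1)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(velocities)
segmentation = labels.reshape(grid, grid)          # per-block motion-pattern segment
print(segmentation)
```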

We introduce the task of multi-camera trajectory forecasting (MCTF), where the future trajectory of an object is predicted in a network of cameras. Prior works consider forecasting trajectories in a single camera view. Our work is the first to consider the challenging scenario of forecasting across multiple non-overlapping camera views. This has wide applicability in tasks such as re-identification and multi-target tracking. To facilitate research in this new area, we release the Warwick-NTU Multi-camera Forecasting Database (WNMF), a unique dataset of pedestrian trajectories from 15 synchronized...

10.1109/cvprw50498.2020.00516 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020-06-01

Large scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labelled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between labelled and unlabeled samples. At inference,...

10.1109/jstsp.2022.3190083 article EN IEEE Journal of Selected Topics in Signal Processing 2022-07-14
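
A hedged sketch of the subgraph construction described above: each audio clip (represented by a pre-computed feature vector) is a node; a small subgraph is sampled from labelled and unlabelled clips, connected by feature similarity, and passed through one graph-convolution step. The similarity-based edges and single layer are assumptions, not the paper's exact framework or self-supervision tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 128))              # pool of audio-clip features
labelled_ids = np.arange(50)                      # only 50 clips have labels

def sample_subgraph(n_labelled=4, n_unlabelled=12, k=3):
    """Sample nodes and connect each to its k most similar nodes in the subgraph."""
    ids = np.concatenate([rng.choice(labelled_ids, n_labelled, replace=False),
                          rng.choice(np.arange(50, 1000), n_unlabelled, replace=False)])
    X = feats[ids]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    A = np.zeros_like(sim)
    for i in range(len(ids)):
        A[i, np.argsort(sim[i])[-(k + 1):]] = 1.0 # k nearest neighbours (plus self)
    return ids, X, np.maximum(A, A.T)             # symmetrised subgraph adjacency

ids, X, A = sample_subgraph()
W = 0.1 * rng.normal(size=(128, 64))              # stand-in GCN weight matrix
H = np.maximum(np.diag(1.0 / A.sum(1)) @ A @ X @ W, 0)  # one normalised graph-conv step
print(H.shape)                                    # (16, 64)
```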

Non-verbal behavioral cues, such as head movement, play a significant role in human communication and affective expression. Although facial expressions and gestures have been extensively studied in the context of emotion understanding, head motion (which accompanies both) is relatively less understood. This paper studies the significance of head movement in adults' affect communication using videos from movies. These videos are taken from the Acted Facial Expressions in the Wild (AFEW) database and are labeled with seven basic emotion categories: anger, disgust, fear, joy,...

10.1109/icassp.2017.7952684 article EN 2017-03-01