Gautam Bhattacharya

ORCID: 0000-0003-4787-0604
Research Areas
  • Speech and Audio Processing
  • Speech Recognition and Synthesis
  • Music and Audio Processing
  • Face and Expression Recognition
  • Solar and Space Plasma Dynamics
  • Cosmology and Gravitation Theories
  • Natural Language Processing Techniques
  • Black Holes and Theoretical Physics
  • Music Technology and Sound Studies
  • Solar Radiation and Photovoltaics
  • Galaxies: Formation, Evolution, Phenomena
  • Speech and Dialogue Systems
  • Machine Learning and Data Classification
  • Advanced Statistical Methods and Models
  • Consumer Market Behavior and Pricing
  • Anomaly Detection Techniques and Applications
  • Geophysics and Gravity Measurements
  • Computational Physics and Python Applications
  • Innovation Policy and R&D
  • Chaos, Complexity, and Education
  • Consumer Retail Behavior Studies
  • Chaos-based Image/Signal Encryption
  • Advanced Malware Detection Techniques
  • Intellectual Property and Patents
  • Vehicle License Plate Recognition

Dolby (Netherlands)
2023-2025

McGill University
2014-2019

Computer Research Institute of Montréal
2016-2019

University of Burdwan
2009-2017

Saha Institute of Nuclear Physics
1989-2009

University of Kansas
1985-2004

The Automatic Speaker Verification Spoofing and Countermeasures Challenge 2015 provides a common framework for the evaluation of spoofing countermeasures, or anti-spoofing techniques, in the presence of various seen and unseen spoofing attacks. This contribution proposes a system consisting of amplitude, phase, linear prediction residual, and combined amplitude-phase-based countermeasures for the detection of spoofing attacks. In this task we use the following features: Mel-frequency cepstral coefficients (MFCC), product spectrum-based coefficients, modified group delay weighted residual...

10.21437/interspeech.2015-469 article EN Interspeech 2015 2015-09-06

This article presents a novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks. The main idea is to confuse a domain discriminator so that it cannot tell if embeddings are from the source or target domains. We train several GAN variants using our proposed framework and apply them to the speaker verification task. On the challenging NIST-SRE 2016 dataset, we are able to match the performance of a strong baseline x-vector system. In contrast to systems which are dependent on dimensionality reduction...

10.1109/icassp.2019.8682064 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds. Such models operate on band-limited signals and, as a result of an autoregressive approach, they are typically conformed by pre-trained latent encoders and/or several cascaded modules. In this work, we propose a diffusion-based generative model for general audio synthesis, named DAG, which deals with full-band signals end-to-end in the waveform domain. Results...

10.1109/icassp49357.2023.10096760 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

10.1109/icassp49660.2025.10889934 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

We propose to improve the performance of i-vector based speaker verification by processing the i-vectors with a deep neural network before they are fed to a cosine distance or probabilistic linear discriminant analysis (PLDA) classifier. To this end we build on an existing model that we refer to as Non-linear Within Class Normalization (NWCN) and introduce a novel Speaker Classifier Network (SCN). Both models deliver impressive speaker verification performance, showing a 56% and 68% relative improvement over standard i-vectors when combined...

10.1109/slt.2016.7846264 article EN 2016 IEEE Spoken Language Technology Workshop (SLT) 2016-12-01
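
The cosine distance back end mentioned in the abstract above can be illustrated with a minimal sketch. This is generic cosine scoring of two fixed-length speaker vectors, not the NWCN or SCN models from the paper; the function names and the `threshold` value are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between an enrollment vector and a test vector."""
    enroll = enroll / np.linalg.norm(enroll)  # length-normalize both vectors
    test = test / np.linalg.norm(test)
    return float(np.dot(enroll, test))


def verify(enroll: np.ndarray, test: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the trial if the score reaches the (hypothetical) threshold."""
    return cosine_score(enroll, test) >= threshold
```

In practice the threshold would be tuned on held-out trials, and the vectors would first pass through whatever preprocessing the system applies before scoring.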

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our approach to the task of text-independent speaker verification, a challenging, real-world problem in biometric security. We further the development of end-to-end speaker embedding models by combining a 1-dimensional, self-attentive residual network, an angular margin loss function and an adversarial training strategy. Our model is able to learn extremely compact, 64-dimensional speaker embeddings that deliver competitive performance...

10.1109/icassp.2019.8682611 article EN ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019-04-17

Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pre-trained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP)...

10.1109/waspaa58266.2023.10248160 article EN 2023-09-15

Accuracy of the well-known k-nearest neighbor (kNN) classifier heavily depends on the choice of k. The problem of estimating a suitable k for any test point becomes difficult due to several factors like the local distribution of training points around that test point, the presence of outliers in the dataset, and the dimensionality of the feature space. In this paper, we propose a dynamic k estimation algorithm based on the density function and class variance as well as certainty factor information of the training points. Performance of kNN with the proposed algorithm is evaluated...

10.1109/icapr.2015.7050683 article EN 2015-01-01

Accuracy of the well-known kNN classifier depends significantly on a suitable choice of k. In this paper, we propose an improved kNN algorithm with a novel non-parametric, test point specific k estimation strategy. To estimate k for any test point, we first construct a hypersphere around it to capture the local distribution of the surrounding training points. Class hubness information is then used to weight the hypervolume of the above hypersphere. Experiments on several UCI benchmark datasets clearly demonstrate the supremacy of our algorithm over...

10.1109/icpr.2014.263 article EN 2014-08-01
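
The dependence on k described in the two kNN abstracts above can be seen in a small sketch. This is a plain majority-vote kNN on hypothetical toy data, not the dynamic k estimation or hubness-weighted strategies proposed in the papers; it only shows that the predicted label for one test point can change with k when an outlier sits nearby.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k training points closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]  # stable sort keeps tie order
    labels = y_train[nearest].tolist()
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data: four class-0 points on the unit circle, one class-1
# outlier very close to the origin, and two class-1 points far away.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0],
                    [0.1, 0.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1])
x_test = np.array([0.0, 0.0])

print(knn_predict(X_train, y_train, x_test, k=1))  # prints 1 (outlier wins)
print(knn_predict(X_train, y_train, x_test, k=3))  # prints 0 (majority wins)
```

A per-test-point choice of k, as the abstracts propose, aims to avoid fixing a single value that is too small near outliers or too large near class boundaries.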

Matching Pursuit (MP) is a greedy algorithm that iteratively builds a sparse signal representation. This work presents an analysis of MP in the context of audio denoising. By interpreting the algorithm as a simple shrinkage approach, we identify the factors critical to its success, and propose several approaches to improve its performance and robustness. We present experimental results on a wide range of audio signals, and show that the method is able to yield results that are competitive with other audio denoising approaches. Notably, the proposed approach retains a small...

10.1109/icassp.2014.6854130 article EN 2014-05-01
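
The greedy iteration that the Matching Pursuit abstract above refers to can be sketched as follows. This is the textbook form of the algorithm, assuming a dictionary whose columns (atoms) have unit norm; it does not include the improvements proposed in the paper, and the function and parameter names are illustrative.

```python
import numpy as np


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy sparse approximation of `signal`.

    `dictionary` has shape (n_samples, n_atoms); its columns are assumed
    to be unit-norm atoms. Returns the coefficient vector and the residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        correlations = dictionary.T @ residual         # inner product with every atom
        best = int(np.argmax(np.abs(correlations)))    # atom that matches best
        coeffs[best] += correlations[best]
        residual -= correlations[best] * dictionary[:, best]  # remove its contribution
    return coeffs, residual
```

For denoising, the reconstruction `dictionary @ coeffs` serves as the estimate of the clean signal, and stopping after a limited number of iterations leaves the low-correlation content, ideally the noise, in the residual.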

This is the publisher's version, also available electronically from https://editorialexpress.com/cgi-bin/rje_online.cgi?action=view&year=1984&issue=sum&page=281&&tid=120733&sc=7oKBHr2G.

10.2307/2555681 article EN The RAND Journal of Economics 1984-01-01