Ahmed Imtiaz Humayun

ORCID: 0000-0002-9530-1134
Research Areas
  • Music and Audio Processing
  • Generative Adversarial Networks and Image Synthesis
  • Handwritten Text Recognition Techniques
  • Speech and Audio Processing
  • Digital Media Forensic Detection
  • Speech Recognition and Synthesis
  • Blind Source Separation Techniques
  • Natural Language Processing Techniques
  • Image Processing and 3D Reconstruction
  • Multimodal Machine Learning Applications
  • Advanced Data Compression Techniques
  • Anomaly Detection Techniques and Applications
  • Neural Networks and Applications
  • AI in cancer detection
  • EEG and Brain-Computer Interfaces
  • Time Series Analysis and Forecasting
  • Phonocardiography and Auscultation Techniques
  • Cardiac Arrhythmias and Treatments
  • Sleep and Wakefulness Research
  • Vehicle License Plate Recognition
  • Topic Modeling
  • ECG Monitoring and Analysis
  • Medical Imaging Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Advanced Clustering Algorithms Research

Rice University
2020-2023

Bangladesh University of Engineering and Technology
2018-2020

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate success in controlled scenarios, their robustness in real-world applications and their readiness for deployment remain uncertain. In this work, we identify a critical gap in evaluating sanitized models, particularly in terms of their performance across various concept dimensions. We systematically investigate the failure modes of current...

10.48550/arxiv.2501.09833 preprint EN arXiv (Cornell University) 2025-01-16

10.1090/noti3150 article EN Notices of the American Mathematical Society 2025-03-13

Humans spend approximately a third of their lives sleeping, which makes monitoring sleep an integral part of well-being. In this paper, a 34-layer deep residual ConvNet architecture for end-to-end sleep staging is proposed. The network takes a raw single-channel electroencephalogram (Fpz-Cz) signal as input and yields hypnogram annotations for each 30 s segment as output. Experiments are carried out for two different scoring standards (5- and 6-stage classification) on the expanded PhysioNet Sleep-EDF dataset, which contains...

10.1109/bhi.2019.8834483 article EN 2019-05-01
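The input/output framing described above (raw single-channel EEG in, one stage label per 30 s segment out) can be sketched as a preprocessing step. This is a minimal sketch, not the paper's pipeline: the 100 Hz sampling rate, the `(epochs, channels, samples)` layout and the short stage labels are assumptions for illustration. The label mapping shows how a 6-stage (R&K) scoring collapses to 5 stages (AASM) by merging S3 and S4 into N3.

```python
import numpy as np

EPOCH_SECONDS = 30  # one hypnogram label per 30 s segment

# Hypothetical label names: the 6-stage (R&K) scheme maps onto the
# 5-stage (AASM) scheme by merging S3 and S4 into N3.
RK_TO_AASM = {"W": "W", "S1": "N1", "S2": "N2", "S3": "N3", "S4": "N3", "REM": "REM"}


def to_epochs(signal: np.ndarray, fs: int = 100) -> np.ndarray:
    """Split a raw single-channel EEG trace into 30 s network inputs.

    fs=100 Hz is an assumed sampling rate. Trailing samples that do not
    fill a whole epoch are dropped. The result has shape
    (n_epochs, 1, samples_per_epoch): batch x channel x time for a 1D ConvNet.
    """
    samples_per_epoch = EPOCH_SECONDS * fs
    n_epochs = len(signal) // samples_per_epoch
    trimmed = signal[: n_epochs * samples_per_epoch]
    return trimmed.reshape(n_epochs, 1, samples_per_epoch)


def to_five_stage(labels_rk: list[str]) -> list[str]:
    """Convert hypothetical 6-stage labels to 5-stage labels."""
    return [RK_TO_AASM[label] for label in labels_rk]
```

For example, a 90.5 s recording at 100 Hz yields three inputs of 3,000 samples each; the final half second is discarded.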

To benchmark Bengali digit recognition algorithms, a large publicly available dataset is required that is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset of more than 85,000 images of handwritten Bengali digits, has been assembled. This paper documents the collection and curation process of the numerals, along with salient statistics of the dataset.

10.48550/arxiv.1806.02452 preprint EN cc-by arXiv (Cornell University) 2018-01-01

Cardiac auscultation is the most practiced non-invasive and cost-effective procedure for the early diagnosis of heart diseases. While machine learning based systems can aid in automatically screening patients, the robustness of these systems is affected by numerous factors, including the stethoscope/sensor, the environment, and the data collection protocol. This paper studies the adverse effect of domain variability on heart sound abnormality detection and develops strategies to address this problem. Methods: We propose a novel Convolutional...

10.1109/jbhi.2020.2970252 article EN IEEE Journal of Biomedical and Health Informatics 2020-01-31

We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of any pre-trained deep generative network (DGN). Leveraging the fact that DGNs are, or can be approximated by, continuous piecewise affine splines, we derive the analytical DGN output space distribution as a function of the product of the DGN's Jacobian singular values raised to a power p. We dub p the polarity parameter and prove that it focuses the DGN sampling on the modes (p < 0) or anti-modes (p > 0) of the DGN output-space...

10.1109/cvpr52688.2022.01038 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
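The polarity idea can be illustrated on a toy continuous piecewise affine generator. The sketch below is a simplified reading of the abstract, not the authors' implementation: it assumes a one-hidden-layer ReLU network as the DGN, latent vectors drawn from a standard normal prior, and resampling of those latents with weights proportional to the product of the Jacobian singular values raised to the power `p`. A small product means the generator contracts volume (high output density), so `p < 0` favours modes, `p > 0` favours anti-modes, and `p = 0` leaves the prior samples unchanged.

```python
import numpy as np


def relu_mlp(z, W1, b1, W2, b2):
    """Toy DGN: a one-hidden-layer ReLU network, hence continuous piecewise affine."""
    return W2 @ np.maximum(W1 @ z + b1, 0.0) + b2


def jacobian(z, W1, b1, W2):
    """Exact Jacobian of relu_mlp at z: W2 @ diag(active hidden units) @ W1."""
    active = (W1 @ z + b1 > 0).astype(float)
    return W2 @ (active[:, None] * W1)


def polarity_weights(zs, p, W1, b1, W2, eps=1e-12):
    """Normalised weights proportional to prod(singular values) ** p per latent."""
    log_w = []
    for z in zs:
        s = np.linalg.svd(jacobian(z, W1, b1, W2), compute_uv=False)
        # eps guards against zero singular values where the Jacobian loses rank
        log_w.append(p * np.sum(np.log(np.maximum(s, eps))))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # work in log space for numerical stability
    return w / w.sum()


def polarity_sample(rng, zs, p, W1, b1, W2, n):
    """Resample n latents from the prior draws zs according to the polarity weights."""
    w = polarity_weights(zs, p, W1, b1, W2)
    return zs[rng.choice(len(zs), size=n, replace=True, p=w)]
```

Passing the resampled latents through `relu_mlp` gives the polarity-adjusted outputs. Resampling a finite pool of prior draws is a shortcut chosen for brevity; it only approximates the target distribution when the pool is large.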

In this work, we propose an ensemble of classifiers to distinguish between various degrees of abnormality of the heart using phonocardiogram (PCG) signals acquired with digital stethoscopes in a clinical setting, for the INTERSPEECH 2018 Computational Paralinguistics (ComParE) Heart Beats Sub-Challenge. Our primary classification framework is a convolutional neural network with 1D-CNN time-convolution (tConv) layers, which uses features transferred from a model trained on the 2016 PhysioNet Heart Sound...

10.21437/interspeech.2018-2413 article EN Interspeech 2018 2018-08-28

With the advent of a digital health revolution, vast amounts of clinical data are being generated, stored and processed on a daily basis. This has made the storage and retrieval of large volumes of health-care data, especially high-resolution medical images, particularly challenging. Effective compression of medical images thus plays a vital role in today's healthcare information systems, particularly in teleradiology. In this work, an X-ray image compression method based on Convolutional Recurrent Neural Networks (RNN-Conv) is presented. The proposed...

10.1109/bhi.2019.8834656 article EN 2019-05-01

Bengali is one of the most spoken languages in the world, with over 300 million speakers globally. Despite its popularity, research into the development of Bengali speech recognition systems is hindered by the lack of diverse open-source datasets. As a way forward, we have crowdsourced the Bengali Common Voice Speech Dataset, a sentence-level automatic speech recognition corpus. Collected on the Mozilla Common Voice platform, the dataset is part of an ongoing campaign that has led to the collection of 400 hours of data in 2 months and is growing rapidly. Our analysis shows that this dataset has more...

10.48550/arxiv.2206.14053 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task use a set of Finite Impulse Response (FIR) band-pass filters as a front-end, followed by a Convolutional Neural Network (CNN) model. In this work, we propound a novel CNN architecture that integrates the front-end band-pass filters within the network using time-convolution (tConv) layers, which enables the FIR filter-bank parameters to become learnable. Different...

10.1109/embc.2018.8512578 article EN 2018-07-01
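The front-end described above can be sketched as a bank of FIR band-pass kernels that serve as the weights of a time-convolution layer. In this NumPy sketch the kernels are only initialised and applied (forward pass); in a deep learning framework the same arrays would initialise a 1D convolution layer's weights so that training can update them. The 1000 Hz sampling rate, the four band edges and the 61-tap windowed-sinc design are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def bandpass_fir(f_lo, f_hi, fs, numtaps=61):
    """Windowed-sinc band-pass FIR: difference of two ideal low-pass responses,
    tapered with a Hamming window. Symmetric taps give linear phase."""
    n = np.arange(numtaps) - (numtaps - 1) / 2  # symmetric tap positions

    def lowpass(fc):
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(numtaps)


def tconv_forward(x, kernels):
    """Forward pass of a time-convolution layer: one output channel per kernel,
    same length as the input signal."""
    return np.stack([np.convolve(x, k, mode="same") for k in kernels])


FS = 1000  # assumed sampling rate in Hz
BANDS = [(25, 45), (45, 80), (80, 200), (200, 400)]  # assumed band edges in Hz
kernels = np.stack([bandpass_fir(lo, hi, FS) for lo, hi in BANDS])
```

Because each kernel is symmetric, flipping it (the only difference between true convolution and the cross-correlation most deep learning libraries compute) has no effect at initialisation; training may break this symmetry unless it is enforced.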

Seismic advances in generative AI algorithms for imagery, text, and other data types have led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how much fixed or fresh real training data is available through the generations of training and whether the samples from the previous generation...

10.48550/arxiv.2307.01850 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01
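A deliberately tiny simulation can make the difference between loop families concrete. Here the "generative model" is just a 1D Gaussian fit (mean and standard deviation), and sampling bias is modelled, as an assumption, by keeping only synthetic samples within `lam` standard deviations of the mean. A fully synthetic loop retrains only on its own biased samples, while a fresh-data loop mixes in new real samples every generation. This is an illustrative toy, not the experimental setup of the paper.

```python
import numpy as np


def fit_gaussian(x):
    """'Train' the toy generative model: estimate mean and standard deviation."""
    return x.mean(), x.std()


def biased_sample(rng, mu, sigma, n, lam=1.0):
    """Sample N(mu, sigma^2) keeping only points within lam std devs of mu,
    a crude stand-in for quality-favouring sampling (assumption)."""
    kept = np.empty(0)
    while kept.size < n:
        x = rng.normal(mu, sigma, size=2 * n)
        kept = np.concatenate([kept, x[np.abs(x - mu) <= lam * sigma]])
    return kept[:n]


def run_loop(generations, n=1000, fresh_fraction=0.0, lam=1.0, seed=0):
    """Return the fitted std after `generations` rounds of retraining.

    fresh_fraction=0.0 gives a fully synthetic loop; a positive value mixes
    that fraction of fresh real N(0, 1) data into every generation."""
    rng = np.random.default_rng(seed)
    mu, sigma = fit_gaussian(rng.normal(0.0, 1.0, size=n))  # generation 0: real data
    for _ in range(generations):
        n_real = int(fresh_fraction * n)
        real = rng.normal(0.0, 1.0, size=n_real)
        synthetic = biased_sample(rng, mu, sigma, n - n_real, lam)
        mu, sigma = fit_gaussian(np.concatenate([real, synthetic]))
    return sigma
```

With `lam=1.0`, each fully synthetic generation multiplies the variance by roughly 0.29 (the variance of a standard normal truncated to ±1), so diversity collapses within a few generations. Mixing in half fresh real data instead settles, in expectation, near a standard deviation of about 0.76.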

Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations, such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the geometry of a DN's mapping, including its decision boundary, over a specified region of the data space. By leveraging the theory of Continuous...

10.1109/cvpr52729.2023.00369 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Beat tracking from music signals has significant importance in multimedia information retrieval systems, especially in cover song detection. A predictive real-time beat tracking system can also be used to assist musicians performing live. In this paper, we present a predictive real-time beat tracking algorithm that is fast enough to be implemented on an embedded system. The onset of each note is detected using a maximum filter approach that suppresses the effect of vibrato. Beats are predicted a second in advance using a causal variant of Dynamic Programming. We have employed...

10.1109/mipr.2018.00068 article EN 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) 2018-04-01
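One common way to make spectral-flux onset detection robust to vibrato, in the spirit of the maximum filter mentioned above, is to compare each frame against a frequency-wise maximum-filtered copy of the previous frame, so that a partial drifting by a bin does not register as new energy. The sketch below assumes a precomputed magnitude spectrogram and a filter width of 3 bins; it illustrates the idea rather than reproducing the paper's onset detector, and it omits the beat prediction stage.

```python
import numpy as np


def max_filter_freq(frame: np.ndarray, width: int = 3) -> np.ndarray:
    """Sliding maximum over `width` neighbouring frequency bins (width must be odd)."""
    half = width // 2
    padded = np.pad(frame, half, mode="edge")
    return np.stack([padded[i : i + len(frame)] for i in range(width)]).max(axis=0)


def max_filtered_flux(spec: np.ndarray, width: int = 3) -> np.ndarray:
    """Rectified spectral flux against a max-filtered previous frame.

    spec: magnitude spectrogram of shape (n_bins, n_frames).
    Returns one onset-strength value per frame (frame 0 is 0).
    """
    flux = np.zeros(spec.shape[1])
    for t in range(1, spec.shape[1]):
        reference = max_filter_freq(spec[:, t - 1], width)
        # only increases in energy relative to the widened reference count
        flux[t] = np.maximum(spec[:, t] - reference, 0.0).sum()
    return flux
```

With `width=1` the function reduces to plain rectified spectral flux, which would report a one-bin pitch wobble as an onset.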

This paper proposes two libraries to address common and uncommon issues with Unicode-based writing schemes for Indic languages. The first is a normalizer that corrects inconsistencies caused by the encoding scheme (https://pypi.org/project/bnunicodenormalizer/). The second is a grapheme parser for Abugida text (https://pypi.org/project/indicparser/). Both tools are more efficient and effective than previously used tools. We report a 400% increase in speed and ensure significantly better performance for different language...

10.48550/arxiv.2306.01743 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01
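To show what grapheme parsing means for an abugida such as Bengali, here is a simplified rule-based segmenter built only on Python's standard `unicodedata` module. It is a hypothetical illustration, not the API of `indicparser`: it attaches combining marks (Unicode categories Mn and Mc) to the preceding cluster, and joins a letter to the previous cluster when the two are linked by the virama (hasanta, U+09CD) or a zero-width joiner, forming a conjunct. Real orthographic rules have more exceptions than this.

```python
import unicodedata

VIRAMA = "\u09cd"  # Bengali sign virama (hasanta)
ZWJ, ZWNJ = "\u200d", "\u200c"  # zero-width joiner / non-joiner


def grapheme_clusters(text: str) -> list[str]:
    """Split Bengali text into simplified grapheme clusters (hypothetical helper)."""
    clusters: list[str] = []
    for ch in text:
        # vowel signs, anusvara, virama, etc. attach to the preceding letter
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # a letter following virama or ZWJ continues a conjunct
        joined = (
            bool(clusters)
            and clusters[-1][-1] in (VIRAMA, ZWJ)
            and unicodedata.category(ch) == "Lo"
        )
        if clusters and (is_mark or joined or ch in (ZWJ, ZWNJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

For example, `grapheme_clusters("বাংলা")` returns `["বাং", "লা"]`, and the conjunct in "কক্ষ" stays together as `["ক", "ক্ষ"]`.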

Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and its distribution. However, training samples are often distributed in a non-uniform fashion on the manifold, due to costs or convenience of collection. For example, the CelebA dataset contains a large fraction of smiling faces. These inconsistencies will be reproduced when sampling from the trained DGN, which is not always preferred, e.g., for fairness...

10.48550/arxiv.2110.08009 preprint EN cc-by-nc-sa arXiv (Cornell University) 2021-01-01

We develop the Scalable Latent Exploration Score (ScaLES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its practicality. ScaLES is an exact and theoretically motivated method leveraging the trained decoder's approximation of the data distribution. ScaLES can be calculated...

10.48550/arxiv.2406.09657 preprint EN arXiv (Cornell University) 2024-06-13

The artificial intelligence (AI) world is running out of real data for training increasingly large generative models, resulting in accelerating pressure to train on synthetic data. Unfortunately, training new generative models with synthetic data from current or past generation models creates an autophagous (self-consuming) loop that degrades the quality and/or diversity of the synthetic data in what has been termed model autophagy disorder (MAD) and model collapse. Current thinking around model autophagy recommends that synthetic data be avoided lest the system deteriorate into MADness. In this paper,...

10.48550/arxiv.2408.16333 preprint EN arXiv (Cornell University) 2024-08-29

Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near-zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings, such as training a convolutional neural network (CNN) on CIFAR10 or a ResNet on Imagenette. We introduce a new...

10.48550/arxiv.2402.15555 preprint EN arXiv (Cornell University) 2024-02-23

Implicit neural representations (INRs) have demonstrated success in a variety of applications, including inverse problems and rendering. An INR is typically trained to capture one signal of interest, resulting in learned features that are highly attuned to that signal. Although such features are assumed to be less generalizable, we explore their transferability for fitting similar signals. We introduce a new INR training framework, STRAINER, that learns transferable features for fitting INRs to new signals from a given distribution, faster and with better...

10.48550/arxiv.2409.09566 preprint EN arXiv (Cornell University) 2024-09-14

Deep generative models learn continuous representations of complex data manifolds using a finite number of samples during training. For a pre-trained generative model, the common way to evaluate the quality of the learned manifold representation is by computing global metrics like Fréchet Inception Distance using a large number of generated and real samples. However, generative model performance is not uniform across the learned manifold; e.g., for foundation models like Stable Diffusion, generation performance can vary significantly based on the conditioning or...

10.48550/arxiv.2408.08307 preprint EN arXiv (Cornell University) 2024-08-15