- Anomaly Detection Techniques and Applications
- Data-Driven Disease Surveillance
- Adversarial Robustness in Machine Learning
- Global Maternal and Child Health
- Cell Image Analysis Techniques
- Aesthetic Perception and Analysis
- Cutaneous Melanoma Detection and Management
- Insurance, Mortality, Demography, Risk Management
- Machine Learning in Materials Science
- COVID-19 epidemiological studies
- Statistical Methods and Inference
- Bacillus and Francisella bacterial research
- Advanced Statistical Methods and Models
- Dermatology and Skin Diseases
- Network Security and Intrusion Detection
- Creativity in Education and Neuroscience
- Domain Adaptation and Few-Shot Learning
- Imbalanced Data Classification Techniques
- Child Nutrition and Water Access
- Explainable Artificial Intelligence (XAI)
- Machine Learning and Data Classification
- Data Stream Mining Techniques
- Digital Media Forensic Detection
- Zoonotic diseases and public health
- Advanced Causal Inference Techniques
IBM Research - Africa
2020-2023
IBM (United States)
2021
Carnegie Mellon University
2013-2018
Gaining insight into how deep convolutional neural network models perform image classification and to explain their outputs have been a concern computer vision researchers decision makers. These are often referred as black box due low comprehension of internal workings. As an effort developing explainable learning models, several methods proposed such finding gradients class output with respect input (sensitivity maps), activation map (CAM), Gradient based Class Activation Maps (Grad-CAM)....
We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. frame the pattern detection problem as search over subsets of records and attributes, maximizing nonparametric scan statistic all such subsets. prove that statistics possess novel property allows efficient optimization exponentially many without an exhaustive search, enabling FGSS to scale massive high-dimensional evaluate performance three real-world application...
Risk assessment is a growing use for machine learning models. When used in high-stakes applications, especially ones regulated by anti-discrimination laws or governed societal norms fairness, it important to ensure that learned models do not propagate and scale any biases may exist training data. In this paper, we add on an additional challenge beyond fairness: unsupervised domain adaptation covariate shift between source target distribution. Motivated the real-world problem of risk new...
Abstract Images depicting dark skin tones are significantly underrepresented in the educational materials used to teach primary care physicians and dermatologists recognize diseases. This could contribute disparities disease diagnosis across different racial groups. Previously, domain experts have manually assessed textbooks estimate diversity images. Manual assessment does not scale many introduces human errors. To automate this process, we present Skin Tone Analysis for Representation...
We present GraphScan, a novel method for detecting arbitrarily shaped connected clusters in graph or network data. Given structure, data observed at each node, and score function defining the anomalousness of set nodes, GraphScan can efficiently exactly identify most anomalous (highest-scoring) subgraph. Kulldorff's spatial scan, which searches over circles consisting center location its k − 1 nearest neighbors, has been extended to include connectivity constraints by FlexScan. However,...
Abstract This work provides three contributions that straddle the medical literature on multimorbidity and data science community with an interest exploratory analysis of health-related research data. First, we propose a definition for as co-occurrence (at least) two disease diagnoses from pre-determined list. interpretation adds to growing body working definitions emerging literature. Second, apply this novel outcome of-interest sub-Saharan populations located in Nairobi, Kenya Agincourt,...
We present the penalized fast subset scan (PFSS), a new and general framework for scalable accurate pattern detection. PFSS enables exact efficient identification of most anomalous subsets data, as measured by likelihood ratio statistic. However, also allows incorporation prior information about each data element's probability inclusion, which was not previously possible within framework. builds on two main results: first, we prove that large class statistics satisfy property additional,...
We explore scalable and accurate dynamic pattern detection methods in graph-based data sets. apply our proposed Dynamic Subset Scan method to the task of detecting, tracking, source-tracing contaminant plumes spreading through a water distribution system equipped with noisy, binary sensors. While static patterns affect same subset over period time, may different subsets at each time step. These require new approach define optimize penalized likelihood ratio statistics scan framework, as well...
Abstract The United Nations Sustainable Development Goals (SDGs) advocate for reducing preventable Maternal, Newborn, and Child Health (MNCH) deaths complications. However, many low- middle-income countries remain disproportionately affected by high rates of poor MNCH outcomes. Progress towards the 2030 sustainable development targets remains stagnated uneven within across countries, particularly in sub-Saharan Africa. current scenario is exacerbated a multitude factors, including COVID-19...
Recent advances in deep learning have led to breakthroughs the development of automated skin disease classification. As we observe an increasing interest these models dermatology space, it is crucial address aspects such as robustness towards input data distribution shifts. Current tend make incorrect inferences for test samples from different hardware devices and clinical settings or unknown samples, which are out-of-distribution (OOD) training samples. To this end, propose a simple yet...
Trustworthy artificial intelligence researchers should seek to better detect and characterize systematic deviations in data models (that is, bias). This article provides scientists with motivation, theory, code, examples on how perform disciplined discovery of at the subset level.
Reliably detecting attacks in a given set of inputs is high practical relevance because the vulnerability neural networks to adversarial examples. These altered create security risk applications with real-world consequences, such as self-driving cars, robotics and financial services. We propose an unsupervised method for inner layers autoencoder (AE) by maximizing non-parametric measure anomalous node activations. Previous work this space has shown AE can detect images thresholding...
Audio processing models based on deep neural networks are susceptible to adversarial attacks even when the audio waveform is 99.9% similar a benign sample. Given wide application of DNN-based recognition systems, detecting presence examples high practical relevance. By applying anomalous pattern detection techniques in activation space these models, we show that 2 recent and current state-of-the-art systems systematically lead higher-than-expected at some subset nodes can detect with up an...
Abstract This study aimed at identifying the factors associated with neonatal mortality. We analyzed Demographic and Health Survey (DHS) datasets from 10 Sub-Saharan countries. For each survey, we trained machine learning models to identify women who had experienced a death within 5 years prior survey being administered. then inspected by visualizing features that were important for model, how, on average, changing values of affected risk confirmed known positive correlation between birth...
Mobile money platforms are gaining traction across developing markets as a convenient way of sending and receiving over mobile phones. Recent joint collaborations between banks mobile-network operators leverage customer's past phone transactions in order to create credit score for the individual. In this work, we address problem launching mobile-phone based scoring system new market without marginal distribution features borrowers market. This challenge rules out traditional transfer...
Deep generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have been employed widely in computational creativity research. However, models discourage out-of-distribution generation to avoid spurious sample generation, thereby limiting their creativity. Thus, incorporating research on human into deep learning techniques presents an opportunity make outputs more compelling human-like. As we see the emergence of directed toward research, a need...
This work views neural networks as data generating systems and applies anomalous pattern detection techniques on that in order to detect when a network is processing an input. Detecting anomalies critical component for multiple machine learning problems including detecting adversarial noise. More broadly, this step towards giving the ability recognize out-of-distribution sample. first introduce "Subset Scanning" methods from domain task of input networks. Subset scanning treats problem...
Optical flow estimation have shown significant improvements with advances in deep neural networks. However, these networks recently been to be vulnerable patch-based adversarial attacks, which poses security risks real-world applications, such as self-driving cars and robotics. We propose SADL, a Spatially constrained Attack Detection Localization framework, detect localize attack without requiring dedicated training. The detection of an attacked input sequence is performed via iterative...
We propose an auditing method to identify whether a large language model (LLM) encodes patterns such as hallucinations in its internal states, which may propagate downstream tasks. introduce weakly supervised technique using subset scanning approach detect anomalous LLM activations from pre-trained models. Importantly, our does not need knowledge of the type a-priori. Instead, it relies on reference dataset devoid anomalies during testing. Further, enables identification pivotal nodes...
Legal financial obligations (LFOs) such as court fees and fines are commonly levied on individuals who convicted of crimes. It is expected that LFO amounts should be similar across social, racial, geographic subpopulations the same crime. This work analyzes distribution LFOs in Jefferson County, Alabama highlights disparities different individual neighborhood demographic characteristics. Data-driven discovery methods used to detect experience higher than overall population offenders....
Detecting bias in data is an integral component of trustworthy and responsible ML. For researchers scientists, investigating, detecting, becoming aware biases present important step to correcting making better ML decisions. Bias exists the form subsets that deviate from global expectations. Typically, begin with a set pre-defined protected/sensitive attributes use them as basis upon which deviation expectation examined. instance, researcher may examine under- or over-representation...