- Machine Learning in Healthcare
- Single-cell and spatial transcriptomics
- Artificial Intelligence in Healthcare and Education
- Gene expression and cancer classification
- Inflammatory Biomarkers in Disease Prognosis
- COVID-19 epidemiological studies
- Cell Image Analysis Techniques
- Human Mobility and Location-Based Analysis
- Data-Driven Disease Surveillance
- Crime Patterns and Interventions
- Ethics and Social Impacts of AI
- Anomaly Detection Techniques and Applications
- Demographic modeling and climate adaptation
- Ethics in Clinical Research
- Policing Practices and Perceptions
- Topic Modeling
- Cancer Immunotherapy and Biomarkers
- Cancer, Lipids, and Metabolism
- Machine Learning and Data Classification
- Colorectal Cancer Screening and Detection
- Mental Health Research Topics
- Imbalanced Data Classification Techniques
- Bioinformatics and Genomic Networks
- Explainable Artificial Intelligence (XAI)
- Advanced Causal Inference Techniques
Cornell University
2020-2025
University of California, Berkeley
2020-2025
Jacobs Institute
2023-2024
New York Proton Center
2024
Boston Children's Hospital
2024
Boston Medical Center
2024
Brigham and Women's Hospital
2024
Beth Israel Deaconess Medical Center
2024
Massachusetts General Hospital
2024
Harvard University
2024
Algorithms are now regularly used to decide whether defendants awaiting trial are too dangerous to be released back into the community. In some cases, black defendants are substantially more likely than white defendants to be incorrectly classified as high risk. To mitigate such disparities, several techniques have recently been proposed to achieve algorithmic fairness. Here we reformulate algorithmic fairness as constrained optimization: the objective is to maximize public safety while satisfying formal fairness constraints designed to reduce racial disparities. We...
Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single-cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA),...
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This provides an opportunity for deriving shared and tissue-specific regulatory networks on the basis of co-expression between genes. However, only a small number of samples are available for the majority of tissues, and therefore statistical inference in this setting is highly underpowered. To address this problem, we infer networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT,...
The use of machine learning (ML) in healthcare raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of healthcare. Specifically, we frame the ethics of ML in healthcare through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to postdeployment considerations. We close by summarizing recommendations to address these challenges.
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on...
A long-standing expectation is that large, dense and cosmopolitan areas support socioeconomic mixing and exposure among diverse individuals [1–6]. Assessing this hypothesis has been difficult because previous measures of socioeconomic mixing have relied on static residential housing data rather than real-life exposures among people at work, in places of leisure and in home neighbourhoods [7,8]. Here we develop a measure of exposure segregation that captures the socioeconomic diversity of these everyday encounters. Using mobile phone mobility data to represent 1.6...
Adjustment for race is discouraged in lung-function testing, but the implications of adopting race-neutral equations have not been comprehensively quantified.
Importance: Since 2013, the American College of Cardiology (ACC) and American Heart Association (AHA) have recommended the pooled cohort equations (PCEs) for estimating the 10-year risk of atherosclerotic cardiovascular disease (ASCVD). An AHA scientific advisory group recently developed the Predicting Risk of cardiovascular disease EVENTs (PREVENT) equations, which incorporated kidney measures, removed race as an input, and improved calibration in contemporary populations. PREVENT is known to produce ASCVD risk predictions that are lower than those...
Has the laudable intention of ensuring patient equity caused medicine to deviate from its mandate to predict patients’ risk as accurately as possible?
Purpose: Tumor-infiltrating lymphocytes (TIL) in pretreatment biopsies are associated with improved survival in triple-negative breast cancer (TNBC). We investigated whether higher peripheral lymphocyte counts are associated with lower breast cancer–specific mortality (BCM) and overall mortality (OM) in TNBC. Experimental Design: Data on treatments and diagnostic tests from electronic medical records of two health care systems were linked to demographic, clinical, pathologic, and mortality data from the California Cancer Registry. Multivariable...
SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples, is presented here. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly...
Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are commonly used to obtain genome-wide profiles of histone modifications associated with different types of functional genomic elements. However, the quality of ChIP-seq data is affected by many experimental parameters such as the amount of input DNA, antibody specificity, ChIP enrichment and sequencing depth. Making accurate inferences from chromatin profiling experiments that involve diverse experimental parameters is challenging. We introduce a convolutional denoising algorithm, Coda,...
Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered...
Distribution shifts, where the training distribution differs from the test distribution, can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time...
Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context: how do we apply these approaches to foundation models, which are, by design, disconnected from context?...