Emma Pierson

ORCID: 0000-0002-6149-5567
Research Areas
  • Machine Learning in Healthcare
  • Single-cell and spatial transcriptomics
  • Artificial Intelligence in Healthcare and Education
  • Gene expression and cancer classification
  • Inflammatory Biomarkers in Disease Prognosis
  • COVID-19 epidemiological studies
  • Cell Image Analysis Techniques
  • Human Mobility and Location-Based Analysis
  • Data-Driven Disease Surveillance
  • Crime Patterns and Interventions
  • Ethics and Social Impacts of AI
  • Anomaly Detection Techniques and Applications
  • demographic modeling and climate adaptation
  • Ethics in Clinical Research
  • Policing Practices and Perceptions
  • Topic Modeling
  • Cancer Immunotherapy and Biomarkers
  • Cancer, Lipids, and Metabolism
  • Machine Learning and Data Classification
  • Colorectal Cancer Screening and Detection
  • Mental Health Research Topics
  • Imbalanced Data Classification Techniques
  • Bioinformatics and Genomic Networks
  • Explainable Artificial Intelligence (XAI)
  • Advanced Causal Inference Techniques

Cornell University
2020-2025

University of California, Berkeley
2020-2025

Jacobs Institute
2023-2024

New York Proton Center
2024

Boston Children's Hospital
2024

Boston Medical Center
2024

Brigham and Women's Hospital
2024

Beth Israel Deaconess Medical Center
2024

Massachusetts General Hospital
2024

Harvard University
2024

Algorithms are now regularly used to decide whether defendants awaiting trial are too dangerous to be released back into the community. In some cases, black defendants are substantially more likely than white defendants to be incorrectly classified as high risk. To mitigate such disparities, several techniques have recently been proposed to achieve algorithmic fairness. Here we reformulate algorithmic fairness as constrained optimization: the objective is to maximize public safety while satisfying formal fairness constraints designed to reduce racial disparities. We...

10.1145/3097983.3098095 article EN 2017-08-04

Single-cell RNA-seq data allows insight into normal cellular function and various disease states through molecular characterization of gene expression on the single-cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA),...

10.1186/s13059-015-0805-z article EN cc-by Genome biology 2015-11-02

To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT,...

10.1371/journal.pcbi.1004220 article EN cc-by PLoS Computational Biology 2015-05-13

The use of machine learning (ML) in healthcare raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of healthcare. Specifically, we frame the ethics of ML in healthcare through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to postdeployment considerations. We close by summarizing recommendations to address these challenges.

10.1146/annurev-biodatasci-092820-114757 article EN Annual Review of Biomedical Data Science 2021-05-06

We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on...

10.48550/arxiv.2007.04612 preprint EN other-oa arXiv (Cornell University) 2020-01-01

A long-standing expectation is that large, dense and cosmopolitan areas support socioeconomic mixing and exposure among diverse individuals [1–6]. Assessing this hypothesis has been difficult because previous measures of socioeconomic mixing have relied on static residential housing data rather than real-life exposures among people at work, in places of leisure and in home neighbourhoods [7,8]. Here we develop a measure of exposure segregation that captures the socioeconomic diversity of these everyday encounters. Using mobile phone mobility data to represent 1.6...

10.1038/s41586-023-06757-3 article EN cc-by Nature 2023-11-29

Adjustment for race is discouraged in lung-function testing, but the implications of adopting race-neutral equations have not been comprehensively quantified.

10.1056/nejmsa2311809 article EN New England Journal of Medicine 2024-05-19

Importance: Since 2013, the American College of Cardiology (ACC) and American Heart Association (AHA) have recommended the pooled cohort equations (PCEs) for estimating the 10-year risk of atherosclerotic cardiovascular disease (ASCVD). An AHA scientific advisory group recently developed the Predicting Risk of cardiovascular disease EVENTs (PREVENT) equations, which incorporated kidney measures, removed race as an input, and improved calibration in contemporary populations. PREVENT is known to produce ASCVD risk predictions that are lower than those...

10.1001/jama.2024.12537 article EN JAMA 2024-07-29

Has the laudable intention of ensuring patient equity caused medicine to deviate from its mandate to predict patients’ risk as accurately as possible?

10.1056/nejmp2311050 article EN New England Journal of Medicine 2024-01-06

Purpose: Tumor-infiltrating lymphocytes (TIL) in pretreatment biopsies are associated with improved survival in triple-negative breast cancer (TNBC). We investigated whether higher peripheral lymphocyte counts are associated with lower breast cancer–specific mortality (BCM) and overall mortality (OM) in TNBC. Experimental Design: Data on treatments and diagnostic tests from electronic medical records of two health care systems were linked to demographic, clinical, pathologic, and mortality data from the California Cancer Registry. Multivariable...

10.1158/1078-0432.ccr-17-1323 article EN Clinical Cancer Research 2018-03-26

SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogeneous samples, is presented here. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly...

10.1002/pmic.201700232 article EN PROTEOMICS 2017-12-19

Chromatin immune-precipitation sequencing (ChIP-seq) experiments are commonly used to obtain genome-wide profiles of histone modifications associated with different types of functional genomic elements. However, the quality of histone ChIP-seq data is affected by many experimental parameters such as the amount of input DNA, antibody specificity, ChIP enrichment and sequencing depth. Making accurate inferences from chromatin profiling experiments that involve diverse experimental parameters is challenging. We introduce a convolutional denoising algorithm, Coda,...

10.1093/bioinformatics/btx243 article EN cc-by-nc Bioinformatics 2017-04-18

Cycles are fundamental to human health and behavior. Examples include mood cycles, circadian rhythms, and the menstrual cycle. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present Cyclic Hidden Markov Models (CyHMMs) for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered...

10.1145/3178876.3186052 article EN 2018-01-01

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time...

10.48550/arxiv.2012.07421 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context?...

10.1145/3630106.3658992 preprint EN cc-by 2022 ACM Conference on Fairness, Accountability, and Transparency 2024-06-03