Emily Alsentzer

ORCID: 0000-0002-5370-1746
Research Areas
  • Machine Learning in Healthcare
  • Topic Modeling
  • Artificial Intelligence in Healthcare and Education
  • Biomedical Text Mining and Ontologies
  • Genomics and Rare Diseases
  • Natural Language Processing Techniques
  • Trauma and Emergency Care Studies
  • Artificial Intelligence in Healthcare
  • Clinical Reasoning and Diagnostic Skills
  • Data-Driven Disease Surveillance
  • Injury Epidemiology and Prevention
  • Cancer Genomics and Diagnostics
  • Health, Environment, Cognitive Aging
  • Radiomics and Machine Learning in Medical Imaging
  • Maternal and fetal healthcare
  • Advanced Graph Neural Networks
  • Vaccine Coverage and Hesitancy
  • Autopsy Techniques and Outcomes
  • Emergency and Acute Care Studies
  • Text Readability and Simplification
  • Complex Network Analysis Techniques
  • Meta-analysis and systematic reviews
  • Healthcare Technology and Patient Monitoring
  • Epilepsy research and treatment
  • Advanced Statistical Process Monitoring

Brigham and Women's Hospital
2023-2025

Stanford University
2015-2025

Harvard University
2020-2025

Massachusetts Institute of Technology
2018-2022

Harvard University Press
2022

Harvard–MIT Division of Health Sciences and Technology
2021-2022

Uniformed Services University of the Health Sciences
2020

Johns Hopkins University
2020

United States Naval Medical Research Unit SOUTH
2020

Vanderbilt University Medical Center
2014

Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly-available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on 3/5...

10.18653/v1/w19-1909 article EN 2019-01-01
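
For readers who want to experiment with the released clinical BERT models described above, a minimal loading sketch with the Hugging Face transformers library is shown below. The checkpoint identifier and the example note are assumptions for illustration, and any downstream task head is left out.

```python
# Minimal sketch: embed a clinical sentence with a pre-trained clinical BERT model.
# Assumes the checkpoint is distributed on the Hugging Face Hub under this identifier.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "emilyalsentzer/Bio_ClinicalBERT"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

note = "Pt admitted with CHF exacerbation; started on IV furosemide."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings into one sentence vector for downstream tasks.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # (1, 768)
```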

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly-available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific...

10.48550/arxiv.1904.03323 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in health care, ranging from automating administrative tasks to augmenting clinical decision making. However, these models also pose a danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on care. We aimed to assess whether GPT-4 encodes racial and gender biases that impact its use in health care.

10.1016/s2589-7500(23)00225-x article EN cc-by The Lancet Digital Health 2023-12-18
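
As a hedged illustration of how a bias audit like the one described above might be set up (a generic counterfactual-prompting sketch, not the study's actual protocol), one can hold a clinical vignette fixed and vary only the stated race and gender. The prompt, demographic categories, and model name below are assumptions.

```python
# Illustrative counterfactual bias probe: the vignette is held fixed while race and
# gender are swapped, and responses are compared across variants.
# Model name, prompt, and categories are assumptions, not the study's protocol.
from itertools import product
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

VIGNETTE = (
    "A {age}-year-old {race} {gender} presents to the ED with acute chest pain "
    "radiating to the left arm. List the three most likely diagnoses."
)

responses = {}
for race, gender in product(["Black", "white"], ["man", "woman"]):
    prompt = VIGNETTE.format(age=55, race=race, gender=gender)
    completion = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    responses[(race, gender)] = completion.choices[0].message.content

# Downstream analysis would compare how often each diagnosis appears per subgroup.
for key, text in responses.items():
    print(key, text[:80])
```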

Prior research has shown that artificial intelligence (AI) systems often encode biases against minority subgroups. However, little work has focused on ways to mitigate the harm that discriminatory algorithms can cause in high-stakes settings such as medicine. In this study, we experimentally evaluated the impact that biased AI recommendations have on emergency decisions, in which participants respond to mental health crises by calling for either medical or police assistance. We recruited 438 clinicians and 516...

10.1038/s43856-022-00214-4 article EN cc-by Communications Medicine 2022-11-21

Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models, trained primarily on general web text, are the right tool in highly specialized, safety-critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important question regarding the utility of smaller domain-specific models. With the success of general-domain LLMs, is there still a need for...

10.48550/arxiv.2302.08091 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Abstract Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. Here we report the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge summaries from electronic health records (n = ...

10.1038/s41746-023-00957-x article EN cc-by npj Digital Medicine 2023-11-30
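
The zero-shot setup described in this abstract can be illustrated with a short sketch: the phenotyping task is expressed entirely as a natural-language instruction to Flan-T5, with no fine-tuning. The exact prompt wording, model size, and example text below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of zero-shot phenotyping with Flan-T5: the task is specified
# entirely in the instruction, with no task-specific training.
# Prompt wording and model size are assumptions, not the paper's exact setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-xl"  # smaller variants (e.g. flan-t5-base) also run
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

discharge_note = "...estimated blood loss 1500 mL after vaginal delivery, two units pRBC transfused..."
prompt = (
    "Read the discharge summary and answer yes or no: "
    "did this patient experience postpartum hemorrhage?\n\n"
    f"Summary: {discharge_note}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # e.g. "yes"
```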

Abstract Background Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in healthcare, ranging from automating administrative tasks to augmenting clinical decision-making. However, these models also pose a serious danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on care. Methods Using the Azure OpenAI API, we tested whether GPT-4 encodes racial and gender biases and examined four potential applications of LLMs in the clinical domain, namely...

10.1101/2023.07.13.23292577 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2023-07-16

Abstract Objectives Large language models (LLMs) are poised to change care delivery, but their impact on health equity is unclear. While marginalized populations have been historically excluded from early technology developments, LLMs present an opportunity to change our approach to developing, evaluating, and implementing new technologies. In this perspective, we describe the role of LLMs in supporting health equity. Materials and Methods We apply the National Institute on Minority Health and Health Disparities (NIMHD) research...

10.1093/jamia/ocae055 article EN Journal of the American Medical Informatics Association 2024-03-20

Medical licensing examinations, such as the United States Medical Licensing Examination, have become default benchmarks for evaluating large language models (LLMs) in health care. Performance on these exams is frequently cited as evidence of progress and used to justify the deployment of LLMs into clinical settings. However, we argue that they are fundamentally limited signals for assessing true utility.

10.1056/aie2401235 article EN NEJM AI 2025-01-23

Abstract Understanding reasons for treatment switching is of significant medical interest, but these factors are often found only in unstructured clinical notes and can be difficult to extract. We evaluated the zero-shot abilities of GPT-4 and eight other open-source large language models (LLMs) to extract contraceptive switching information from 1964 clinical notes derived from the UCSF Information Commons dataset. GPT-4 extracted the contraceptives started and stopped at each switch with microF1 scores of 0.85 and 0.88, respectively, compared to 0.81 and 0.88...

10.1038/s41746-025-01615-0 article EN cc-by npj Digital Medicine 2025-04-23
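
For readers unfamiliar with the metric cited above, a minimal sketch of micro-F1 over extracted medication mentions is shown below: true positives, false positives, and false negatives are pooled across all notes before precision and recall are computed. The example data are invented.

```python
# Sketch of micro-F1 over sets of extracted medication mentions.
# Counts are pooled across all notes before computing precision and recall.
def micro_f1(gold: list[set[str]], predicted: list[set[str]]) -> float:
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Invented toy data, purely for illustration.
gold = [{"levonorgestrel IUD"}, {"combined oral contraceptive"}, {"etonogestrel implant"}]
pred = [{"levonorgestrel IUD"}, {"depot medroxyprogesterone"}, {"etonogestrel implant"}]
print(round(micro_f1(gold, pred), 2))  # 0.67 on this toy example
```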

Deep learning methods for graphs achieve remarkable performance on many node-level and graph-level prediction tasks. However, despite the proliferation of their success, prevailing Graph Neural Networks (GNNs) neglect subgraphs, rendering subgraph prediction tasks challenging to tackle in many impactful applications. Further, subgraph prediction tasks present several unique challenges: subgraphs can have non-trivial internal topology, but also carry a notion of position and external connectivity information relative to the underlying graph in which...

10.48550/arxiv.2006.10538 preprint EN other-oa arXiv (Cornell University) 2020-01-01
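
The baseline this abstract implicitly contrasts against (pooling node embeddings within a subgraph, which ignores its position and external connectivity) can be sketched in plain PyTorch, as below; this is an illustration of the problem setup, not the SubGNN architecture.

```python
# Illustrative baseline for subgraph-level prediction: run message passing over
# the full graph, then mean-pool the embeddings of the subgraph's nodes.
# This ignores subgraph position and external connectivity, the gap the paper targets.
import torch

def propagate(adj: torch.Tensor, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """One round of mean-neighbor message passing followed by a linear map."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return torch.relu(((adj @ x) / deg) @ w)

num_nodes, dim = 8, 16
adj = (torch.rand(num_nodes, num_nodes) > 0.7).float()
adj = ((adj + adj.T) > 0).float()            # symmetrize the random adjacency
x = torch.randn(num_nodes, dim)              # initial node features
w = torch.randn(dim, dim)

h = propagate(adj, x, w)                     # node embeddings
subgraph_nodes = torch.tensor([1, 3, 4])     # indices of the subgraph's nodes
subgraph_embedding = h[subgraph_nodes].mean(dim=0)
print(subgraph_embedding.shape)              # torch.Size([16])
```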

Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

10.18653/v1/2021.naacl-main.382 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial event. This retrospective cohort study compared models trained and evaluated on two separate datasets collected between Jan 1, 2010, and 2020: electronic medical records (EMRs) at...

10.1016/s2589-7500(23)00179-6 article EN cc-by The Lancet Digital Health 2023-11-22
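
As a generic illustration of the kind of tabular recurrence-risk model such a study might train, the following sketch fits a classifier on synthetic EMR-style features; the feature names, data, and model choice are invented and are not the study's variables or methods.

```python
# Illustrative recurrence-risk classifier on tabular EMR-style features.
# Features, labels, and model are invented for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(1, 18, n),          # age at first event (years)
    rng.integers(0, 2, n),           # abnormal EEG (0/1)
    rng.integers(0, 2, n),           # abnormal MRI (0/1)
    rng.integers(0, 2, n),           # event occurred during sleep (0/1)
])
y = rng.integers(0, 2, n)            # seizure recurrence within follow-up (0/1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
print("AUROC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```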

Information overload in electronic health records (EHRs) hampers clinicians' ability to efficiently extract and synthesize critical information from a patient's longitudinal record, leading to increased cognitive burden and delays in care. This study explores the potential of large language models (LLMs) to address this challenge by generating problem-based admission summaries for patients admitted with heart failure, a leading cause of hospitalization worldwide. We developed an extract-then-abstract approach guided...

10.1101/2025.06.02.25328807 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-06-03
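
The extract-then-abstract pattern named in this abstract can be sketched as two steps: pull problem-relevant snippets from the record, then have an LLM condense them into a problem-based summary. The retrieval heuristic, prompt, and model name below are assumptions, not the study's method.

```python
# Sketch of an extract-then-abstract pipeline: keyword-filter note snippets for a
# clinical problem, then ask an LLM to abstract them into a short summary.
# Retrieval heuristic, prompt, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_snippets(notes: list[str], keywords: list[str]) -> list[str]:
    """Keep sentences that mention any problem-specific keyword (the 'extract' step)."""
    snippets = []
    for note in notes:
        for sentence in note.split("."):
            if any(k in sentence.lower() for k in keywords):
                snippets.append(sentence.strip())
    return snippets

def abstract_summary(snippets: list[str], problem: str) -> str:
    """Ask the LLM to condense the extracted evidence (the 'abstract' step)."""
    prompt = (
        f"Summarize the patient's {problem} history for an admission note, "
        "using only the excerpts below:\n- " + "\n- ".join(snippets)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

notes = ["EF 25% on last echo. Continues on carvedilol and sacubitril/valsartan."]
print(abstract_summary(extract_snippets(notes, ["ef", "echo", "carvedilol"]), "heart failure"))
```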

Patient summarization is essential for clinicians to provide coordinated care and practice effective communication. Automated summarization has the potential to save time, standardize notes, aid clinical decision making, and reduce medical errors. Here we provide an upper bound on extractive summarization of discharge notes and develop an LSTM model to sequentially label topics of history of present illness notes. We achieve an F1 score of 0.876, which indicates that this model can be employed to create a dataset for evaluation of extractive summarization methods.

10.48550/arxiv.1810.12085 preprint EN other-oa arXiv (Cornell University) 2018-01-01
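
A minimal sketch of the sequential topic-labeling idea, an LSTM reading a sequence of sentence embeddings and assigning a topic to each sentence, is shown below; the label set, encoder, and dimensions are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch of sequential topic labeling with an LSTM: each sentence of a
# history-of-present-illness section receives a topic label.
# Label set, encoder, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

TOPICS = ["chief_complaint", "history", "medications", "plan"]  # assumed label set

class SentenceTopicLabeler(nn.Module):
    def __init__(self, sent_dim: int = 128, hidden: int = 64, n_topics: int = len(TOPICS)):
        super().__init__()
        # A bidirectional LSTM reads the sequence of sentence embeddings so each
        # label can depend on surrounding context; a linear layer scores topics.
        self.lstm = nn.LSTM(sent_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_topics)

    def forward(self, sentence_embeddings: torch.Tensor) -> torch.Tensor:
        # sentence_embeddings: (batch, num_sentences, sent_dim)
        contextual, _ = self.lstm(sentence_embeddings)
        return self.classifier(contextual)  # (batch, num_sentences, n_topics)

model = SentenceTopicLabeler()
fake_note = torch.randn(1, 10, 128)   # 10 sentence embeddings from an upstream encoder
logits = model(fake_note)
predicted = logits.argmax(dim=-1)     # topic index per sentence
print(predicted.shape)                # torch.Size([1, 10])
```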

Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptive selection and switching is of significant interest. However, many factors related to medication switching are often captured only in unstructured clinical notes and can be difficult to extract. Here, we evaluate the zero-shot abilities of a recently developed large language model, GPT-4 (via the HIPAA-compliant Microsoft Azure API), to identify...

10.48550/arxiv.2402.03597 preprint EN arXiv (Cornell University) 2024-02-05

Abstract Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300–400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate...

10.1038/s41467-023-41980-6 article EN cc-by Nature Communications 2023-10-12

Abstract There are more than 7,000 rare diseases, some affecting 3,500 or fewer patients in the US. Due to clinicians' limited experience with such diseases and the heterogeneity of clinical presentations, approximately 70% of individuals seeking a diagnosis today remain undiagnosed. Deep learning has demonstrated success in aiding the diagnosis of common diseases. However, existing approaches require labeled datasets with thousands of diagnosed patients per disease. Here, we present SHEPHERD, a few-shot approach for multi-faceted...

10.1101/2022.12.07.22283238 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2022-12-13

Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. We investigated the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge summaries from electronic health records (n...

10.1101/2023.05.31.23290753 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2023-06-01