Scott L. Fleming

ORCID: 0000-0002-6047-7877
Research Areas
  • Machine Learning in Healthcare
  • Artificial Intelligence in Healthcare and Education
  • Functional Brain Connectivity Studies
  • Electronic Health Records Systems
  • Topic Modeling
  • Mental Health Research Topics
  • Sepsis Diagnosis and Treatment
  • Biomedical Text Mining and Ontologies
  • Treatment of Major Depression
  • Healthcare cost, quality, practices
  • Clinical Reasoning and Diagnostic Skills
  • Mental Health via Writing
  • Child Development and Digital Technology
  • Digital Mental Health Interventions
  • Healthcare Policy and Management
  • Advanced Neuroimaging Techniques and Applications
  • Advanced MRI Techniques and Applications
  • Autism Spectrum Disorder Research
  • Palliative Care and End-of-Life Issues
  • Transcranial Magnetic Stimulation Studies
  • Advanced Text Analysis Techniques
  • Medical Coding and Health Information
  • Educational Technology and Assessment
  • Neurobiology of Language and Bilingualism
  • Forecasting Techniques and Applications

Stanford University
2019-2025

Mayo Clinic in Arizona
2023

Stanford Health Care
2023

Stanford Medicine
2023

Carnegie Mellon University
2020

Erasmus University Rotterdam
2020

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce...

10.1609/aaai.v38i20.30205 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists. Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before using the images as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32 116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared to two individual radiologists...

10.1148/ryai.2020190065 article EN Radiology Artificial Intelligence 2020-03-01

Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20-100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We...

10.2196/13822 article EN cc-by Journal of Medical Internet Research 2019-04-24

Background: Despite tremendous advances in characterizing human neural circuits that govern emotional and cognitive functions impaired in depression and anxiety, we lack a circuit-based taxonomy for depression and anxiety that captures transdiagnostic heterogeneity and informs clinical decision making. Methods: We developed and tested a novel system for quantifying 6 brain circuits reproducibly and at the individual patient level. We implemented standardized circuit definitions relative to a healthy reference sample and algorithms to generate circuit scores for the overall...

10.1016/j.biopsych.2021.06.024 article EN cc-by-nc-nd Biological Psychiatry 2021-07-11

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical...

10.1038/s41467-021-22328-4 article EN cc-by Nature Communications 2021-04-01

Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382...

10.1038/s41598-023-30820-8 article EN cc-by Scientific Reports 2023-03-07

There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in...

10.1080/01621459.2024.2393466 article EN Journal of the American Statistical Association 2024-09-03

Background: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. Methods: This study included three cohorts:...

10.1186/s12911-024-02449-8 article EN cc-by BMC Medical Informatics and Decision Making 2024-02-14

Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements...

10.1038/s41746-020-0285-8 article EN cc-by npj Digital Medicine 2020-06-03

Background: The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline tasks, but also carries risks such as response accuracy and bias perpetration. To address this, we conducted a red-teaming exercise to assess LLMs and developed a dataset of clinically relevant scenarios for future teams to use. Methods: We convened 80 multi-disciplinary experts to evaluate the performance of popular LLMs across multiple medical scenarios. Teams were composed of clinicians, engineering...

10.1101/2024.04.05.24305411 preprint EN cc-by-nc-nd medRxiv (Cold Spring Harbor Laboratory) 2024-04-07

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across...

10.1038/s41746-024-01166-w article EN cc-by npj Digital Medicine 2024-06-27

Countless studies have advanced our understanding of the human brain and its organization by using functional magnetic resonance imaging (fMRI) to derive network representations of human brain function. However, we do not know to what extent these “functional connectomes” are reliable over time. In a large public sample of healthy participants (N = 833) scanned on two consecutive days, we assessed the test-retest reliability of fMRI functional connectivity and the consequences of three common sources of variation in analysis workflows:...

10.1162/netn_a_00148 article EN cc-by Network Neuroscience 2020-05-20

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical towards improving equity and accuracy of large language models, but non-model creator-affiliated red teaming is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six...

10.1038/s41746-025-01542-0 article EN cc-by npj Digital Medicine 2025-03-07

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI’s GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3)...

10.1101/2023.04.25.23288588 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2023-04-28

Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning...

10.3389/fdgth.2022.943768 article EN cc-by Frontiers in Digital Health 2022-09-12

Background: Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance. Methods: Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups...

10.1055/s-0043-1762904 article EN Methods of Information in Medicine 2023-02-22

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce...

10.48550/arxiv.2308.14089 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible large-scale analyses of language use during psychotherapy. Transparent methodological approaches are lacking, however. Here we present novel methods to increase the efficiency of efforts to examine language use in psychotherapy. We evaluate three important...

10.1038/s44184-022-00020-9 article EN cc-by npj Mental Health Research 2022-12-02

In 2008, Oregon expanded its Medicaid program using a lottery, creating a rare opportunity to study the effects of Medicaid coverage using a randomized controlled design (Oregon Health Insurance Experiment). Analysis showed that Medicaid coverage lowered the risk of depression. However, this effect may vary between individuals, and the identification of individuals likely to benefit the most has the potential to improve the effectiveness and efficiency of the Medicaid program. By applying the machine learning causal forest to data from this experiment, we found substantial...

10.1093/aje/kwae008 article EN American Journal of Epidemiology 2024-02-22

Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks.

10.1093/jamia/ocad175 article EN Journal of the American Medical Informatics Association 2023-08-28