- Machine Learning in Healthcare
- Artificial Intelligence in Healthcare and Education
- Functional Brain Connectivity Studies
- Electronic Health Records Systems
- Topic Modeling
- Mental Health Research Topics
- Sepsis Diagnosis and Treatment
- Biomedical Text Mining and Ontologies
- Treatment of Major Depression
- Healthcare cost, quality, practices
- Clinical Reasoning and Diagnostic Skills
- Mental Health via Writing
- Child Development and Digital Technology
- Digital Mental Health Interventions
- Healthcare Policy and Management
- Advanced Neuroimaging Techniques and Applications
- Advanced MRI Techniques and Applications
- Autism Spectrum Disorder Research
- Palliative Care and End-of-Life Issues
- Transcranial Magnetic Stimulation Studies
- Advanced Text Analysis Techniques
- Medical Coding and Health Information
- Educational Technology and Assessment
- Neurobiology of Language and Bilingualism
- Forecasting Techniques and Applications
Stanford University
2019-2025
Mayo Clinic in Arizona
2023
Stanford Health Care
2023
Stanford Medicine
2023
Carnegie Mellon University
2020
Erasmus University Rotterdam
2020
The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce...
To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists. Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before using the images as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32 116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared to two individual radiologists...
Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20-100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We...
Background Despite tremendous advances in characterizing human neural circuits that govern emotional and cognitive functions impaired in depression and anxiety, we lack a circuit-based taxonomy for depression and anxiety that captures transdiagnostic heterogeneity and informs clinical decision making. Methods We developed and tested a novel system for quantifying 6 brain circuits reproducibly and at the individual patient level. We implemented standardized circuit definitions relative to healthy reference sample algorithms generate scores overall...
Abstract In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical...
Abstract Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382...
There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in...
Abstract Background Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe the agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. Methods This study included cohorts:...
Abstract Accurate transcription of audio recordings in psychotherapy would improve therapy effectiveness, clinician training, and safety monitoring. Although automatic speech recognition software is commercially available, its accuracy in mental health settings has not been well described. It is unclear which metrics and thresholds are appropriate for different clinical use cases, which may range from population descriptions to individual safety monitoring. Here we show that automatic speech recognition is feasible in psychotherapy, but further improvements...
Abstract Background The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline tasks, but also carries risks such as response accuracy and bias perpetration. To address this, we conducted a red-teaming exercise to assess LLMs in healthcare and developed a dataset of clinically relevant scenarios for future teams to use. Methods We convened 80 multi-disciplinary experts to evaluate the performance of popular LLMs across multiple medical scenarios. Teams composed of clinicians, engineering...
Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across...
Abstract Countless studies have advanced our understanding of the human brain and its organization by using functional magnetic resonance imaging (fMRI) to derive network representations of human brain function. However, we do not know to what extent these “functional connectomes” are reliable over time. In a large public sample of healthy participants (N = 833) scanned on two consecutive days, we assessed the test-retest reliability of fMRI functional connectivity and the consequences of three common sources of variation in analysis workflows:...
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical towards improving equity and accuracy of large language models, but non-model creator-affiliated red teaming is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six...
The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI’s GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3)...
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration, as well as a fairness audit based on summary statistics, subgroup performance, and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning...
Abstract Background Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance. Methods Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups...
Abstract Although individual psychotherapy is generally effective for a range of mental health conditions, little is known about the moment-to-moment language use of therapists. Increased access to computational power, coupled with a rise in computer-mediated communication (telehealth), makes feasible large-scale analyses of language use during psychotherapy. Transparent methodological approaches are lacking, however. Here we present novel methods to increase the efficiency of efforts to examine language use in psychotherapy. We evaluate three important...
Abstract In 2008, Oregon expanded its Medicaid program using a lottery, creating a rare opportunity to study the effects of Medicaid coverage using a randomized controlled design (Oregon Health Insurance Experiment). Analysis showed that Medicaid coverage lowered the risk of depression. However, this effect may vary between individuals, and the identification of individuals likely to benefit the most has the potential to improve the effectiveness and efficiency of the Medicaid program. By applying the machine learning causal forest to data from this experiment, we found substantial...
Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks.