Christopher Semturs

ORCID: 0000-0001-6108-2773
Research Areas
  • Artificial Intelligence in Healthcare and Education
  • Retinal Imaging and Analysis
  • Topic Modeling
  • Machine Learning in Healthcare
  • Retinal and Optic Conditions
  • AI in cancer detection
  • Glaucoma and retinal disorders
  • Natural Language Processing Techniques
  • Retinal Diseases and Treatments
  • Biomedical Text Mining and Ontologies
  • Cutaneous Melanoma Detection and Management
  • Digital Imaging in Medicine
  • Data-Driven Disease Surveillance
  • Social Media in Health Education
  • Artificial Intelligence in Healthcare
  • Healthcare Systems and Public Health
  • Digital Mental Health Interventions
  • Empathy and Medical Education
  • Digital Imaging for Blood Diseases
  • Simulation Techniques and Applications
  • Radiomics and Machine Learning in Medical Imaging
  • Risk Perception and Management
  • Body Image and Dysmorphia Studies
  • Clinical Reasoning and Diagnostic Skills
  • Text Readability and Simplification

Google (United States)
2019-2025

Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries, and a new dataset of medical questions searched online, HealthSearchQA. We propose a human...

10.1038/s41586-023-06291-2 article EN cc-by Nature 2023-07-12

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions, with a score of 67.2% on the MedQA dataset....

10.48550/arxiv.2305.09617 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Purpose: To develop and validate a deep learning (DL) algorithm that predicts referable glaucomatous optic neuropathy (GON) and optic nerve head (ONH) features from color fundus images, to determine the relative importance of these features in referral decisions by glaucoma specialists (GSs) and the algorithm, and to compare the performance of the algorithm with eye care providers. Design: Development and validation of an algorithm. Participants: Fundus images from screening programs, studies, and a glaucoma clinic. Methods: A DL algorithm was trained using a retrospective dataset of 86 618...

10.1016/j.ophtha.2019.07.024 article EN cc-by-nc-nd Ophthalmology 2019-09-24

Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze the development of these models, we curated MultiMedBench, a new multimodal benchmark. MultiMedBench...

10.1056/aioa2300138 article EN NEJM AI 2024-02-22

At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel...

10.48550/arxiv.2401.05654 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine, with the ability to seamlessly use web search, and that can be efficiently tailored to novel...

10.48550/arxiv.2404.18416 preprint EN arXiv (Cornell University) 2024-04-29

Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on...

10.1038/s41591-024-03423-7 article EN cc-by-nc-nd Nature Medicine 2025-01-08

Background: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren't yet known to experts. Methods: In this paper, we present a workflow for generating...

10.1016/j.ebiom.2024.105075 article EN cc-by-nc EBioMedicine 2024-04-01

An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians...

10.48550/arxiv.2312.00164 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed to quantitatively assess the performance equity of health AI technologies via a four-step...

10.1016/j.eclinm.2024.102479 article EN cc-by-nc-nd EClinicalMedicine 2024-03-14

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams,...

10.48550/arxiv.2212.13138 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions.

10.1016/s2589-7500(23)00022-5 article EN cc-by-nc-nd The Lancet Digital Health 2023-03-24

Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering,...

10.48550/arxiv.2307.14334 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language model (LLM) fine-tuned from PaLM-2 which encodes knowledge about diverse therapeutic modalities. Tx-LLM is trained using a collection of 709...

10.48550/arxiv.2406.06316 preprint EN arXiv (Cornell University) 2024-06-10

At the heart of medicine lies physician–patient dialogue, where skillful history-taking enables effective diagnosis, management and enduring trust [1,2]. Artificial intelligence (AI) systems capable of diagnostic dialogue could increase accessibility and quality of care. However, approximating clinicians’ expertise is an outstanding challenge. Here we introduce AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based AI system optimized for diagnostic dialogue. AMIE uses...

10.1038/s41586-025-08866-7 article EN cc-by Nature 2025-04-09

A comprehensive differential diagnosis is a cornerstone of medical care that is often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by large language models present new opportunities to assist and automate aspects of this process [1]. Here we introduce the Articulate Medical Intelligence Explorer (AMIE), a large language model that is optimized for diagnostic reasoning, and evaluate its ability to generate a differential diagnosis alone or as...

10.1038/s41586-025-08869-4 article EN cc-by Nature 2025-04-09

Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact...

10.1145/3613905.3651891 article EN 2024-05-11

Objective. To evaluate diabetic retinopathy (DR) screening via deep learning (DL) and trained human graders (HG) in a longitudinal cohort, as case spectrum shifts based on treatment referral and new-onset DR. Methods. We randomly selected patients with diabetes screened twice, two years apart, within a nationwide screening program. The reference standard was established by adjudication by retina specialists. Each patient’s color fundus photographs were graded, and a patient was considered as having sight-threatening DR...

10.1155/2020/8839376 article EN cc-by Journal of Diabetes Research 2020-12-15

Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, impacting research, medical education, and artificial intelligence (AI) tool development. Dermatology is a suitable area to develop and test a new and scalable method to create representative health datasets. Methods: We used Google Search advertisements to invite contributions to an open access dataset of images of dermatology conditions, demographic and symptom information. With informed contributor...

10.48550/arxiv.2402.18545 preprint EN arXiv (Cornell University) 2024-02-28

Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest...

10.48550/arxiv.2403.12025 preprint EN arXiv (Cornell University) 2024-03-18

Importance: Health datasets from clinical sources do not reflect the breadth and diversity of disease, impacting research, medical education, and artificial intelligence tool development. Assessments of novel crowdsourcing methods to create health datasets are needed. Objective: To evaluate if web search advertisements (ads) are effective at creating a diverse and representative dermatology image dataset. Design, Setting, and Participants: This prospective observational survey study, conducted from March to November 2023, used...

10.1001/jamanetworkopen.2024.46615 article EN cc-by-nc-nd JAMA Network Open 2024-11-20

AI models have shown promise in many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust in AI-based models, and could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this paper, we present a method for automatic visual explanations leveraging team-based expertise by generating hypotheses of what visual signals in the images are correlated with the task. We propose the following 4 steps: (i) Train...

10.48550/arxiv.2306.00985 preprint EN cc-by arXiv (Cornell University) 2023-01-01