- Artificial Intelligence in Healthcare and Education
- Topic Modeling
- Radiomics and Machine Learning in Medical Imaging
- Machine Learning in Healthcare
- AI in cancer detection
- Natural Language Processing Techniques
- Biomedical Text Mining and Ontologies
- Radiology practices and education
- Explainable Artificial Intelligence (XAI)
- COVID-19 diagnosis using AI
- CRISPR and Genetic Engineering
- Text Readability and Simplification
- Empathy and Medical Education
- Simulation Techniques and Applications
- Clinical Reasoning and Diagnostic Skills
- Digital Mental Health Interventions
- Ethics and Social Impacts of AI
- Genomics and Phylogenetic Studies
- Mental Health Treatment and Access
- Machine Learning and Data Classification
- Mycobacterium research and diagnosis
- Cell Image Analysis Techniques
- Patient-Provider Communication in Healthcare
- Anomaly Detection Techniques and Applications
- Cardiac Health and Mental Health
Google (United States)
2023-2025
Google (United Kingdom)
2024
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries, and a new dataset of medical questions searched online, HealthSearchQA. We propose a human...
Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions, with a score of 67.2% on the MedQA dataset....
Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench...
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel...
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel...
Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on...
Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly helps OOD detection performance on a number of common benchmarks. By introducing...
Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across settings, with 56.1%...
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams,...
Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical as medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient...
Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering,...
Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language model (LLM) fine-tuned from PaLM-2 which encodes knowledge about diverse therapeutic modalities. Tx-LLM is trained using a collection of 709...
AI models have been proposed for hypothesis generation, but testing their ability to drive high-impact research is challenging, since an AI-generated hypothesis can take decades to validate. Here, we challenge the ability of a recently developed LLM-based platform, AI co-scientist, to generate high-level hypotheses by posing a question that took years to resolve experimentally and remained unpublished: How could capsid-forming phage-inducible chromosomal islands (cf-PICIs) spread across bacterial species? Remarkably,...
Transfer learning is a standard technique to improve performance on tasks with limited data. However, for medical imaging, the value of transfer learning is less clear. This is likely due to the large domain mismatch between the usual natural-image pre-training (e.g. ImageNet) and medical images. However, recent advances in transfer learning have shown substantial improvements from scale. We investigate whether modern methods can change the fortune of transfer learning for medical imaging. For this, we study the class of large-scale pre-trained networks presented by Kolesnikov et al. on three...
Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two...
The current work investigates the capability of large language models (LLMs) that are explicitly trained on large corpuses of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression and n = 115 PTSD assessments and n = 46 clinical case studies across high prevalence/high comorbidity disorders (Depressive, Anxiety, Psychotic, trauma and stress, Addictive disorders) were analyzed using prompts to extract...
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact...
Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors in report delivery. While recent progress in automated report generation with vision-language models offers clear potential to ameliorate this situation, the path toward real-world adoption has been stymied by the challenge...
Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. We conduct experiments on two distinct tasks: dermatology skin condition classification from digital camera images and multi-label chest X-ray classification, and demonstrate that self-supervised learning on ImageNet, followed by additional...
The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology, where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI system optimized for diagnostic dialogue, to potentially augment and support clinical decision-making in this challenging context....
Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offers clear potential for ameliorating the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated...