- Artificial Intelligence in Healthcare and Education
- Topic Modeling
- Mobile Crowdsensing and Crowdsourcing
- Retinal Imaging and Analysis
- Machine Learning in Healthcare
- Retinal Diseases and Treatments
- Auction Theory and Applications
- Data Stream Mining Techniques
- EEG and Brain-Computer Interfaces
- Explainable Artificial Intelligence (XAI)
- Glaucoma and retinal disorders
- Radiomics and Machine Learning in Medical Imaging
- Privacy-Preserving Technologies in Data
- Cutaneous Melanoma Detection and Management
- Data Quality and Management
- Epilepsy research and treatment
- Ethics and Social Impacts of AI
- Time Series Analysis and Forecasting
- Radiology practices and education
- Big Data and Business Intelligence
- Artificial Intelligence in Law
- AI in cancer detection
- Medical Imaging and Analysis
- Psychological and Educational Research Studies
- Imbalanced Data Classification Techniques
Google (United States)
2019-2025
Google (United Kingdom)
2024
Amazon (Germany)
2022
Amazon (United States)
2022
University of Waterloo
2017-2021
Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions, with a score of 67.2% on the MedQA dataset....
Purpose: To develop and validate a deep learning (DL) algorithm that predicts referable glaucomatous optic neuropathy (GON) and optic nerve head (ONH) features from color fundus images, to determine the relative importance of these features in referral decisions by glaucoma specialists (GSs) and the algorithm, and to compare the performance of the algorithm with eye care providers. Design: Development and validation of an algorithm. Participants: Fundus images from screening programs, studies, and a glaucoma clinic. Methods: A DL algorithm was trained using a retrospective dataset of 86 618...
Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze the development of these models, we curated MultiMedBench, a new multimodal benchmark. MultiMedBench...
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel...
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine, with the ability to seamlessly use web search, and that can be efficiently tailored to novel...
Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on...
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians...
Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed as a four-step...
Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across settings, with 56.1%...
Crowdsourced classification of data typically assumes that objects can be unambiguously classified into categories. In practice, many classification tasks are ambiguous due to various forms of disagreement. Prior work shows that exchanging verbal justifications can significantly improve answer accuracy over aggregation techniques. In this work, we study how worker deliberation affects resolvability and accuracy using case studies with both an objective and a subjective task. Results show that resolvability depends on various factors, including the level...
Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering,...
Expert disagreement is pervasive in clinical decision making, and collective adjudication is a useful approach for resolving divergent assessments. Prior work shows that expert disagreement can arise due to diverse factors including expert background, the quality and presentation of data, and guideline clarity. In this work, we study how these factors predict initial discrepancies in the context of medical time series analysis, examining why certain disagreements persist after adjudication, and how adjudication impacts clinical decisions. Results from a case study with 36...
Artificial intelligence (AI) assistants for clinical decision making show increasing promise in medicine. However, medical assessments can be contentious, leading to expert disagreement. This raises the question of how AI assistants should be designed to handle the classification of ambiguous cases. Our study compared two AI assistants that provide classification labels for medical time series data along with quantitative uncertainty estimates: conventional vs. ambiguity-aware. We simulated our ambiguity-aware AI based on real-world expert discussions to highlight...
This forum provides a space to engage with the challenges of designing for intelligent algorithmic experiences. We invite articles that tackle the tensions between research and practice when integrating AI and UX design. We welcome interdisciplinary debate, artful critique, forward-looking research, case studies of AI in practice, and speculative design explorations. --- Juho Kim and Henriette Cramer, Editors
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact...
Identifying player motivations such as curiosity could help game designers analyze player profiles and substantially improve game design. However, research on player profiling focuses on generalized personality traits, not specific aspects of motivation. This study examines how player behaviour indicates constructs of curiosity-related motivation. It contributes a more discriminating operationalization of game-related curiosity. We derive a measure from established self-report survey methodologies relating to social capital, behavioural...
Medical data labeling workflows critically depend on accurate assessments from human experts. Yet human assessments can vary markedly, even among medical experts. Prior research has demonstrated benefits of labeler training on performance. Here we utilized two types of labeler training feedback: highlighting incorrect labels for difficult cases ("individual performance" feedback), and expert discussions from adjudication of these cases. We presented ten generalist eye care professionals with either individual performance alone, or individual performance and expert discussions from specialists....
To present and evaluate a remote, tool-based system and structured grading rubric for adjudicating image-based diabetic retinopathy (DR) grades. We compared three different procedures for adjudicating DR severity assessments among retina specialist panels, including (1) in-person adjudication based on a previously described procedure (Baseline), (2) remote, tool-based adjudication for assessing DR severity alone (TA), and (3) remote, tool-based adjudication using a feature-based rubric (TA-F). We developed a system allowing graders to review images remotely and asynchronously. For both TA and TA-F approaches, images with...
We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated, since determining the number of labels required is an implicit part of the annotation task itself. On an image bounding-box task with crowdsourced annotators, we show that annotator accuracy and recall...