- Topic Modeling
- Radiomics and Machine Learning in Medical Imaging
- Natural Language Processing Techniques
- Text Readability and Simplification
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Face recognition and analysis
- Speech Recognition and Synthesis
- Colorectal Cancer Screening and Detection
- Radiology practices and education
- Artificial Intelligence in Healthcare and Education
- Advanced Neural Network Applications
- Video Analysis and Summarization
- Meningioma and schwannoma management
- Medical Imaging and Analysis
- Dental Radiography and Imaging
- Airway Management and Intubation Techniques
- Computational and Text Analysis Methods
- Visual Attention and Saliency Detection
Microsoft (United States)
2025
Stanford University
2020-2021
Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated image-text pairs to train a specialized, domain-adapted encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns image...
Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Caricature, a type of exaggerated artistic portrait, amplifies the distinctive, yet nuanced traits of human faces. This task is typically left to artists, as it has proven difficult to capture subjects' unique characteristics well using automated methods. Recent development of deep end-to-end methods has achieved promising results in capturing style and higher-level exaggerations. However, a key part of caricatures, face warping, has remained challenging for these systems. In this work, we propose AutoToon, the first...
Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image - a task we call grounded report generation. Prior work indicates that grounding is important for clarifying image understanding and interpreting AI-generated text. Therefore, grounded reporting stands to improve the utility...
Surgeons must visually distinguish soft-tissues, such as nerves, from surrounding anatomy to prevent complications and optimize patient outcomes. An accurate nerve segmentation and analysis tool could provide useful insight for surgical decision-making. Here, we present an end-to-end, automatic deep learning computer vision algorithm to segment and measure nerves. Unlike traditional medical imaging, our unconstrained setup with accessible handheld digital cameras, along with the unstructured open...
The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world applications. Frontier models such as GPT-4V have competency gaps in multimodal capabilities for biomedical applications. Moreover, pragmatic issues such as access, cost, latency, and compliance make it hard for clinicians to use privately-hosted...
We consider the task of semi-supervised video object segmentation (VOS). Our approach mitigates shortcomings in previous VOS work by addressing detail preservation and temporal consistency using visual warping. In contrast to prior work that uses full optical flow, we introduce a new foreground-targeted warping approach that learns flow fields from data. We train a module to capture detailed motion between frames using two weakly-supervised losses. Our object-focused approach of warping previous foreground masks to their positions in the target frame enables mask...
Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models' knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation. The method evaluates whether models rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, evaluating the systematicity of a model's knowledge: given a sentence, can it conjugate arbitrary verbs correctly? Second, evaluating a model's behavior: does the model...
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the workload for radiologists, yet it also offers an opportunity for enhancing outcomes if effectively leveraged. 3D image retrieval holds the potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field is still emerging, lacking established evaluation benchmarks,...