- Neural dynamics and brain function
- Speech Recognition and Synthesis
- Functional Brain Connectivity Studies
- EEG and Brain-Computer Interfaces
- Identity, Memory, and Therapy
- Speech and Audio Processing
- Grief, Bereavement, and Mental Health
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Music and Audio Processing
- Visual perception and processing mechanisms
- 3D Surveying and Cultural Heritage
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Face Recognition and Perception
- Video Surveillance and Tracking Methods
- Topic Modeling
- Speech and dialogue systems
- Radiomics and Machine Learning in Medical Imaging
- Human Pose and Action Recognition
- Image Enhancement Techniques
- Robotics and Sensor-Based Localization
- Neural Networks and Applications
- Anxiety, Depression, Psychometrics, Treatment, Cognitive Processes
- Machine Learning in Healthcare
Google (United States)
2024-2025
Google (United Kingdom)
2024
DeepMind (United Kingdom)
2024
University of Science and Technology of China
2023
National Tsing Hua University
2023
Beijing University of Chemical Technology
2021-2022
National Taiwan University
2019-2021
Columbia University
2017-2021
New York University
2017
Stanford University
2016
Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze the development of these models, we curated MultiMedBench, a new multimodal benchmark. MultiMedBench...
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date knowledge, and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine, with the ability to seamlessly use web search, and that can be efficiently tailored to novel...
Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and in handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps through a combination of base LLM improvements, medical domain fine-tuning, and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on...
Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across settings, with 56.1%...
End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text plus speech data. However, laborious data collection remains difficult for at least 95% of the languages over the world, which hinders the development of TTS in different languages. In this paper, we aim to build TTS systems for such low-resource (target) languages, where only very limited paired data are available. We show that such TTS can be effectively constructed by transferring knowledge from a high-resource (source) language. Since the model trained on the source...
In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to the phoneme sequences of speech utterances. This is achieved by proper temporal segmentation to make the representations phoneme-synchronized, and proper phonetic clustering to make the total number of distinct representations close to the number of phonemes. The mapping between the distinct representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech demonstrated that the learned representations for vowels have relative locations...
Advances in the instrumentation and signal processing for simultaneously acquired electroencephalography and functional magnetic resonance imaging (EEG-fMRI) have enabled new ways to observe the spatiotemporal neural dynamics of the human brain. Central to the utility of EEG-fMRI neuroimaging systems are the methods for fusing the two data streams, with machine learning playing a key role. These methods can be dichotomized into those that are symmetric and asymmetric in terms of how the two modalities inform the fusion. Studies using these methods have shown that fusion...
Most prior semantic segmentation methods have been developed for day-time scenes, while typically underperforming in night-time scenes due to insufficient and complicated lighting conditions. In this work, we tackle this challenge by proposing a novel paradigm, i.e., disentangle then parse (DTP). DTP explicitly disentangles images into light-invariant reflectance and light-specific illumination components, and then recognizes semantics based on their adaptive fusion. Concretely, the proposed DTP comprises two key...
GuessWhat?! is a visual dialog guessing game that incorporates a Questioner agent, which generates a sequence of questions, while an Oracle agent answers the respective questions about a target object in an image. Based on this dialog history between the Questioner and the Oracle, a Guesser agent makes a final guess of the target object. While previous work has focused on dialogue policy optimization and visual-linguistic information fusion, most work learns the vision-linguistic encoding for the three agents solely on the GuessWhat?! dataset, without shared prior knowledge of vision-linguistic representation. To...
Network interactions are likely to be instrumental in processes underlying rapid perception and cognition. Specifically, high-level perceptual regions must interact to balance pre-existing models of the environment with new incoming stimuli. Simultaneous electroencephalography (EEG) and fMRI (EEG/fMRI) enables temporal characterization of brain-network interactions combined with improved anatomical localization of regional activity. In this paper, we use simultaneous EEG/fMRI and multivariate dynamical systems (MDS) analysis...
An avoidant grief style is marked by repeated and often unsuccessful attempts to prevent thinking about the loss. Prior work shows that avoidant grief involves monitoring the external environment in order to avoid reminders of the loss. Here we sought to determine whether avoidant grievers also monitor their internal environment to minimize conscious awareness of loss-related thoughts. Individuals bereaved of a first-degree relative, spouse or partner within the last 14 months participated in a functional magnetic resonance imaging (fMRI) study (N = 29). We first applied...
The hierarchical architecture of deep convolutional neural networks (CNN) resembles the multi-level processing stages of the human visual system during object recognition. Converging evidence suggests that this hierarchical organization is key to the CNN achieving human-level performance in object categorization [22]. In this paper, we leverage this hierarchical organization to investigate the spatiotemporal dynamics of rapid visual processing in the human brain. Specifically, we focus on perceptual decisions associated with different levels of ambiguity. Using simultaneous EEG-fMRI, we demonstrate the temporal...
We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation. Unlike previous methods, which aggregate 2D features into 3D voxels without considering geometry, ImGeoNet learns to induce geometry from multi-view images to alleviate the confusion arising from voxels of free space, and during the inference phase, only images from multiple views are required. Besides, a powerful pre-trained 2D feature extractor can be leveraged by our representation, leading...
The field of indoor monocular 3D object detection is gaining significant attention, fueled by the increasing demand in VR/AR and robotic applications. However, its advancement is impeded by the limited availability and diversity of training data, owing to the labor-intensive nature of data collection and annotation processes. In this paper, we present V-MIND (Versatile Monocular INdoor Detector), which enhances the performance of 3D detectors across a diverse set of object classes by harnessing publicly available large-scale 2D datasets. By...