Tao Tu

ORCID: 0000-0001-9191-7938
Research Areas
  • Neural dynamics and brain function
  • Speech Recognition and Synthesis
  • Functional Brain Connectivity Studies
  • EEG and Brain-Computer Interfaces
  • Identity, Memory, and Therapy
  • Speech and Audio Processing
  • Grief, Bereavement, and Mental Health
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Music and Audio Processing
  • Visual perception and processing mechanisms
  • 3D Surveying and Cultural Heritage
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Face Recognition and Perception
  • Video Surveillance and Tracking Methods
  • Topic Modeling
  • Speech and dialogue systems
  • Radiomics and Machine Learning in Medical Imaging
  • Human Pose and Action Recognition
  • Image Enhancement Techniques
  • Robotics and Sensor-Based Localization
  • Neural Networks and Applications
  • Anxiety, Depression, Psychometrics, Treatment, Cognitive Processes
  • Machine Learning in Healthcare

Affiliations

Google (United States)
2024-2025

Google (United Kingdom)
2024

DeepMind (United Kingdom)
2024

University of Science and Technology of China
2023

National Tsing Hua University
2023

Beijing University of Chemical Technology
2021-2022

National Taiwan University
2019-2021

Columbia University
2017-2021

New York University
2017

Stanford University
2016

Publications

Background: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights across many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. Methods: To catalyze the development of such models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench...

10.1056/aioa2300138 article EN NEJM AI 2024-02-22

Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge, and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine, with the ability to seamlessly use web search, and that can be efficiently tailored to novel...

10.48550/arxiv.2404.18416 preprint EN arXiv (Cornell University) 2024-04-29

Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score on United States Medical Licensing Examination style questions. However, challenges remain in long-form answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval. Med-PaLM 2 scores up to 86.5% on...

10.1038/s41591-024-03423-7 article EN cc-by-nc-nd Nature Medicine 2025-01-08
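The "ensemble refinement" strategy named in the abstract above can be illustrated with a short sketch: sample several stochastic drafts, then ask the model to produce a final answer conditioned on its own drafts. This is a minimal illustration under assumptions, not the Med-PaLM 2 implementation; `call_llm` is a hypothetical stand-in for any text-generation API.

```python
# Hedged sketch of ensemble refinement: sample diverse drafts, then refine.
# `call_llm` is a hypothetical placeholder, NOT the Med-PaLM 2 interface.
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM call; replace with a real client."""
    return "stub answer"

def ensemble_refine(question: str, n_drafts: int = 11) -> str:
    # Step 1: sample diverse chain-of-thought drafts at high temperature.
    drafts = [call_llm(f"Q: {question}\nThink step by step, then answer.",
                       temperature=0.7) for _ in range(n_drafts)]
    # Step 2: condition the model on its own drafts and ask for a refined answer.
    numbered = "\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
    refine_prompt = (f"Q: {question}\n"
                     f"Here are several candidate answers:\n{numbered}\n"
                     "Considering their agreements and errors, give one final answer.")
    # Step 3 (optional): sample the refinement a few times and majority-vote.
    finals = [call_llm(refine_prompt, temperature=0.3) for _ in range(3)]
    return Counter(finals).most_common(1)[0][0]
```

The final majority vote mirrors how self-consistency is commonly layered on top of refinement; it is optional in this sketch.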

Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across care settings, with 56.1%...

10.1038/s41591-024-03302-1 article EN cc-by-nc-nd Nature Medicine 2024-11-07

End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text plus speech data. However, laborious data collection remains difficult for at least 95% of the languages around the world, which hinders the development of TTS in different languages. In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available. We show that such systems can be effectively constructed by transferring knowledge from a high-resource (source) language. Since the model trained on the source...

10.21437/interspeech.2019-2730 article EN Interspeech 2019 2019-09-13
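The transfer recipe this abstract points at often amounts to reusing all pretrained weights except the language-specific input embedding, then fine-tuning on the small paired set. The PyTorch toy below shows that mechanic only; the module names, sizes, and learning rates are illustrative assumptions, not the paper's architecture.

```python
# Minimal cross-lingual TTS transfer sketch (assumed toy architecture).
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    def __init__(self, n_symbols: int, d_model: int = 256, n_mels: int = 80):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, d_model)   # language-specific
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.to_mel = nn.Linear(d_model, n_mels)        # largely language-agnostic

    def forward(self, tokens):                          # (B, T) -> (B, T, n_mels)
        h, _ = self.encoder(self.embed(tokens))
        return self.to_mel(h)

source = TinyTTS(n_symbols=60)   # pretrained on the high-resource language
target = TinyTTS(n_symbols=45)   # target language has a different symbol set

# Transfer every weight except the input embedding, which stays freshly initialized.
state = {k: v for k, v in source.state_dict().items() if not k.startswith("embed")}
target.load_state_dict(state, strict=False)

# Optionally let the new embedding learn faster than the transferred layers.
optim = torch.optim.Adam([
    {"params": target.embed.parameters(), "lr": 1e-3},
    {"params": target.encoder.parameters(), "lr": 1e-4},
    {"params": target.to_mel.parameters(), "lr": 1e-4},
])
```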

In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances. This is achieved by proper temporal segmentation to make the representations phoneme-synchronized, and by proper phonetic clustering to have the total number of distinct representations close to the number of distinct phonemes. The mapping between the representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech demonstrated that the representations for vowels have relative locations...

10.1109/icassp40776.2020.9053571 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09
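Two mechanisms named in this abstract, quantization against a phoneme-sized codebook and temporal segmentation into phoneme-synchronized runs, can be shown in a few lines of numpy. This toy uses a fixed random codebook; SeqRQ-AE learns the codebook jointly with an encoder/decoder.

```python
# Toy vector quantization + run-length segmentation (illustration only).
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(120, 16))     # stand-in encoder outputs: 120 frames
codebook = rng.normal(size=(40, 16))    # ~ number of distinct phonemes

# Quantize: assign each frame to its nearest codeword (Euclidean distance).
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)            # one discrete code per frame

# Segment: merge runs of identical codes so the sequence is phoneme-synchronized.
boundaries = np.flatnonzero(np.diff(codes)) + 1
segments = codes[np.r_[0, boundaries]]  # one code per contiguous run
print(len(codes), "frames ->", len(segments), "segments")
```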

10.1109/wacv61041.2025.00927 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

10.1109/wacv61041.2025.00227 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Advances in the instrumentation and signal processing for simultaneously acquired electroencephalography and functional magnetic resonance imaging (EEG-fMRI) have enabled new ways to observe the spatiotemporal neural dynamics of the human brain. Central to the utility of EEG-fMRI neuroimaging systems are the methods for fusing the two data streams, with machine learning playing a key role. These methods can be dichotomized into those that are symmetric and asymmetric in terms of how the two modalities inform the fusion. Studies using these methods have shown that fusion...

10.1146/annurev-neuro-100220-093239 article EN Annual Review of Neuroscience 2021-03-24
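As a concrete example of the "asymmetric" fusion family mentioned in this review, EEG-informed fMRI analysis convolves an EEG-derived modulator with a hemodynamic response function (HRF) and uses it as a per-voxel GLM regressor. The sketch below runs on synthetic data; the double-gamma HRF shape and the dimensions are common defaults assumed for illustration, not taken from the review.

```python
# Toy EEG-informed fMRI GLM (asymmetric fusion illustration).
import numpy as np
from scipy.stats import gamma

tr, n_scans, n_voxels = 2.0, 200, 500
t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6     # canonical double-gamma shape
hrf /= hrf.sum()

rng = np.random.default_rng(1)
eeg_power = rng.normal(size=n_scans)              # e.g. scan-wise alpha power
regressor = np.convolve(eeg_power, hrf)[:n_scans] # predicted BOLD modulation

bold = rng.normal(size=(n_scans, n_voxels))       # stand-in fMRI time series
X = np.column_stack([regressor, np.ones(n_scans)])  # design: regressor + mean
beta, *_ = np.linalg.lstsq(X, bold, rcond=None)   # per-voxel coupling weights
print("EEG-BOLD coupling map shape:", beta[0].shape)
```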

Most prior semantic segmentation methods have been developed for day-time scenes, while typically underperforming in night-time scenes due to insufficient and complicated lighting conditions. In this work, we tackle this challenge by proposing a novel night-time semantic segmentation paradigm, i.e., disentangle then parse (DTP). DTP explicitly disentangles night-time images into light-invariant reflectance and light-specific illumination components and then recognizes semantics based on their adaptive fusion. Concretely, the proposed DTP comprises two key...

10.1109/iccv51070.2023.01974 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
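The "disentangle" half of DTP rests on a Retinex-style assumption, image ≈ reflectance × illumination. The numpy toy below fakes the decomposition with a Gaussian-blurred illumination estimate to convey the intuition; DTP itself learns the decomposition and the fusion weights with networks.

```python
# Retinex-style decomposition toy: image = reflectance * illumination.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
image = rng.uniform(0.05, 1.0, size=(64, 64))     # stand-in night-time image

illumination = gaussian_filter(image, sigma=8)    # smooth, light-specific part
reflectance = image / np.clip(illumination, 1e-3, None)  # light-invariant part

# A segmenter can then consume an adaptive fusion of both components, e.g.:
alpha = 0.7                                       # fusion weight (learned in DTP)
fused = alpha * reflectance + (1 - alpha) * illumination
```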

10.1016/j.bpsc.2017.08.003 article EN publisher-specific-oa Biological Psychiatry Cognitive Neuroscience and Neuroimaging 2017-08-24

GuessWhat?! is a visual dialog guessing game which incorporates a Questioner agent that generates a sequence of questions, while an Oracle answers the respective questions about a target object in an image. Based on this dialog history between the Questioner and the Oracle, a Guesser agent makes a final guess of the target object. While previous work has focused on dialogue policy optimization and visual-linguistic information fusion, most work learns the vision-linguistic encoding for the three agents solely on the GuessWhat?! dataset without a shared prior knowledge representation. To...

10.1109/cvpr46437.2021.00557 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
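The three-agent protocol the abstract describes (Questioner, Oracle, Guesser) reduces to a simple loop. The stubs below stand in for the learned models and exist only to make the game structure concrete.

```python
# Skeletal GuessWhat?! game loop with trivial stub agents.
import random

objects = ["dog", "bicycle", "person", "car"]
target = random.choice(objects)
history = []

def questioner(history):               # stub: a real model generates questions
    return f"Is it the {random.choice(objects)}?"

def oracle(question, target):          # answers only about the hidden target
    return "yes" if target in question else "no"

def guesser(history, objects):         # stub: picks the last "yes" mention
    for q, a in reversed(history):
        if a == "yes":
            return next(o for o in objects if o in q)
    return random.choice(objects)

for _ in range(5):                     # fixed question budget
    q = questioner(history)
    history.append((q, oracle(q, target)))

print("guess:", guesser(history, objects), "| target:", target)
```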

Network interactions are likely to be instrumental in processes underlying rapid perception and cognition. Specifically, high-level perceptual regions must interact to balance pre-existing models of the environment with new incoming stimuli. Simultaneous electroencephalography (EEG) and fMRI (EEG/fMRI) enables temporal characterization of brain-network interactions combined with improved anatomical localization of regional activity. In this paper, we use simultaneous EEG/fMRI and multivariate dynamical systems (MDS) analysis...

10.1523/jneurosci.1677-17.2017 article EN cc-by-nc-sa Journal of Neuroscience 2017-11-08
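At its simplest, a multivariate dynamical systems analysis fits a linear state-space model x[t+1] = A·x[t] + noise to region-wise activity and reads directed interactions off the coupling matrix A. The least-squares toy below recovers a planted coupling from synthetic data; the paper's MDS is a richer latent-state model estimated from EEG/fMRI jointly.

```python
# Linear dynamical-system fit: estimate coupling matrix A by least squares.
import numpy as np

rng = np.random.default_rng(3)
n_regions, T = 4, 1000
A_true = 0.5 * np.eye(n_regions)
A_true[0, 1] = 0.3                       # planted influence: region 1 -> region 0

x = np.zeros((T, n_regions))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + 0.1 * rng.normal(size=n_regions)

# One-step-ahead prediction: solve x[:-1] @ A.T = x[1:] in least squares.
A_hat, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
A_hat = A_hat.T
print("recovered 1->0 coupling:", round(A_hat[0, 1], 2))
```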

An avoidant grief style is marked by repeated and often unsuccessful attempts to prevent thinking about the loss. Prior work shows that avoidant grief involves monitoring the external environment in order to avoid reminders of the loss. Here we sought to determine whether avoidant grievers also monitor their internal environment to minimize conscious awareness of loss-related thoughts. Individuals bereaved by the loss of a first-degree relative, spouse or partner within the last 14 months participated in a functional magnetic resonance imaging (fMRI) study (N = 29). We first applied...

10.1093/scan/nsy114 article EN cc-by Social Cognitive and Affective Neuroscience 2018-12-05

The hierarchical architecture of deep convolutional neural networks (CNN) resembles the multi-level processing stages of the human visual system during object recognition. Converging evidence suggests that this hierarchical organization is key to the CNN achieving human-level performance in object categorization [22]. In this paper, we leverage this hierarchy to investigate the spatiotemporal dynamics of rapid object recognition in the human brain. Specifically, we focus on perceptual decisions associated with different levels of ambiguity. Using simultaneous EEG-fMRI, we demonstrate the temporal...

10.1109/cvprw.2018.00267 article EN 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018-06-01
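Analyses like this one need layer-wise CNN activations to compare against neural responses. The PyTorch sketch below collects stage-wise feature vectors from a ResNet using forward hooks; the ResNet is an assumed stand-in, not necessarily the network used in the paper.

```python
# Collect per-stage CNN features with forward hooks (assumed ResNet stand-in).
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
features = {}

def hook(name):
    def fn(module, inputs, output):
        # Global-average-pool each stage so every level yields one vector.
        features[name] = output.mean(dim=(2, 3)).detach()
    return fn

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(model, name).register_forward_hook(hook(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))   # stand-in stimulus image

for name, feat in features.items():
    print(name, tuple(feat.shape))       # vectors to correlate with EEG/fMRI
```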

10.1016/j.bpsc.2018.08.003 article EN publisher-specific-oa Biological Psychiatry Cognitive Neuroscience and Neuroimaging 2018-08-25

We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation. Unlike previous methods which aggregate 2D features into 3D voxels without considering geometry, ImGeoNet learns to induce geometry from multi-view images to alleviate the confusion arising from voxels of free space, and during the inference phase, only images from multiple views are required. Besides, a powerful pre-trained 2D feature extractor can be leveraged by our representation, leading...

10.1109/iccv51070.2023.00644 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
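The core mechanic of an image-induced voxel representation is projecting 3D voxel centers into each view and sampling the 2D feature map there. The numpy toy below shows a single view with made-up intrinsics; ImGeoNet additionally weights voxels by predicted surface geometry, which this sketch omits.

```python
# Project voxel centers into an image and gather 2D features (single view).
import numpy as np

K = np.array([[100.0, 0, 64], [0, 100.0, 64], [0, 0, 1]])   # toy pinhole intrinsics

# A 3D grid of voxel centers in camera coordinates (all in front of the camera).
xs, ys, zs = np.meshgrid(np.linspace(-1, 1, 8),
                         np.linspace(-1, 1, 8),
                         np.linspace(2, 4, 8), indexing="ij")
voxels = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)      # (512, 3)

feat2d = np.random.default_rng(4).normal(size=(128, 128, 32))  # stand-in feature map

uvw = voxels @ K.T                          # (u*z, v*z, z) per voxel
uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
valid = ((uv >= 0) & (uv < 128)).all(axis=1)

voxel_feats = np.zeros((len(voxels), 32))
voxel_feats[valid] = feat2d[uv[valid, 1], uv[valid, 0]]      # gather (v=row, u=col)
print("voxels with image evidence:", int(valid.sum()))
```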

The field of indoor monocular 3D object detection is gaining significant attention, fueled by the increasing demand in VR/AR and robotic applications. However, its advancement is impeded by the limited availability and diversity of training data, owing to the labor-intensive nature of the data collection and annotation processes. In this paper, we present V-MIND (Versatile Monocular INdoor Detector), which enhances the performance of indoor 3D detectors across a diverse set of classes by harnessing publicly available large-scale 2D datasets. By...

10.48550/arxiv.2412.11412 preprint EN arXiv (Cornell University) 2024-12-15
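Training 3D detectors from 2D datasets requires converting 2D boxes into some form of pseudo-3D supervision. As a loud assumption (this is not necessarily V-MIND's conversion pipeline), the sketch below back-projects a 2D box through the camera at an estimated depth to get a rough 3D center and extent.

```python
# Lift a 2D box to a rough pseudo-3D label via back-projection (assumed scheme,
# NOT confirmed to match V-MIND's actual pipeline).
import numpy as np

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])  # toy intrinsics
K_inv = np.linalg.inv(K)

def lift_box(box2d, depth):
    """box2d = (u_min, v_min, u_max, v_max) in pixels; depth in meters."""
    u0, v0, u1, v1 = box2d
    corners_px = np.array([[u0, v0, 1.0], [u1, v1, 1.0]])
    corners_3d = (corners_px @ K_inv.T) * depth       # rays scaled to the depth
    center = corners_3d.mean(axis=0)
    size_xy = corners_3d[1, :2] - corners_3d[0, :2]   # width/height at that depth
    return center, size_xy

center, size_xy = lift_box((250, 180, 390, 300), depth=2.5)
print("pseudo 3D center:", center.round(2), "| extent (w, h):", size_xy.round(2))
```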