Yutong Ban

ORCID: 0000-0001-5396-9251
Research Areas
  • Video Surveillance and Tracking Methods
  • Surgical Simulation and Training
  • Speech and Audio Processing
  • Music and Audio Processing
  • Colorectal Cancer Screening and Detection
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Autonomous Vehicle Technology and Safety
  • UAV Applications and Optimization
  • Digital Imaging in Medicine
  • Multimodal Machine Learning Applications
  • Anatomy and Medical Technology
  • Medical Image Segmentation Techniques
  • Advanced Neural Network Applications
  • Indoor and Outdoor Localization Technologies
  • Neural Dynamics and Brain Function
  • Cardiac, Anesthesia and Surgical Outcomes
  • Anomaly Detection Techniques and Applications
  • Colorectal Cancer Surgical Treatments
  • Artificial Intelligence in Healthcare and Education
  • Reinforcement Learning in Robotics
  • Advanced Adaptive Filtering Techniques
  • Machine Learning in Healthcare
  • Remote-Sensing Image Classification
  • Radiomics and Machine Learning in Medical Imaging

Shanghai Jiao Tong University
2024

Massachusetts General Hospital
2020-2023

Massachusetts Institute of Technology
2020-2023

Vassar College
2022

Institut national de recherche en informatique et en automatique
2017-2021

Intel (United States)
2021

K Lab (United States)
2021

Centre Inria de l'Université Grenoble Alpes
2016-2021

Université Grenoble Alpes
2016-2020

Laboratoire Jean Kuntzmann
2020

The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However, existing methods train only certain sub-modules using loss functions that often do not correlate with established tracking evaluation measures such as Multi-Object Tracking Accuracy (MOTA) and Precision (MOTP). As these measures are not differentiable, the choice of appropriate loss functions for end-to-end training is still an open research problem. In this...

10.1109/cvpr42600.2020.00682 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Transformers have proven superior performance for a wide variety of tasks since they were introduced. In recent years, they have drawn attention from the vision community in tasks such as image classification and object detection. Despite this wave, an accurate and efficient multiple-object tracking (MOT) method based on transformers is yet to be designed. We argue that the direct application of a transformer architecture, with quadratic complexity and insufficient noise-initialized sparse queries, is not optimal for MOT. We propose...

10.1109/tpami.2022.3225078 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-11-28

Annotation of surgical video is important for establishing ground truth in data science endeavors that involve computer vision. With the growth of the field over the last decade, several challenges have been identified in annotating spatial, temporal, and clinical elements as well as in selecting annotators. In reviewing current challenges, we provide suggestions on opportunities for improvement and possible next steps to enable the translation of analysis efforts into research and practice.

10.1080/24699322.2021.1937320 article EN cc-by Computer Assisted Surgery 2021-01-01

In this article, we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature and roles of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status (either speaking or silent) of each tracked person over time. We cast the problem at hand into a generative audio-visual fusion (or association) model formulated as a latent-variable temporal graphical model....

10.1109/tpami.2019.2953020 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-11-20

Background: Surgery generates a vast amount of data from each procedure. Video data in particular provides significant value for surgical research, clinical outcome assessment, quality control, and education. The data lifecycle is influenced by various factors, including data structure, acquisition, storage, and sharing; data use and exploration; and finally data governance, which encompasses all ethical and legal regulations associated with the data. There is a universal need among stakeholders in surgical data science to establish...

10.1007/s00464-023-10288-3 article EN cc-by Surgical Endoscopy 2023-07-29

We address the problem of online localization and tracking of multiple moving speakers in reverberant environments. This paper has the following contributions. We use the direct-path relative transfer function (DP-RTF), an interchannel feature that encodes acoustic information robust against reverberation, and we propose an online algorithm well suited for estimating the DP-RTFs associated with moving audio sources. Another crucial ingredient of the proposed method is its ability to properly assign audio-source directions. Toward this...

10.1109/jstsp.2019.2903472 article EN IEEE Journal of Selected Topics in Signal Processing 2019-03-01

Analysis of relations between objects and comprehension of abstract concepts in surgical video is important in AI-augmented surgery. However, building models that integrate our knowledge and understanding of surgery remains a challenging endeavor. In this paper, we propose a novel way to integrate conceptual knowledge into temporal analysis tasks using concept graph networks. The proposed networks incorporate these notions, learning their meaning as they apply to the data. We demonstrate results on surgical video data for tasks such as verification of the critical view...

10.1109/tmi.2023.3299518 article EN IEEE Transactions on Medical Imaging 2023-07-27

Recently, deep reinforcement learning (DRL) has achieved promising results in solving online 3D Bin Packing Problems (3D-BPP). However, these DRL-based policies may perform poorly on new instances due to distribution shift. Besides generalization, we also consider adaptation, completely overlooked by previous work, which aims at rapidly finetuning these policies to a new test distribution. To tackle both the generalization and adaptation issues, we propose Adaptive Selection After Pruning (ASAP), which decomposes a solver's...

10.48550/arxiv.2501.17377 preprint EN arXiv (Cornell University) 2025-01-28

Comprehension of surgical workflow is the foundation upon which artificial intelligence (AI) and machine learning (ML) hold the potential to assist intraoperative decision making and risk mitigation. In this work, we move beyond mere identification of past surgical phases, into the prediction of future surgical steps and the specification of the transitions between them. We use a novel Generative Adversarial Network (GAN) formulation to sample future surgical phase trajectories conditioned on past video frames from laparoscopic cholecystectomy (LC) videos and compare...

10.1109/lra.2022.3156856 article EN IEEE Robotics and Automation Letters 2022-03-07

This article proposes a deep neural network (DNN)-based direct-path relative transfer function (DP-RTF) enhancement method for robust direction of arrival (DOA) estimation in noisy and reverberant environments. The DP-RTF refers to the ratio between the acoustic transfer functions of two microphone channels. First, the complex-valued DP-RTF is decomposed into the inter-channel intensity difference and the sinusoidal phase difference in the time-frequency domain. Then, the features from a series of temporal context frames are utilized to train a DNN...

10.1049/cit2.12024 article EN cc-by-nc-nd CAAI Transactions on Intelligence Technology 2021-04-14

Background: Surgical phase recognition using computer vision presents an essential requirement for artificial intelligence-assisted analysis of surgical workflow. Its performance is heavily dependent on large amounts of annotated video data, which remain a limited resource, especially concerning highly specialized procedures. Knowledge transfer from common to more complex procedures can promote data efficiency. Phase recognition models trained on large, readily available datasets may be extrapolated...

10.1007/s00464-023-09971-2 article EN cc-by Surgical Endoscopy 2023-03-17

Multi-speaker tracking is a central problem in human-robot interaction. In this context, exploiting auditory and visual information is gratifying and challenging at the same time. Gratifying because the complementary nature of auditory and visual information allows us to be more robust against noise and outliers than unimodal approaches. Challenging because how to properly fuse auditory and visual information for multi-speaker tracking is far from being a solved problem. In this paper we propose a probabilistic generative model that tracks multiple speakers by jointly exploiting auditory and visual features in their own representation...

10.1109/iccvw.2017.60 preprint EN 2017-10-01

In this paper, we comprehensively describe the methodology of our submissions to One-Minute Gradual-Emotion Behavior Challenge 2018.

10.48550/arxiv.1805.00638 preprint EN other-oa arXiv (Cornell University) 2018-01-01

The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However, existing methods train only certain sub-modules using loss functions that often do not correlate with established tracking evaluation measures such as Multi-Object Tracking Accuracy (MOTA) and Precision (MOTP). As these measures are not differentiable, the choice of appropriate loss functions for end-to-end training is still an open research problem. In this...

10.48550/arxiv.1906.06618 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Analyzing surgical workflow is crucial for surgical assistance robots to understand surgeries. With an understanding of the complete surgical workflow, the robots are able to assist the surgeons in intra-operative events, such as by giving a warning when the surgeon is entering specific keys or high-risk phases. Deep learning techniques have recently been widely applied to recognizing surgical workflows. Many existing temporal neural network models are limited in their capability to handle long-term dependencies in the data, instead relying upon the strong performance...

10.1109/icra48506.2021.9561770 article EN 2021-05-30

Multiple-speaker tracking is a crucial task for many applications. In real-world scenarios, exploiting the complementarity between auditory and visual data makes it possible to track people outside the visual field of view. However, practical methods must be robust to changes in acoustic conditions, e.g. reverberation. We investigate how to combine state-of-the-art audio-source localization techniques with Bayesian multi-person tracking. Our experiments demonstrate that the performance of the proposed system is not affected by the environment.

10.1109/icassp.2018.8462100 preprint EN 2018-04-01

Multi-person tracking with a robotic platform is one of the cornerstones of human-robot interaction. Challenges arise from occlusions, appearance changes and a time-varying number of people. Furthermore, the final system is constrained by the hardware platform: low computational capacity and limited field-of-view. In this paper, we propose a novel method to simultaneously track persons in three dimensions and perform visual servoing. The complementary nature of tracking and visual servoing enables the system to: (i) track several persons while compensating for large...

10.1109/iros.2017.8206274 article EN 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017-09-01