Yitong Li

ORCID: 0009-0009-3874-6055
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Robotics and Sensor-Based Localization
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Remote-Sensing Image Classification
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Software Engineering Research
  • Medical Imaging Techniques and Applications
  • Brain Tumor Detection and Classification
  • Bone and Joint Diseases
  • Topic Modeling
  • Advanced MRI Techniques and Applications
  • Optical measurement and interference techniques
  • Radiomics and Machine Learning in Medical Imaging
  • Infrastructure Maintenance and Monitoring
  • Software Testing and Debugging Techniques
  • Machine Learning in Healthcare
  • BIM and Construction Integration
  • Spine and Intervertebral Disc Pathology
  • Business Process Modeling and Analysis
  • Data Quality and Management
  • MRI in cancer diagnosis
  • Human Motion and Animation

Huazhong University of Science and Technology
2019-2025

Tongji Hospital
2019-2025

Chinese Academy of Medical Sciences & Peking Union Medical College
2025

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing
2025

Wuhan University
2025

Technical University of Munich
2022-2025

Binzhou University
2025

Binzhou Medical University
2025

Beijing University of Technology
2024

Dalian Maritime University
2024

In this work, we propose a new task called Story Visualization. Given a multi-sentence paragraph, the story is visualized by generating a sequence of images, one for each sentence. In contrast to video generation, story visualization focuses less on continuity in the generated images (frames), but more on global consistency across dynamic scenes and characters -- a challenge that has not been addressed by any single-image or video generation methods. Therefore, we propose a story-to-image-sequence generation model, StoryGAN, based on the sequential...

10.1109/cvpr.2019.00649 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
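
The sequential conditional GAN idea described above can be pictured as a recurrent context that is updated once per sentence and drives a per-step image generator. Below is a minimal PyTorch-style sketch of that structure only; the module names, dimensions, and the toy linear image head are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SequentialStoryGenerator(nn.Module):
    """Toy sketch: one image per sentence, conditioned on a running story
    context (names and dimensions are assumptions, not the paper's code)."""
    def __init__(self, sent_dim=128, ctx_dim=128, noise_dim=64, img_size=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.img_size = img_size
        self.context_rnn = nn.GRUCell(sent_dim + noise_dim, ctx_dim)
        # stand-in for the convolutional image generator in the real model
        self.image_head = nn.Sequential(
            nn.Linear(ctx_dim, 3 * img_size * img_size), nn.Tanh()
        )

    def forward(self, sentence_embs):                       # (B, T, sent_dim)
        b, t, _ = sentence_embs.shape
        h = sentence_embs.new_zeros(b, self.context_rnn.hidden_size)
        frames = []
        for i in range(t):                                  # one step per sentence
            z = torch.randn(b, self.noise_dim)              # per-step noise for diversity
            h = self.context_rnn(torch.cat([sentence_embs[:, i], z], dim=1), h)
            frames.append(self.image_head(h).view(b, 3, self.img_size, self.img_size))
        return torch.stack(frames, dim=1)                   # (B, T, 3, H, W)

# usage: a 2-story batch with 5 sentence embeddings each
images = SequentialStoryGenerator()(torch.randn(2, 5, 128))
```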

Event-specific concepts are the semantic concepts specifically designed for events of interest, which can be used as a mid-level representation of complex events in videos. Existing methods only focus on defining event-specific concepts for a small number of pre-defined events, but cannot handle novel unseen events. This motivates us to build a large-scale concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing how-to articles on human daily life. We perform...

10.1145/2733373.2806221 article EN 2015-10-13
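
As a rough illustration of harvesting event-specific concepts from how-to articles, the toy sketch below strips the "How to" prefix from article titles to form candidate event phrases and groups simple word-level concepts under each one. The titles are hypothetical, and the real pipeline is far richer (parsing, filtering, and a coarse-to-fine hierarchy).

```python
import re
from collections import defaultdict

# Toy how-to titles standing in for scraped WikiHow articles (hypothetical data).
titles = [
    "How to Change a Flat Tire",
    "How to Bake a Birthday Cake",
    "How to Change a Bicycle Tire",
]

def title_to_event(title):
    """Strip the 'How to' prefix to get a candidate event phrase."""
    return re.sub(r"^how to\s+", "", title.lower()).strip()

# Group candidate concepts (here: individual words) under each event phrase.
library = defaultdict(set)
for t in titles:
    event = title_to_event(t)
    for word in event.split():
        library[event].add(word)

for event, concepts in library.items():
    print(event, "->", sorted(concepts))
```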

Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from the text. This is manifested in a hybrid framework, employing a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). The static features, called "gist," are used to sketch the text-conditioned background color and object layout structure. Dynamic features are considered by transforming the input text into an image filter....

10.1609/aaai.v32i1.12233 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-27
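
A schematic reading of the hybrid framework: a text-conditioned VAE proposes a static "gist" (background and layout), and a dynamic branch turns the text into per-frame transformations applied on top of it. The sketch below only mirrors that data flow; the dimensions, layers, and frame-modulation trick are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class GistVAE(nn.Module):
    """Text-conditioned VAE that proposes a static 'gist' (background/layout);
    dimensions and layers are illustrative assumptions."""
    def __init__(self, text_dim=128, z_dim=32, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.to_stats = nn.Linear(text_dim, 2 * z_dim)                 # -> (mu, logvar)
        self.decode = nn.Sequential(
            nn.Linear(z_dim + text_dim, 3 * img_size * img_size), nn.Tanh()
        )

    def forward(self, text):
        mu, logvar = self.to_stats(text).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterisation
        gist = self.decode(torch.cat([z, text], dim=1))
        return gist.view(-1, 3, self.img_size, self.img_size), mu, logvar

class DynamicBranch(nn.Module):
    """Turns the text into simple per-frame modulations of the gist, a crude
    stand-in for the paper's text-to-filter transformation."""
    def __init__(self, text_dim=128, num_frames=8):
        super().__init__()
        self.num_frames = num_frames
        self.frame_codes = nn.Linear(text_dim, num_frames * 3)

    def forward(self, gist, text):
        scales = self.frame_codes(text).view(-1, self.num_frames, 3, 1, 1)
        return gist.unsqueeze(1) * (1 + scales)                        # (B, T, 3, H, W)

text = torch.randn(4, 128)
gist, mu, logvar = GistVAE()(text)
video = DynamicBranch()(gist, text)                                    # toy "video"
```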

Input constraints are useful for many software development tasks. For example, the input constraints of a function enable the generation of valid inputs, i.e., inputs that follow these constraints, to test the function deeper. API functions of deep learning (DL) libraries have DL-specific input constraints, which are described informally in free-form documentation. Existing constraint extraction techniques are ineffective at extracting these constraints. To fill this gap, we design and implement a new technique, DocTer, to analyze documentation and extract input constraints for DL API functions....

10.1145/3533767.3534220 preprint EN 2022-07-15
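
To make the constraint-extraction idea concrete, the sketch below pulls a few DL-relevant constraints (dimensionality, dtype, value range) from a snippet of free-form parameter documentation using simple regular expressions, then generates one conforming input per parameter. DocTer itself relies on rules over parsed documentation rather than ad-hoc regexes, so treat this as a simplified stand-in; the sample doc string is hypothetical.

```python
import re
import numpy as np

# Hypothetical free-form parameter documentation for some DL API function.
DOC = "x: A 2-D tensor of type float32. axis: An int in the range [0, 1]."

def extract_constraints(doc):
    """Pick out dimensionality, dtype, and range constraints per parameter."""
    cons = {}
    for name, desc in re.findall(r"(\w+):\s*([^.]+\.)", doc):
        c = {}
        if m := re.search(r"(\d+)-D", desc):
            c["ndim"] = int(m.group(1))
        if "float32" in desc:
            c["dtype"] = "float32"
        if m := re.search(r"range \[(-?\d+), (-?\d+)\]", desc):
            c["range"] = (int(m.group(1)), int(m.group(2)))
        cons[name] = c
    return cons

def generate_input(c, rng=np.random.default_rng(0)):
    """Produce one value that satisfies an extracted constraint dict."""
    if "ndim" in c:
        return rng.standard_normal((3,) * c["ndim"]).astype(c.get("dtype", "float64"))
    if "range" in c:
        lo, hi = c["range"]
        return int(rng.integers(lo, hi + 1))
    return None

constraints = extract_constraints(DOC)
valid_inputs = {name: generate_input(c) for name, c in constraints.items()}
print(constraints)
```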

10.1109/wacv61041.2025.00021 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Most existing text-to-image synthesis tasks are static single-turn generation, based on pre-defined textual descriptions of images. To explore more practical and interactive real-life applications, we introduce a new task - Interactive Image Editing, where users can guide an agent to edit images via multi-turn commands on-the-fly. In each session, the agent takes a natural language description from the user as input and modifies the image generated in the previous turn to a new design, following the description. The main...

10.1145/3394171.3413551 article EN Proceedings of the 30th ACM International Conference on Multimedia 2020-10-12
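
The multi-turn protocol can be summarized as a loop in which each user instruction is applied to the canvas produced in the previous turn. The skeleton below shows only that session loop with a toy text-gated residual edit; the agent, its dimensions, and the instruction embeddings are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

class EditingAgent(nn.Module):
    """Skeleton of a multi-turn editor: takes the previous canvas and an
    instruction embedding, returns an updated canvas (toy stand-in only)."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.fuse = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.text_gate = nn.Linear(text_dim, 3)

    def forward(self, canvas, instruction_emb):
        gate = torch.sigmoid(self.text_gate(instruction_emb)).view(-1, 3, 1, 1)
        return canvas + gate * self.fuse(canvas)     # residual, text-gated edit

agent = EditingAgent()
canvas = torch.zeros(1, 3, 64, 64)                   # blank first canvas
for instruction_emb in torch.randn(3, 1, 128):       # three user turns
    canvas = agent(canvas, instruction_emb)          # each turn edits the last result
```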

Object pose estimation is crucial for robotic applications and augmented reality. Beyond instance-level 6D object pose estimation methods, estimating category-level pose and shape has become a promising trend. As such, a new research field needs to be supported by well-designed datasets. To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset of photometrically challenging objects termed PhoCaL. PhoCaL comprises 60 high-quality 3D models of household objects over 8...

10.1109/cvpr52688.2022.02054 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
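
Benchmarks of this kind are typically scored with pose-error metrics. For orientation, the snippet below computes the standard geodesic rotation error and Euclidean translation error between a ground-truth and a predicted 6D pose; it is a generic illustration, not PhoCaL's evaluation toolkit.

```python
import numpy as np

def pose_error(R_gt, t_gt, R_pred, t_pred):
    """Geodesic rotation error (degrees) and Euclidean translation error
    between two rigid poses."""
    cos = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    trans_err = np.linalg.norm(t_pred - t_gt)
    return rot_err_deg, trans_err

# toy example: a 10-degree rotation about z and a 5 mm translation offset
theta = np.radians(10)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
print(pose_error(np.eye(3), np.zeros(3), R, np.array([0.005, 0, 0])))
```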

10.1016/j.aei.2024.102427 article EN publisher-specific-oa Advanced Engineering Informatics 2024-02-27

Despite recent advances in medical image generation, existing methods struggle to produce anatomically plausible 3D structures. In synthetic brain magnetic resonance images (MRIs), characteristic fissures are often missing, and reconstructed cortical surfaces appear scattered rather than densely convoluted. To address this issue, we introduce Cor2Vox, the first diffusion model-based method that translates continuous shape priors to MRIs. To achieve this, we leverage a Brownian bridge process which...

10.48550/arxiv.2502.12742 preprint EN arXiv (Cornell University) 2025-02-18
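
The Brownian bridge mentioned above interpolates between two endpoints with noise that vanishes at both ends. The sketch below samples the forward marginal of such a bridge between a shape-prior volume and a target volume; the linear mean and the 2·m·(1−m) variance schedule follow common bridge-diffusion formulations and are assumptions, not necessarily Cor2Vox's exact schedule.

```python
import numpy as np

def brownian_bridge_sample(x0, y, t, T=1000, rng=np.random.default_rng(0)):
    """Forward marginal of a discrete Brownian bridge between a target volume
    x0 and a shape-prior volume y (schematic variance schedule)."""
    m = t / T
    mean = (1.0 - m) * x0 + m * y
    var = 2.0 * m * (1.0 - m)            # peaks mid-bridge, zero at both ends
    return mean + np.sqrt(var) * rng.standard_normal(x0.shape)

x0 = np.zeros((8, 8, 8))                 # toy target volume
y = np.ones((8, 8, 8))                   # toy shape-prior volume
xt = brownian_bridge_sample(x0, y, t=500)
```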

Learning-based methods to solve dense 3D vision problems typically train on sensor data. The respectively used principle of measuring distances provides advantages and drawbacks. These are not compared nor discussed in the literature due to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective material poses issues for active sensing, and translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model...

10.1109/cvpr52729.2023.00082 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
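
Comparing depth from different sensing principles usually comes down to a handful of error metrics evaluated on valid pixels. The snippet below computes MAE, RMSE, and the δ<1.25 accuracy on toy data; it is a generic evaluation sketch, not the paper's benchmark code.

```python
import numpy as np

def depth_metrics(pred, gt, valid=None):
    """Common depth-error metrics on valid pixels: MAE, RMSE, delta<1.25."""
    if valid is None:
        valid = gt > 0                    # treat zero depth as missing
    p, g = pred[valid], gt[valid]
    mae = np.mean(np.abs(p - g))
    rmse = np.sqrt(np.mean((p - g) ** 2))
    delta = np.mean(np.maximum(p / g, g / p) < 1.25)
    return mae, rmse, delta

gt = np.random.uniform(0.5, 5.0, (240, 320))      # toy ground-truth depth (m)
pred = gt + np.random.normal(0, 0.05, gt.shape)   # toy sensor/model output
print(depth_metrics(pred, gt))
```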

The advantages of microstrip patch antennas include small size, adaptable surface, ease of fabrication, and compatibility with integrated circuit technology. Numerous experiments have been done over the past few decades to enhance the performance of this antenna, and both military and commercial sectors have found many uses for it. This paper introduces a microstrip patch antenna with an operating frequency of 28 GHz for 5G mobile communication. The research designed and simulated a rectangular antenna of 3.494 mm × 5.3 mm × 0.003 mm. The proposed antenna resonates at 28...

10.1088/1742-6596/2580/1/012063 article EN Journal of Physics Conference Series 2023-09-01
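
Patch dimensions like those reported above can be sanity-checked against the standard transmission-line design equations for a rectangular microstrip patch. The sketch below evaluates those textbook formulas at 28 GHz; the substrate permittivity and height are assumed values, not necessarily those used in the paper.

```python
import math

def patch_dimensions(f_r, eps_r, h):
    """Textbook transmission-line model for a rectangular microstrip patch:
    returns patch width W and length L in metres."""
    c = 3e8
    W = c / (2 * f_r) * math.sqrt(2 / (eps_r + 1))
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))
    L = c / (2 * f_r * math.sqrt(eps_eff)) - 2 * dL
    return W, L

# 28 GHz patch on a thin low-permittivity substrate (eps_r and h are assumed)
W, L = patch_dimensions(28e9, 2.2, 0.254e-3)
print(f"W = {W*1e3:.3f} mm, L = {L*1e3:.3f} mm")
```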

Event-specific concepts are the semantic concepts designed for events of interest, which can be used as a mid-level representation of complex events in videos. Existing methods only focus on defining event-specific concepts for a small number of predefined events, but cannot handle novel unseen events. This motivates us to build a large-scale concept library that covers as many real-world events and their concepts as possible. Specifically, we choose WikiHow, an online forum containing how-to articles on human daily life. We perform a coarse-to-fine event...

10.48550/arxiv.1506.02328 preprint EN other-oa arXiv (Cornell University) 2015-01-01