Yiyang Jiang

ORCID: 0009-0007-1169-4465
Research Areas
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Speech and dialogue systems
  • Natural Language Processing Techniques
  • Neural dynamics and brain function
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image and Video Retrieval Techniques
  • Functional Brain Connectivity Studies
  • Video Analysis and Summarization
  • COVID-19 diagnosis using AI
  • Computational and Text Analysis Methods
  • Model Reduction and Neural Networks
  • Machine Learning in Healthcare
  • EEG and Brain-Computer Interfaces

Hong Kong Polytechnic University
2024

University of Chinese Academy of Sciences
2024

Shanghai Artificial Intelligence Laboratory
2024

Catholic University of America
1998

National Institute of Mental Health
1998

Diffusion models represent the latest state of the art in the domain of deep generative models, boasting remarkable performance across a broad spectrum of applications. Despite the widespread success of diffusion models in various tasks, the original formulations of these models exhibit notable limitations. The article uses DDPM as an example, thoroughly exploring and deriving the mathematical principles of the model from two different perspectives. Additionally, this article explores the relationship between diffusion models and five other types of models: Generative...

10.54097/sxd49274 article EN cc-by-nc Highlights in Science Engineering and Technology 2024-08-15

Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find relevant videos for a medical question based on the similarity of the video transcript and answers generated by GPT4. For visual answer localization, the start and end timestamps are predicted by alignments of both the visual content and the subtitles with the queries. For the Query-Focused Instructional Step Captioning (QFISC) task, step captions provide LLaVA-Next-Video model as context,...

10.48550/arxiv.2412.15514 preprint EN arXiv (Cornell University) 2024-12-19

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task, which necessitates precise alignment and deep interaction between speech and text features. To address the SQA challenge on LLMs, we initially curated the free-form and open-ended LibriSQA dataset from Librispeech, comprising Part I with natural conversational...

10.48550/arxiv.2308.10390 preprint EN cc-by arXiv (Cornell University) 2023-01-01