Xiaonan Qi

ORCID: 0000-0003-3822-6702
Research Areas
  • Music and Audio Processing
  • Speech and Audio Processing
  • Video Analysis and Summarization
  • Multisensory perception and integration

Carnegie Mellon University
2023-2024

Chinese University of Hong Kong, Shenzhen
2024

Audio-visual scene understanding is a challenging problem due to the unstructured spatial-temporal relations that exist in audio signals and the spatial layouts of different objects in visual images. Recently, many studies have focused on abstracting features from convolutional neural networks, while learning explicit semantically relevant frames of sound and images has been overlooked. To this end, we present an end-to-end framework, namely the attentional graph convolutional network (AGCN), for structure-aware...

10.1109/tim.2023.3260282 article EN IEEE Transactions on Instrumentation and Measurement 2023-01-01

Audio‐visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial‐temporal relationships exhibited by audio‐visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. Recent studies have focused predominantly on extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within the data. The authors present...

10.1049/cit2.12375 article EN cc-by CAAI Transactions on Intelligence Technology 2024-11-26