- Music and Audio Processing
- Speech and Audio Processing
- Video Analysis and Summarization
- Multisensory perception and integration
Carnegie Mellon University
2023-2024
Chinese University of Hong Kong, Shenzhen
2024
Audio-Visual scene understanding is a challenging problem due to the unstructured spatial-temporal relations that exist in audio signals and spatial layouts of different objects visual images. Recently, many studies have focused on abstracting features from convolutional neural networks while learning explicit semantically relevant frames sound images has been overlooked. To this end, we present an end-to-end framework, namely attentional graph network (AGCN), for structure-aware...
Abstract Audio‐visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial‐temporal relationships exhibited by audio‐visual signals, coupled with complex spatial patterns of objects and textures found in visual images. The focus recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting acquisition semantically meaningful regions crucial components within data. authors present...