- Speech and Audio Processing
- Neural Dynamics and Brain Function
- Face Recognition and Perception
- Music and Audio Processing
- Advanced Image Processing Techniques
- Visual and Cognitive Learning Processes
- Image Processing Techniques and Applications
- Language, Metaphor, and Cognition
- Geographic Information Systems Studies
- Hearing Loss and Rehabilitation
- Visual Attention and Saliency Detection
- Spatial Cognition and Navigation
- Domain Adaptation and Few-Shot Learning
- Visual Perception and Processing Mechanisms
- Image Enhancement Techniques
Western University
2020-2021
Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From an assistant that could guide us through unfamiliar environments to generative models that produce images from only a high-level text description, vision-language model (VLM) applications will significantly impact our relationship with technology. However, many challenges need to be addressed to improve the reliability of those models. While language is discrete,...
While vision evokes a dense network of feedforward and feedback neural processes in the brain, visual processes are primarily modeled with feedforward hierarchical networks, leaving the computational role of feedback processing poorly understood. Here, we developed a generative autoencoder model and adversarially trained it on a categorically diverse data set of images. We hypothesized that feedback processing in the ventral pathway can be represented by the reconstruction of visual information performed by the model. We compared the representational similarity of the activity patterns of the proposed model with temporal...
Learning rich visual representations using contrastive self-supervised learning has been extremely successful. However, it is still a major question whether we could use a similar approach to learn superior auditory representations. In this paper, we expand on prior work (SimCLR) to learn better auditory representations. We (1) introduce various data augmentations suitable for audio and evaluate their impact on predictive performance, and (2) show that training with time-frequency audio features substantially improves the quality of the learned...
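The SimCLR-style objective this abstract builds on can be illustrated with a minimal NumPy sketch: two augmented views of each sample are embedded, and the NT-Xent (normalized temperature-scaled cross-entropy) loss pulls each pair together against the rest of the batch. The `random_time_shift` augmentation and all names here are illustrative stand-ins, not the paper's code.

```python
import numpy as np

def random_time_shift(wave, max_shift, rng):
    """Hypothetical audio augmentation: circularly shift a waveform in time."""
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(wave, shift)

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss, as used in SimCLR, for two batches of paired embeddings.
    Rows i of z1 and z2 are two augmented views of the same sample."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # The positive for sample i is its other view at index i+N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views carry the same content, the loss is driven well below the chance level of roughly log(2N - 1), which is what training exploits.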
Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each evaluation protocol, bearing a non-trivial computational cost, and making sense of how all these benchmarks translate into meaningful axes of progress. To facilitate a systematic evaluation of VLM progress, we introduce UniBench: a unified implementation of 50+ benchmarks spanning a comprehensive range of carefully...
In less than the blink of an eye, the human brain processes visual sensory input, interprets the scene, identifies faces, and recognizes objects. Decades of neurophysiological studies have demonstrated that the brain accomplishes these complicated tasks through a dense network of feedforward and feedback neural processes in the ventral visual cortex. So far, these processes are primarily modeled with feedforward hierarchical networks, and their computational role is poorly understood. In this study, we developed a generative autoencoder model and adversarially trained it on a large...
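The representational-similarity comparison mentioned in these abstracts can be sketched with a minimal NumPy version of representational similarity analysis (RSA): build a representational dissimilarity matrix (RDM) per system, then Spearman-correlate the upper triangles of the two RDMs. This is a generic RSA sketch, not the paper's pipeline, and the function names are illustrative.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between activity patterns (rows = stimuli, columns = units/voxels)."""
    return 1.0 - np.corrcoef(patterns)

def rsa_score(patterns_a, patterns_b):
    """Compare two systems (e.g. a model layer vs. a brain region) by
    Spearman-correlating the upper triangles of their RDMs."""
    iu = np.triu_indices(patterns_a.shape[0], k=1)
    a, b = rdm(patterns_a)[iu], rdm(patterns_b)[iu]
    # Spearman = Pearson correlation of ranks (assumes no ties).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(rank(a), rank(b))[0, 1]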
Contrastive learning of auditory and visual perception has been extremely successful when investigated individually. However, there are still major questions on how we could integrate principles learned from both domains to attain effective audiovisual representations. In this paper, we present a contrastive framework to learn audiovisual representations from unlabeled videos. The type and strength of the augmentations utilized during self-supervised pre-training play a crucial role for contrastive frameworks to work sufficiently. Hence,...
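One common way to set up such an audiovisual objective is cross-modal InfoNCE: the audio and video embeddings of the same clip form a positive pair, while other clips in the batch serve as negatives. The sketch below is a generic NumPy illustration of that idea under this assumption, not the paper's implementation.

```python
import numpy as np

def cross_modal_nce(z_audio, z_video, temperature=0.1):
    """Cross-modal InfoNCE: row i of z_audio and row i of z_video come from
    the same clip (positives); all other rows in the batch are negatives."""
    za = z_audio / np.linalg.norm(z_audio, axis=1, keepdims=True)
    zv = z_video / np.linalg.norm(z_video, axis=1, keepdims=True)
    logits = za @ zv.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z_audio.shape[0]
    return -log_prob[np.arange(n), np.arange(n)].mean()  # diagonal = positives
```

Each modality would typically be augmented independently before encoding, which is where the type and strength of the augmentations enter.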