- Natural Language Processing Techniques
- EEG and Brain-Computer Interfaces
- Functional Brain Connectivity Studies
- Speech and Dialogue Systems
- Advanced Neuroimaging Techniques and Applications
- Topic Modeling
- Speech Recognition and Synthesis
- Human Pose and Action Recognition
- Neuroscience and Neural Engineering
- Multimodal Machine Learning Applications
- Fetal and Pediatric Neurological Disorders
- Video Analysis and Summarization
UNSW Sydney
2020-2025
Computational neuroimaging involves analyzing brain images or signals to provide mechanistic insights and predictive tools for human cognition and behavior. While diffusion models have shown stability and high-quality generation in natural images, there is increasing interest in adapting them to analyze data for various neurological tasks such as enhancement, disease diagnosis, and decoding. This survey provides an overview of recent efforts to integrate diffusion models into computational neuroimaging. We begin by introducing the...
Objective. Functional magnetic resonance imaging (fMRI) is often modeled as networks of Regions of Interest (ROIs) and their functional connectivity to study brain functions and mental disorders. Limited fMRI data, due to high acquisition costs, hampers recognition model performance. We aim to address this issue using generative diffusion models for data augmentation. Approach. We propose Brain-Net-Diffusion, a transformer-based latent diffusion model, to generate realistic functional connectivity for augmenting fMRI datasets and evaluate its impact on...
This paper demonstrates a yoga assistant mobile application based on human-keypoint detection models, which imitates the scene in which real tutors guide and supervise their students to do yoga via video chat. In order to provide a humanized, safe, and convenient service, the core function is designed as hands-free using voice, embedding fast and accurate models to detect keypoints and calculate scores. In addition, we propose an improved algorithm to calculate scores that can be applied to all poses. Our application is evaluated on different yoga poses under different scenes, and its...
Electroencephalography (EEG) signals are gaining popularity in Brain-Computer Interface (BCI)-based rehabilitation and neural engineering applications thanks to their portability and availability. Inevitably, the sensory electrodes on the entire scalp would collect signals irrelevant to the particular BCI task, increasing the risks of overfitting in machine learning-based predictions. While this issue is being addressed by scaling up EEG datasets and handcrafting complex predictive models, this also leads to increased computation...
Subject-independent Electroencephalography (EEG) recognition remains challenging due to the inherent variability of brain anatomy across different subjects. Such variability is further complicated by the Volume Conduction Effect (VCE) that introduces channel-interference noise, exacerbating subject-specific biases in the recorded EEG signals. Existing studies, often relying on large datasets and entangled spatial-temporal features, struggle to overcome this bias, particularly in scenarios with limited data. To this end, we...
This paper introduces StyleSpeech, a novel Text-to-Speech (TTS) system that enhances the naturalness and accuracy of synthesized speech. Building upon existing TTS technologies, StyleSpeech incorporates a unique Style Decorator structure that enables deep learning models to simultaneously learn style and phoneme features, improving adaptability and efficiency through the principles of Low-Rank Adaptation (LoRA). LoRA allows efficient adaptation of style features in pre-trained models. Additionally, we introduce...
Functional magnetic resonance imaging (fMRI) is an emerging neuroimaging modality that is commonly modeled as networks of Regions of Interest (ROIs) and their connections, named functional connectivity, for understanding brain functions and mental disorders. However, due to the high cost of fMRI data acquisition and labeling, the amount of fMRI data is usually small, which largely limits the performance of recognition models. With the rise of generative models, especially diffusion models, the ability to generate realistic samples close to the real data distribution...
Recent advancements in text-to-speech (TTS) systems, such as FastSpeech and StyleSpeech, have significantly improved speech generation quality. However, these models often rely on duration generated by external tools like the Montreal Forced Aligner, which can be time-consuming and lack flexibility. The importance of accurate duration is often underestimated, despite its crucial role in achieving natural prosody and intelligibility. To address these limitations, we propose a novel Aligner-Guided Training Paradigm that...
Diffusion-based Generative AI has gained significant attention for its superior performance over other generative techniques like Generative Adversarial Networks and Variational Autoencoders. While it has achieved notable advancements in fields such as computer vision and natural language processing, its application in speech generation remains under-explored. Mainstream Text-to-Speech systems primarily map outputs to Mel-Spectrograms in the spectral space, leading to high computational loads due to the sparsity of MelSpecs....