- Speech and Audio Processing
- Speech Recognition and Synthesis
- Music and Audio Processing
- Music Technology and Sound Studies
- Advanced Wireless Communication Techniques
- Phonetics and Phonology Research
- Voice and Speech Disorders
- Advanced Data Compression Techniques
- Advanced MIMO Systems Optimization
- Embedded Systems Design Techniques
- Natural Language Processing Techniques
- Hearing Loss and Rehabilitation
- Respiratory and Cough-Related Research
- Phonocardiography and Auscultation Techniques
- Cooperative Communication and Network Coding
- Parallel Computing and Optimization Techniques
- Advanced Image and Video Retrieval Techniques
- Spacecraft Design and Technology
- Advanced Adaptive Filtering Techniques
- Vagus Nerve Stimulation Research
- Wireless Communication Networks Research
- EEG and Brain-Computer Interfaces
- Advanced Wireless Network Optimization
- Diverse Musicological Studies
- Interconnection Networks and Systems
Singapore Institute of Technology
2020-2025
University of Science and Technology of China
2014-2024
Atlantic Technological University
2023-2024
Logan Hospital
2024
Technological University Dublin
2023-2024
Dr. A.P.J. Abdul Kalam Technical University
2022
University of Kent
2015-2020
Medway School of Pharmacy
2015-2020
Ipswich Hospital
2018-2020
Institute of Engineering
2017
The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech domain, using...
Traditional sound event recognition methods are based on informative front-end features such as MFCC, with back-end sequencing performed by HMM, and tend to perform poorly in the presence of interfering acoustic noise. Since noise corruption may be unavoidable in practical situations, it is important to develop more robust features and classifiers. Recent advances in this field use powerful machine learning techniques with high-dimensional inputs such as spectrograms or auditory images. These improve robustness largely thanks to discriminative...
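As a concrete illustration of the front-end features mentioned above, here is a minimal numpy sketch of MFCC extraction (framing, power spectrum, mel filterbank, log, DCT). All parameter values (sample rate, FFT size, filter counts) are illustrative defaults, not the configuration used in the work itself.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Toy MFCC front end: framing -> |FFT|^2 -> mel filterbank -> log -> DCT."""
    # Frame the signal with a Hamming window.
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # (T, n_fft//2+1)

    # Triangular mel filterbank spanning 0 .. sr/2.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)

    logmel = np.log(power @ fbank.T + 1e-10)          # (T, n_mels)

    # DCT-II decorrelates the log-mel energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T                              # (T, n_ceps)

feats = mfcc(np.random.randn(16000))
print(feats.shape)   # (61, 13)
```

Features like these would then feed a back-end sequence model such as an HMM.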
This paper proposes an attention pooling based representation learning method for speech emotion recognition (SER). The emotional representation is learned in an end-to-end fashion by applying a deep convolutional neural network (CNN) directly to spectrograms extracted from speech utterances. Motivated by the success of GoogleNet, two groups of filters with different shapes are designed to capture both temporal and frequency domain context information from the input spectrogram. The learned features are concatenated and fed into subsequent layers. To...
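The two-filter-shape design and attention pooling described above can be sketched as follows. This is a toy numpy illustration with random, untrained weights; the kernel sizes and feature dimensions are assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Naive 'valid' 2D cross-correlation."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy input spectrogram: 40 mel bands x 100 frames.
spec = rng.standard_normal((40, 100))

# Two filter groups with different shapes, as in the abstract:
# a wide temporal filter (1 x 9) and a tall frequency filter (9 x 1).
k_time = rng.standard_normal((1, 9)) * 0.1
k_freq = rng.standard_normal((9, 1)) * 0.1

h_time = np.maximum(conv2d_valid(spec, k_time), 0)  # ReLU, shape (40, 92)
h_freq = np.maximum(conv2d_valid(spec, k_freq), 0)  # ReLU, shape (32, 100)

# Pool the frequency axis, align time axes, concatenate channel-wise.
t = min(h_time.shape[1], h_freq.shape[1])
feat = np.stack([h_time.mean(axis=0)[:t], h_freq.mean(axis=0)[:t]])  # (2, t)

# Attention pooling over time: a (here random) query scores each frame,
# and the softmax-weighted mean yields a fixed-size utterance embedding.
query = rng.standard_normal(2)
weights = softmax(query @ feat)        # (t,) attention weights over frames
utterance_vec = feat @ weights         # (2,) utterance-level representation
print(utterance_vec.shape)   # (2,)
```

In the real system the filters and query would be trained end-to-end; the fixed-size embedding then feeds the emotion classifier.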
Background: Despite recent significant progress in the development of automatic sleep staging methods, building a good model still remains a big challenge for sleep studies with a small cohort due to data-variability and data-inefficiency issues. This work presents a deep transfer learning approach to overcome these issues and enable transferring knowledge from a large dataset for sleep staging. Methods: We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks as the means for transfer learning. The networks are first trained...
This paper presents and explores a robust deep learning framework for auscultation analysis. It aims to classify anomalies in respiratory cycles and detect diseases from respiratory sound recordings. The framework begins with front-end feature extraction that transforms the input sound into a spectrogram representation. Then, a back-end deep learning network is used to classify the spectrogram features into categories of respiratory anomaly or disease. Experiments, conducted over the ICBHI benchmark dataset of respiratory sounds, confirm three main contributions towards respiratory-sound analysis. Firstly, we...
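A minimal sketch of the described front end, assuming a simple Hann-windowed STFT; the sample rate, FFT size and hop length here are placeholders, not the framework's actual settings.

```python
import numpy as np

def log_spectrogram(x, n_fft=256, hop=128):
    """Front end: waveform -> log-magnitude spectrogram (freq x time)."""
    win = np.hanning(n_fft)
    frames = [x[s:s + n_fft] * win for s in range(0, len(x) - n_fft + 1, hop)]
    S = np.abs(np.fft.rfft(np.array(frames), axis=1))  # (T, n_fft//2+1)
    return np.log(S + 1e-8).T                           # (freq, time)

# A hypothetical one-second respiratory-cycle recording at 4 kHz.
cycle = np.random.randn(4000)
spec = log_spectrogram(cycle)
print(spec.shape)   # (129, 30)
```

The resulting time-frequency image is what the back-end network would classify into anomaly or disease categories.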
Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose multiple generators that are chained to perform multi-stage enhancement mapping, which gradually refines the noisy input signals in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains...
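The stage-wise refinement idea, and the shared- versus independent-parameter scenarios, can be caricatured in a few lines of numpy. The "generators" below are contractive linear maps on a toy signal, purely to show the chaining pattern, not the SEGAN architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_generator():
    """Toy 'generator': a contractive linear map squashed by tanh."""
    W = 0.8 * np.eye(8) + 0.01 * rng.standard_normal((8, 8))
    return lambda x: np.tanh(W @ x)

clean = np.zeros(8)                       # target signal (silence, for illustration)
noisy = clean + 0.5 * rng.standard_normal(8)

# Scenario (2): independent parameters -- a distinct generator per stage.
stages_independent = [make_generator() for _ in range(3)]

# Scenario (1): shared parameters -- the same generator reused at every stage.
g_shared = make_generator()
stages_shared = [g_shared] * 3

def enhance(x, stages):
    """Multi-stage mapping: each stage refines the previous stage's output."""
    for g in stages:
        x = g(x)
    return x

print(np.linalg.norm(enhance(noisy, stages_shared) - clean))
print(np.linalg.norm(enhance(noisy, stages_independent) - clean))
```

Sharing parameters keeps the model small but forces every stage to apply the same mapping; independent parameters let each stage specialize in a different degree of refinement.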
We present a new image quality assessment (IQA) algorithm based on the phase and magnitude of the 2D (two-dimensional) Discrete Fourier Transform (DFT). The basic idea is to compare the phase and magnitude of the reference and distorted images and compute the quality score. However, it is well known that the Human Visual System's (HVS's) sensitivity to different frequency components is not the same. We accommodate this fact via a simple yet effective strategy of nonuniform binning of the frequency components. This process also leads to a reduced-space representation, thereby enabling...
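A hedged numpy sketch of the binning idea: DFT magnitude and phase are pooled into radial bins whose widths grow with frequency, so low-frequency components (to which the HVS is more sensitive) are represented more finely. The quadratic bin spacing and the final scoring formula are illustrative choices, not those of the paper.

```python
import numpy as np

def dft_features(img, n_bins=8):
    """Pool 2D DFT magnitude and phase into nonuniform radial bins."""
    F = np.fft.fftshift(np.fft.fft2(img))
    mag, phase = np.abs(F), np.angle(F)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    # Nonuniform (quadratic) bin edges: dense near DC, sparse at high frequency.
    edges = (np.linspace(0.0, 1.0, n_bins + 1) ** 2) * r.max()
    m_feat, p_feat = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (r >= lo) & (r < hi) if hi < r.max() else (r >= lo)
        m_feat.append(mag[mask].mean())
        p_feat.append(phase[mask].mean())
    return np.array(m_feat), np.array(p_feat)

def iqa_score(ref, dist):
    """Compare binned magnitude and phase; 1.0 means identical images."""
    mr, pr = dft_features(ref)
    md, pd = dft_features(dist)
    err = (np.linalg.norm(mr - md) / (np.linalg.norm(mr) + 1e-12)
           + np.linalg.norm(pr - pd) / np.pi)
    return 1.0 / (1.0 + err)

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
print(iqa_score(ref, ref))                                      # 1.0
print(iqa_score(ref, ref + 0.3 * rng.random((32, 32))) < 1.0)   # True
```

Because each image reduces to 2 x n_bins numbers, comparison is far cheaper than a full per-pixel frequency comparison, which is the reduced-space benefit the abstract alludes to.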
This paper presents a robust deep learning framework developed to detect respiratory diseases from recordings of respiratory sounds. The complete detection process firstly involves front-end feature extraction, where recordings are transformed into spectrograms that convey both spectral and temporal information. Then a back-end deep learning model classifies the features into classes of disease or anomaly. Experiments, conducted over the ICBHI benchmark dataset of respiratory sounds, evaluate the ability of the framework to classify respiratory sounds. Two main contributions are made in this paper...
Whispered speech can be useful for quiet and private communication, and is the primary means of unaided spoken communication for many people experiencing voice-box deficiencies. Patients who have undergone partial or full laryngectomy are typically unable to speak anything more than hoarse whispers without the aid of prostheses or specialized speaking techniques. Each of the current rehabilitative methods for post-laryngectomized patients (primarily oesophageal speech, tracheo-esophageal puncture, electrolarynx)...
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short duration speech utterances. With the hypothesis that language information is weakly represented, only latently present in speech, and largely dependent on statistical...
Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder to generate tokens one by one, which is computationally inefficient. To speed up inference, non-autoregressive (NAR) methods, e.g. single-step NAR, were designed to enable parallel generation. However, due to the independence assumption within the output tokens, the performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus. There are two challenges...
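The AR-versus-NAR contrast can be made concrete with a toy decoder: the AR variant runs T sequential argmax steps, each conditioned on the previous token, while the single-step NAR variant predicts all positions in one parallel pass under the independence assumption. All matrices here are random placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, H = 6, 5, 4                          # vocab size, output length, hidden dim
enc = rng.standard_normal((T, H))          # encoder states for the T output slots
W_out = rng.standard_normal((H, V))        # output projection
W_prev = rng.standard_normal((V, V)) * 0.1 # AR dependence on the previous token

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def decode_ar():
    """Autoregressive decoding: T sequential steps, each conditioned on the last token."""
    prev, out = np.zeros(V), []
    for t in range(T):
        logits = enc[t] @ W_out + prev @ W_prev
        tok = int(np.argmax(logits))
        out.append(tok)
        prev = one_hot(tok, V)
    return out

def decode_nar():
    """Single-step NAR decoding: every position predicted in one parallel pass,
    assuming output tokens are conditionally independent given the encoder states."""
    logits = enc @ W_out               # (T, V), computed in one shot
    return list(np.argmax(logits, axis=1))

print(decode_ar(), decode_nar())
```

The NAR pass is a single matrix product regardless of T, which is the inference speed-up; what it gives up is exactly the `W_prev` conditioning on previously emitted tokens.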
Artificial Intelligence (AI) often misinterprets or inadequately serves blind individuals, leading to accessibility challenges and systemic exclusion. While prior research examines how users verify and contest AI errors, no structured methodology exists to empower them in reshaping AI’s role in their lives. This paper introduces a novel, user-driven methodology that enables blind individuals to systematically identify, challenge, and refine AI outputs while harnessing generative AI for greater inclusion and well-being. Our approach...
Sustainable and practical personal mobility solutions for campus environments have traditionally revolved around the use of bicycles, or the provision of pedestrian facilities. However, many campuses also experience traffic congestion, parking difficulties and pollution from fossil-fuelled vehicles. It appears that pedal power alone has not been sufficient to supplant petrol and diesel vehicles to date; it is therefore opportune to investigate both the reasons behind the continual use of environmentally unfriendly transport, and to consider...
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) operating on the raw signal. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices, as well as at all of them when memory allows. Our experiments show that...
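A minimal numpy sketch of a non-local self-attention layer as it might be inserted after a (de)convolutional layer: every time step attends to every other, recovering long-range dependencies that stacked small-kernel convolutions may miss. The projection weights are random stand-ins for learned parameters, and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention_1d(x, d_k=8):
    """Non-local self-attention over a (T, C) feature map, with a residual add."""
    T, C = x.shape
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)   # query projection
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)   # key projection
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)     # value projection
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                   # (T, T) pairwise similarities
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                 # row-wise softmax over time
    return x + A @ V                                  # residual connection

feats = rng.standard_normal((100, 16))   # e.g. the output of one (de)conv layer
out = self_attention_1d(feats)
print(out.shape)   # (100, 16)
```

The (T, T) attention map is what makes memory the limiting factor when the layer is placed at every (de)convolutional stage, as the abstract notes.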