- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Stochastic processes and financial applications
- Advanced Bandit Algorithms Research
- Speech and dialogue systems
- Digital Media Forensic Detection
- Advanced Data Compression Techniques
- Natural Language Processing Techniques
- Advanced Computational Techniques and Applications
- Fault Detection and Control Systems
- Chaos-based Image/Signal Encryption
- Infant Health and Development
- Wireless Sensor Networks and IoT
- Anomaly Detection Techniques and Applications
- Image and Signal Denoising Methods
- Digital and Cyber Forensics
- COVID-19 diagnosis using AI
- Advanced Algorithms and Applications
- Network Security and Intrusion Detection
- Advanced Malware Detection Techniques
- Reinforcement Learning in Robotics
- Phonetics and Phonology Research
- Auction Theory and Applications
- IoT-based Smart Home Systems
Computer Research Institute of Montréal
2021-2025
Over recent years, various self-supervised embedding learning methods for deep speaker verification have been proposed. The performance of these frameworks depends heavily on the data augmentation technique, but due to the sensitive nature of the information within the speech signal, most training relies on simple augmentations such as additive noise or simulated reverberation. Thus, while conventional systems can yield minimal within-utterance variability, their capability to generalize to out-of-set utterances is limited. In...
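The "additive noise" augmentation mentioned above is usually applied at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch of that generic step, assuming mono waveforms stored as NumPy arrays. It is not the augmentation pipeline proposed in the paper, and the function name `add_noise_at_snr` is hypothetical.

```python
import numpy as np

def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` so that the result has the requested SNR in dB."""
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against an all-zero noise clip
    # Choose the scale so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Illustrative usage with random stand-ins for 1 s of 16 kHz speech and a shorter noise clip.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
```

In practice the SNR is typically drawn at random per training example, so the embedding network sees a range of corruption levels.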
One of the most widely used self-supervised (SS) speaker verification (SV) system training methods is to optimize a speaker embedding network in a discriminative fashion using clustering algorithm (CA)-driven Pseudo-Labels (PLs). Although the PL-based SS training scheme showed impressive performance, recent studies have shown that label noise can significantly impact performance. In this paper, we explored various PLs driven by different CAs and conducted a fine-grained analysis of the relationship between the quality of the PLs and SV...
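The CA-driven PL step can be illustrated with plain k-means. The example is a sketch under stated assumptions: embeddings from the current speaker embedding network are available as an (N x D) array, and the number of pseudo-speakers `k` is fixed in advance. k-means is only one possible CA and is not necessarily one of those studied in the paper. The helper name `kmeans_pseudo_labels` is hypothetical.

```python
import numpy as np

def kmeans_pseudo_labels(embeddings: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Assign a pseudo speaker label to every embedding with plain k-means."""
    rng = np.random.default_rng(seed)
    # L2-normalise so clustering operates on the unit hypersphere (cosine-like geometry).
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Farthest-first initialisation: a random first centroid, then repeatedly the
    # point farthest from all centroids chosen so far.
    idx = [int(rng.integers(len(x)))]
    for _ in range(1, k):
        d = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = x[idx].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster empties.
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels
```

The returned labels would then serve as classification targets for the discriminative training step. Any clustering errors become label noise, which is the effect the abstract says can significantly impact performance.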
This work focuses on the problem of detecting fake audio clips. To improve current spoofing detection models, we propose a selection of multiple augmentations specially designed to resemble attacks. These are experimentally found to be very useful, and using them achieves a notable performance of 2.8% EER on the ASVspoof 2019 challenge evaluation set. Unlike the widely employed acoustic features, in this paper we explore the use of Mel-spectrogram image features and employ various codecs to achieve robustness to codec transmission...
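The result above is reported as an equal error rate (EER). This is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of one common way to estimate it from detection scores follows. It assumes that higher scores mean "bona fide" and that labels are 1 for bona fide and 0 for spoofed trials. It is not the official ASVspoof scoring script, and `compute_eer` is a hypothetical name.

```python
import numpy as np

def compute_eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """Estimate the equal error rate from detection scores and binary labels."""
    thresholds = np.unique(scores)  # sorted candidate thresholds
    pos = scores[labels == 1]       # bona fide / target trials
    neg = scores[labels == 0]       # spoof / non-target trials
    # A trial is accepted when its score is >= the threshold.
    far = np.array([(neg >= t).mean() for t in thresholds])  # false acceptance rate
    frr = np.array([(pos < t).mean() for t in thresholds])   # false rejection rate
    # Pick the threshold where the two rates are closest and average them there.
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)
```

Official evaluation tools typically interpolate along the ROC curve. This threshold sweep can therefore differ slightly from them on small trial lists.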
Extraction of a speaker embedding vector plays an important role in deep learning-based speaker verification. In this contribution, to extract discriminant utterance-level embeddings, we propose a hybrid neural network that employs both cross- and self-module attention pooling mechanisms. More specifically, the proposed system incorporates a 2D-Convolution Neural Network (CNN)-based feature extraction module in cascade with a frame-level network, which is composed of fully Time Delay Neural Network (TDNN) and TDNN-Long Short Term...
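Attention pooling turns a variable-length sequence of frame-level features into a single utterance-level vector. The abstract does not give the details of the cross- and self-module mechanisms. The example below is therefore a sketch of generic self-attentive statistics pooling only, with NumPy arrays standing in for learned parameters. The parameter names `W`, `b` and `v` are assumptions.

```python
import numpy as np

def attentive_stats_pooling(h: np.ndarray, W: np.ndarray, b: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Pool frame-level features h (T x D) into one utterance-level vector of size 2D.

    W (A x D), b (A,) and v (A,) would be learnable in a real network; here they
    are plain arrays.
    """
    # One scalar attention score per frame: e_t = v^T tanh(W h_t + b).
    e = np.tanh(h @ W.T + b) @ v
    # Softmax over time (subtracting the max for numerical stability).
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # Attention-weighted mean and standard deviation over frames.
    mu = alpha @ h
    var = alpha @ (h ** 2) - mu ** 2
    sigma = np.sqrt(np.clip(var, 1e-8, None))
    return np.concatenate([mu, sigma])
```

With `v` set to zeros the attention becomes uniform, and the output reduces to ordinary mean and standard-deviation pooling. This is a handy sanity check.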
In this paper, we propose a new hybrid system for extracting a speaker embedding vector. More specifically, the proposed system employs a multi-level global-local statistics pooling method in order to aggregate information within a short time-span and the utterance-level context. To evaluate the system, a set of experiments was conducted on the NIST SRE 2016, Short-duration Speaker Verification (SdSV) Challenge 2021, and VoxCeleb datasets, and the proposed network was able to outperform conventional approaches trained on the same dataset. Moreover, our...
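The abstract does not specify how the global and local statistics are combined. The following is a hypothetical sketch of one possible reading of "multi-level global-local statistics pooling":

- **Global:** mean and standard deviation over the whole utterance.
- **Local:** the same statistics computed inside short non-overlapping windows, then averaged.
- **Multi-level:** both are applied to several layers' outputs and concatenated.

The window length `win` and all function names are assumptions.

```python
import numpy as np

def stats(x: np.ndarray) -> np.ndarray:
    """Mean and standard deviation over the time axis (axis 0), concatenated."""
    return np.concatenate([x.mean(axis=0), x.std(axis=0)])

def global_local_stats_pooling(layer_outputs: list, win: int = 20) -> np.ndarray:
    """Hypothetical pooling over several (T x D_l) feature maps from different layers."""
    pooled = []
    for h in layer_outputs:
        # Global context: statistics over the whole utterance.
        glob = stats(h)
        # Short time-span context: statistics inside non-overlapping windows,
        # averaged over windows (falls back to the whole sequence if it is too short).
        windows = [h[s:s + win] for s in range(0, len(h) - win + 1, win)] or [h]
        loc = np.mean([stats(w) for w in windows], axis=0)
        pooled.append(np.concatenate([glob, loc]))
    # Multi-level: concatenate the pooled vectors from every layer.
    return np.concatenate(pooled)
```

Each layer with D features contributes 4D values (global mean and std, local mean and std). The final embedding size therefore grows with the number and width of the pooled layers.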
Clustering (CL)-based pseudo-labels (PLs) are widely used to optimize speaker embedding (SE) networks and train self-supervised (SS) speaker verification (SV) systems. However, PL-based SS training depends on high-quality PLs. In this paper, we propose a general-purpose CL algorithm called CAMSAT that outperforms all other baselines in clustering SEs. Moreover, using the generated PLs, our SE system allows us to further improve SV performance. CAMSAT is based on two principles: (1) mixing the predictions of augmented...
Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system, in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn optimal stopping policies in two financial engineering applications, namely option pricing and optimal option exercise. We present, for the first time, a comprehensive empirical evaluation of the quality of the policies identified by three state-of-the-art RL algorithms: double...
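For reference, the classical dynamic-programming solution illustrates option exercise as an optimal stopping problem. It is not the deep RL approach evaluated in the paper. The sketch below prices an American put on a Cox-Ross-Rubinstein binomial tree. At every node, backward induction compares the value of stopping (exercising now) with the discounted expected value of continuing. The function name and parameters are illustrative assumptions.

```python
import math

def american_put_binomial(s0: float, strike: float, r: float, sigma: float, t: float, steps: int) -> float:
    """Price an American put on a Cox-Ross-Rubinstein binomial tree.

    Backward induction solves the optimal stopping problem exactly on the tree:
    at each node the holder takes the larger of exercising now and continuing.
    """
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1 / u                             # down factor
    disc = math.exp(-r * dt)              # one-step discount factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move
    # Payoffs at maturity; j counts the number of up moves.
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])  # value of waiting
            exercise = max(strike - s0 * u**j * d**(n - j), 0.0)     # value of stopping now
            values[j] = max(cont, exercise)
    return values[0]
```

The `max(cont, exercise)` comparison is exactly the stop-or-continue decision that an RL agent must learn from simulated paths when no tree or closed-form model is available.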
Over recent years, various self-supervised contrastive embedding learning methods for deep speaker verification have been proposed. The performance of these frameworks depends heavily on the data augmentation technique, but due to the sensitive nature of the information within the speech signal, most training relies on simple augmentations such as additive noise or simulated reverberation. Thus, while conventional systems can yield minimal within-utterance variability, their capability to generalize to out-of-set utterances is limited....
Clustering is an unsupervised learning technique that leverages a large amount of unlabeled data to learn cluster-wise representations from speech. One of the most popular self-supervised techniques for training a speaker verification system is to predict pseudo-labels using clustering algorithms and then train the speaker embedding network on the generated pseudo-labels in a discriminative manner. Therefore, the performance of pseudo-label-driven systems relies heavily on the accuracy of the adopted clustering algorithms. In this contribution, we propose a novel technique that not only...
Clustering-based Pseudo-Labels (PLs) are widely used to optimize Speaker Embedding (SE) networks and train Self-Supervised (SS) Speaker Verification (SV) systems. However, this SS training scheme relies on highly accurate PLs. In this paper, we perform a large investigative study of the effect of several regularization techniques (mixup, label smoothing, and employing sub-centers) on the noise robustness of SSSV systems. We study these techniques and apply them to various recent metric learning loss functions for better generalization. In particular,...
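Two of the regularizers named above, label smoothing and mixup, have standard textbook forms. A minimal NumPy sketch of both follows, applied to pseudo-label targets, together with a soft-target cross-entropy. It assumes integer pseudo-labels and a plain softmax classifier. Sub-centers and the metric learning losses used in the paper are not shown, and where in the network the paper applies mixup is not stated here. All function names are illustrative.

```python
import numpy as np

def smooth_one_hot(labels: np.ndarray, n_classes: int, eps: float = 0.1) -> np.ndarray:
    """Label smoothing: keep 1 - eps on the pseudo-label and spread eps uniformly
    over all classes (including the labelled one)."""
    targets = np.full((len(labels), n_classes), eps / n_classes)
    targets[np.arange(len(labels)), labels] += 1.0 - eps
    return targets

def mixup(x: np.ndarray, y: np.ndarray, alpha: float = 0.2, rng=None):
    """Mixup: convex combinations of random pairs of inputs and of their soft targets."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing coefficient drawn from Beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

def soft_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy between softmax(logits) and soft target distributions."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

Both techniques soften the targets, so a wrong pseudo-label pulls the network less strongly than a hard one-hot target would. This is the intuition behind using them against label noise.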
In deep learning-based speaker verification frameworks, the extraction of a speaker embedding vector plays a key role. In this contribution, we propose a hybrid neural network that employs a cross-module attention pooling mechanism for extracting discriminant utterance-level embeddings. In particular, the proposed system incorporates a 2D-Convolution Neural Network (CNN)-based feature extraction module in cascade with a frame-level network, which is composed of fully Time Delay Neural Network (TDNN) and TDNN-Long Short Term Memory (TDNN-LSTM) parallel...
Optimizing a speaker embedding network in a discriminative fashion using clustering algorithm-driven pseudo-labels is one of the most widely used self-supervised speaker verification system training schemes. Although this kind of self-supervised training scheme has shown impressive performance, recent studies have shown that label noise can significantly impact performance. In this contribution, we explore various clustering algorithms to generate pseudo-labels and conduct a fine-grained analysis of the relationship between pseudo-label quality and speaker verification performance. Through experimental...