- Music and Audio Processing
- Speech and Audio Processing
- Voice and Speech Disorders
- Speech Recognition and Synthesis
- Dysphagia Assessment and Management
- Respiratory and Cough-Related Research
- Neurological disorders and treatments
- COVID-19 diagnosis using AI
- Emotion and Mood Recognition
- Parkinson's Disease Mechanisms and Treatments
- Tracheal and airway disorders
- Botulinum Toxin and Related Neurological Disorders
- Multisensory perception and integration
- Gastroesophageal reflux and treatments
- Diverse Musicological Studies
- Scientific Research Methodologies and Applications
- Music Technology and Sound Studies
- Information Systems and Technology Applications
- Neuroscience and Music Perception
- Genetic Neurodegenerative Diseases
- Phonocardiography and Auscultation Techniques
- Digital Filter Design and Implementation
- Advanced Research in Systems and Signal Processing
- Blind Source Separation Techniques
- Advanced Chemical Sensor Technologies
University of Rome Tor Vergata
2021-2025
Parkinson’s Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delays from the onset of processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics, with the aid of commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects...
Machine Learning (ML) algorithms within a human–computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work aims to assess the feasibility and characteristics of cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naïve Bayes and MLP) are applied to acoustic features, obtained through a procedure based on Kononenko’s discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear,...
Speaker Recognition (SR) is a common task in AI-based sound analysis, involving structurally different methodologies such as Deep Learning or "traditional" Machine Learning (ML). In this paper, we compared and explored the two methodologies on the DEMoS dataset, consisting of 8869 audio files of 58 speakers in different emotional states. A custom CNN is compared to several pre-trained nets using image inputs of spectrograms and Cepstral-temporal (MFCC) graphs. An ML approach based on acoustic feature extraction, selection and multi-class classification by means...
This paper addresses the issue of distinguishing commercially played songs from non-music audio in radio broadcasts, where automatic song identification systems are commonly employed for reporting purposes. Service call costs increase because these systems need to remain continuously active, even when music is not being broadcast. Our solution serves as a preliminary filter to determine whether an audio segment constitutes “music” and thus warrants a subsequent service call to an identifier. We collected 139 h...
Parkinson’s Disease and Adductor-type Spasmodic Dysphonia are two neurological disorders that greatly decrease the quality of life of millions of patients worldwide. Despite this great diffusion, the related diagnoses are often performed empirically, while it could be relevant to count on objective measurable biomarkers, among which researchers have been considering features of voice impairment that can be useful indicators but sometimes lead to confusion. Therefore, here, we aimed to develop a robust...
The growth in computing capabilities has significantly transformed the realm of data analysis and processing, most notably through the widespread adoption of artificial intelligence (AI) and deep learning technologies [...]
Reverberation and background noise are common and unavoidable real-world phenomena that hinder automatic speaker recognition systems, particularly because these systems are typically trained on noise-free data. Most models rely on fixed audio feature sets. To evaluate the dependency of features on reverberation and noise, this study proposes augmenting the commonly used mel-frequency cepstral coefficients (MFCCs) with relative spectral (RASTA) features. The performance was assessed using noisy data generated by...
Automatic assessment of speech disorders is a cutting-edge topic in vocal analysis. Recent studies indicated possible connections between eating disorders and voice alterations. In this work, we assessed the influence of obesity and Gastro-Esophageal Reflux Disease (GERD) on voice, the former being a risk factor for the latter. Moreover, we investigated the mutual influence of the two diseases, working with a consistent set of features. To these aims, we used voice tests from 92 subjects, consisting of vowel phonation and sentence repetition, the subjects including...
A Machine-Learning process for selecting optimal biomarkers that identify Dysphagia is presented. The effectiveness of said process is confirmed by an ensemble of Classifiers that correctly distinguish between Healthy and Dysphagic patients with high Accuracy. An overview of the clinical meaning of the biomarkers found is presented in the Discussion, corroborating and further refining previous studies on the matter. RASTA Processing of speech and spectral energy distribution are the main domains for detecting Dysphagia from voice.
This paper presents a novel, high-speed, and low-complexity algorithm for pitch (F0) detection, along with a new dataset for testing and a comparison of some of the most effective existing techniques. The algorithm, called OneBitPitch (OBP), is based on a modified autocorrelation function applied to a single-bit signal for fast computation. The focus is explicitly on speed for real-time applications in pitch detection. A testing procedure is proposed using a proprietary synthetic dataset (SYNTHPITCH) against three widely used algorithms: YIN,...
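The idea of running an autocorrelation on a single-bit signal can be sketched as follows. This is a generic illustration only, not the OBP algorithm itself: the excerpt does not describe how the autocorrelation is modified, so the sketch simply keeps the sign bit of each sample and picks the lag at which the largest fraction of bits agree. The function names, the search range (`f_min`, `f_max`) and the helper `make_sine` are assumptions introduced for the example.

```python
import math


def make_sine(freq, sample_rate, n_samples, phase=0.0):
    """Hypothetical helper: generate a test sine wave as a list of floats."""
    return [math.sin(2 * math.pi * freq * i / sample_rate + phase)
            for i in range(n_samples)]


def one_bit(signal):
    """Quantize each sample to one bit: 1 if non-negative, 0 otherwise."""
    return [1 if s >= 0 else 0 for s in signal]


def estimate_f0(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate F0 from the sign bits of the signal.

    For each candidate lag, count how many bit pairs (i, i + lag) agree.
    On one-bit data this agreement count plays the role of the
    autocorrelation and needs only comparisons, no multiplications.
    Returns None when no candidate lag fits inside the signal.
    """
    bits = one_bit(signal)
    n = len(bits)
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))

    best_lag, best_score = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        matches = sum(1 for i in range(n - lag) if bits[i] == bits[i + lag])
        score = matches / (n - lag)
        # Strict comparison keeps the shortest lag among equal scores,
        # which avoids picking a multiple of the true period.
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag
```

For example, `estimate_f0(make_sine(200.0, 8000, 800, phase=0.3), 8000)` searches lags between 16 and 160 samples. A practical detector would also need voicing decisions and sub-sample lag refinement, which this sketch leaves out.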
Deep brain stimulation of the subthalamic nucleus (STN-DBS) can exert relevant effects on the voice of patients with Parkinson's disease (PD). In this study, we used artificial intelligence to objectively analyze the voices of PD patients with STN-DBS. In a cross-sectional study, we enrolled 108 controls and 101 patients with PD. The PD cohort was divided into two groups: the first group included 50 patients with STN-DBS, and the second 51 patients receiving the best medical treatment. Voices were clinically evaluated using the Unified Parkinson's Disease Rating Scale part-III subitem for voice (UPDRS-III-v). We...
Automatic identification of speakers from text-independent information is a task required in a broad base of applications. Gaussian Mixture Models are a state-of-the-art solution to the task. We apply this method to a speech dataset, and present a novel method using Nonnegative Matrix Factorization with sparseness constraints. Results show that our method scores on par with the state of the art, even without optimization, while also providing architecture-related advantages. We provide a comparison of results for the two methods.
Parkinson's disease (PD) is one of the most widespread neurodegenerative diseases worldwide, characterized by a number of alterations, among which are speech impairments that, interestingly, manifest up to 10 years before other major evidence (e.g. motor impairments). In this regard, we investigated the feasibility of a model based on the temporal evolution of attractors in the reconstructed phase space to identify hallmarks for PD identification and progression. To this end, the adopted dataset was made of vocal emissions of 46 de-novo 54...
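A reconstructed phase space is commonly obtained by time-delay embedding of a scalar signal. The excerpt does not say which reconstruction the authors used, so the following is a minimal sketch that assumes a standard delay embedding; the function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (delay in samples) are chosen for illustration.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).

    Each returned tuple is one point of the reconstructed trajectory.
    Raises ValueError when the series is too short for the requested
    embedding dimension and delay.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[i + k * tau] for k in range(dim))
            for i in range(n_vectors)]
```

Applied to a voice recording, each tuple would be one point on the attractor, and the sequence of tuples traces its evolution over time. In practice `dim` and `tau` are usually chosen from the data rather than fixed in advance.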
This paper deals with the automatic detection of Myotonia from a task based on the sudden opening of the hand. Data have been gathered from 44 subjects, divided into 17 controls and 27 myotonic patients, by measuring a 2-point articulation of each finger thanks to a calibrated sensory glove equipped with Resistive Flex Sensors (RFS). RFS gloves are proven to be reliable in the analysis of motion, which is relevant for monitoring the disease and the subsequent treatment. With a focus on the healthy vs. pathological comparison, customized features were...