- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Phonetics and Phonology Research
- Advanced Data Compression Techniques
- Speech and dialogue systems
- Hearing Loss and Rehabilitation
- Advanced Adaptive Filtering Techniques
- Topic Modeling
- Voice and Speech Disorders
- Multi-Agent Systems and Negotiation
- Algorithms and Data Compression
- Acoustic Wave Phenomena Research
- Anomaly Detection Techniques and Applications
- Neural Networks and Applications
- Image and Signal Denoising Methods
- Subtitles and Audiovisual Media
- Video Analysis and Summarization
- Time Series Analysis and Forecasting
- Digital Filter Design and Implementation
- Evolutionary Algorithms and Applications
- Integrated Circuits and Semiconductor Failure Analysis
- Electronic Packaging and Soldering Technologies
- Food Supply Chain Traceability
Norwegian University of Science and Technology
2011-2024
SINTEF
2022
Acoustics (Norway)
2022
Università degli Studi di Enna Kore
2021
Georgia Institute of Technology
2021
NTNU Samfunnsforskning
2021
AT&T (United States)
2005
Roskilde University
2002
For large vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing an inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described, one based on template matching,...
A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little or no data for resource-limited languages is still a challenging research topic. As a consequence, there has been an increasing interest in exploring knowledge sharing among a number of languages so that universal units can be defined to work for multiple or even all...
In recent research, we have proposed a high-accuracy bottom-up detection-based paradigm for continuous phone speech recognition. The key component of our system was a bank of articulatory detectors, each of which computes a score describing an activation level of the specified phonetic features that the current frame exhibits. In this work, we present a first attempt at designing a universal recognizer using this approach. We show that the technique is intrinsically language independent, since reliable detectors can be designed for diverse...
This paper addresses a novel technique for the representation and processing of n-gram counts in phonotactic language recognition (LRE): subspace multinomial modelling represents the vectors of n-gram counts by low-dimensional coordinates in the total variability subspace, called iVector. Two techniques for iVector scoring are tested: support vector machines (SVM) and logistic regression (LR). Using the standard NIST LRE 2009 task as our evaluation set, the latter approach was shown to outperform the system based on direct SVM...
Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in acquiring it, greatly rely on practice trials and accurate feedback about their pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, we need a highly automatic way of assessing the quality of speech uttered by children. Our work focuses...
This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundational-based models on the standard PC-GITA (s-PC-GITA) clean data. Our results demonstrate superior performance to previously proposed models. Second, we assess the generalization capability of the PD models on the extended PC-GITA (e-PC-GITA) recordings, collected in operative conditions,...
Computer-assisted Language Learning (CALL) is a rapidly developing area accelerated by advancements in the field of AI. A well-designed and reliable CALL system allows students to practice language skills, like pronunciation, at any time outside the classroom. Furthermore, gamification via mobile applications has shown encouraging results on learning outcomes and motivates young users to practice more and perceive learning as a positive experience. In this work, we adapt the latest speech recognition technology to be part of an online...
We present a novel approach to designing bottom-up automatic speech recognition (ASR) systems. The key component of the proposed approach is a bank of articulatory attribute detectors implemented using a set of feed-forward artificial neural networks (ANNs). Each detector computes a score describing an activation level of the specified attributes that the current frame exhibits. These cues are first combined by an event merger that provides some evidence about the presence of a higher-level feature, which is then verified by a verifier to produce hypotheses...
We propose a novel universal acoustic characterization approach to spoken language identification (LID), in which any spoken language is described with a common set of fundamental units defined “universally.” Specifically, manner and place of articulation form this unit inventory and are used to build a set of attribute models with data-driven techniques. Using the vector space modeling approaches to LID, a spoken utterance is first decoded into a sequence of attributes. Then, a feature vector consisting of co-occurrence statistics of attributes is created, and the final decision...
The authors describe a system for speaker-dependent speech recognition based on acoustic subword units. Several strategies for automatic generation of an acoustic lexicon are outlined. Preliminary tests have been performed on a small vocabulary. In these tests, the proposed system showed results comparable to those of whole-word-based systems.
A novel bottom-up decoding framework for large vocabulary continuous speech recognition (LVCSR) with a modular search strategy is presented. Weighted finite state machines (WFSMs) are utilized to accomplish stage-by-stage acoustic-to-linguistic mappings from low-level attributes to high-level linguistic units in a bottom-up manner. Probabilistic attribute and phone lattices are used as intermediate vehicles to facilitate knowledge integration at different levels of the hierarchy. The final decoded sentence...
We investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector regression front-end for speech enhancement (DNN-SE) can play a key role in AAI when used to enhance spectral features prior to AAI back-end processing. We experimented with single- and multi-task training strategies for the DNN-SE block, finding the latter to be beneficial to AAI. Furthermore,...
Data augmentation is a technique which enhances the size and quality of training data such that deep learning or machine learning models can achieve better performance. This paper proposes a novel way of applying data augmentation for child speech recognition in a low-resource scenario. This is achieved by modifying existing adult speech signals. The procedure consists of two main parts: resampling and time scaling. The experiment involves both speech from children aged from kindergarten to grade 10 and adults’ speech. We test the proposed method using a TDNN-HMM...
Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective intelligibility increases. Many recent deep learning (DL) based SE systems are expected to improve intelligibility as measured by OIMs. However, validation of the ability of OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five...
We present an architecture and VLSI implementation of the computations of Gaussian observation probabilities in HMM-based speech recognition. As opposed to previous work by Sagayama and Takahashi (see IEEE International Conf. on Acoustics, Speech and Signal Proc., vol.1, p.213-16, 1995), reducing the number of arithmetic operations is not the major concern when these computations are implemented in a standard CMOS process. Instead, memory bandwidth is the limiting factor. We introduce a variant of fixed-point representation, called dynamical...
A major challenge in speech recognition is creating a lexicon which is robust to inter- and intra-speaker variations. This is even more so for recognisers based on non-linguistic units, e.g., acoustic subword units (ASWUs), since no standard pronunciation dictionaries are available. Thus the baseforms describing the vocabulary words in terms of ASWUs need to be generated from training data. We propose an algorithm for ASWUs which performs a combined optimisation of the baseforms and the models. The resulting system has been tested on the DARPA Resource...
Best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets, or because they use different features or modeling approaches. Most authors apply fusion as a final resource for improving performance based on an existing set of systems. Though relative gains decrease as larger sets of systems are considered, best performance is usually attained by fusing all available systems, which may lead to high...
Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode the input speech using diverse information sources, such as acoustic, lexical, and linguistic. Although most of the unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that are “unreasonable” for human listeners. Several studies have shown that a proper integration of acoustic-phonetic information can be beneficial to reducing such errors. We have previously shown that high-accuracy phone recognition can be achieved if a bank...