Martti Vainio

ORCID: 0000-0003-2570-0196
Research Areas
  • Phonetics and Phonology Research
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Natural Language Processing Techniques
  • Speech and dialogue systems
  • Multisensory perception and integration
  • Hearing Impairment and Communication
  • Neuroscience and Music Perception
  • Hearing Loss and Rehabilitation
  • Action Observation and Synchronization
  • Language, Metaphor, and Cognition
  • Music and Audio Processing
  • Linguistic Variation and Morphology
  • Voice and Speech Disorders
  • Linguistics and language evolution
  • Acoustic Wave Phenomena Research
  • Neurobiology of Language and Bilingualism
  • Motor Control and Adaptation
  • Syntax, Semantics, Linguistic Variation
  • Linguistic research and analysis
  • Autism Spectrum Disorder Research
  • Linguistics, Language Diversity, and Identity
  • Music Technology and Sound Studies
  • Categorization, perception, and language
  • Research in Social Sciences

University of Helsinki
2016-2025

Digital Science (United States)
2019

University of Turku
2007

Stockholm South General Hospital
1994

This paper describes a hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the framework of HMM and generated in the synthesis stage according to the text input. The glottal excitation is synthesized through interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified...

10.1109/tasl.2010.2045239 article EN IEEE Transactions on Audio Speech and Language Processing 2010-03-12

Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy...

10.3389/fpsyg.2013.00566 article EN cc-by Frontiers in Psychology 2013-01-01

Objective: To study prosodic perception in early-implanted children in relation to auditory discrimination, working memory, and exposure to music. Design: Word and sentence stress perception, discrimination of fundamental frequency (F0), intensity and duration, and forward digit span were measured twice over approximately 16 months. Musical activities were assessed by questionnaire. Study sample: Twenty-one early-implanted and age-matched normal-hearing (NH) children (4–13 years). Results: Children with cochlear implants (CIs) exposed to music...

10.3109/14992027.2013.872302 article EN International Journal of Audiology 2014-01-27

All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric...

10.1121/1.4812756 article EN The Journal of the Acoustical Society of America 2013-08-01

A unique feature of the human communication system is our ability to rapidly acquire new words and build large vocabularies. However, its neurobiological foundations remain largely unknown. In an electrophysiological study optimally designed to probe this rapid formation of new word memory circuits, we employed acoustically controlled novel word-forms incorporating native and non-native speech sounds, while manipulating the subjects' attention on the input. We found a robust index of neurolexical memory-trace...

10.1016/j.neuroimage.2015.05.098 article EN cc-by-nc-nd NeuroImage 2015-06-13

The present study was motivated by a theory, which proposes that speech includes articulatory gestures that are connected to particular hand actions. We hypothesized that certain articulatory gestures would be more associated with the precision grip than the power grip, and vice versa. In the study, the participants pronounced a syllable and performed simultaneously a precision or power grip that was theorized to be either congruent or incongruent with the syllable. Relatively fast precision grip responses were associated with articulatory gestures in which the tip of the tongue contacted the alveolar ridge ([te]) or the aperture of the vocal tract remained small ([hi]),...

10.1371/journal.pone.0053061 article EN cc-by PLoS ONE 2013-01-09

In ‘quantity‐languages’, such as Japanese or Finnish, sound duration is linguistically relevant. We showed that quantity‐language speakers were superior to speakers of a non‐quantity language in discriminating the duration of even non‐speech sounds. In contrast, there was no group difference in the discrimination of sound frequency. This result, obtained both by behavioural and neural indices at attentive and automatic levels of processing, indicates precise feature‐specific tuning of auditory‐cortex functions by the mother tongue.

10.1111/j.1460-9568.2006.04752.x article EN European Journal of Neuroscience 2006-05-01

This paper describes a source modeling method for hidden Markov model (HMM) based speech synthesis for improved naturalness. A speech corpus is first decomposed into the glottal source signal and the model of the vocal tract filter using glottal inverse filtering, and parametrized into excitation and spectral features. Additionally, a library of glottal source pulses is extracted from the estimated voice source signal. In the synthesis stage, the excitation signal is generated by selecting appropriate pulses from the library according to the target cost of the excitation features and the concatenation cost between adjacent pulses. Finally, speech is synthesized by filtering the excitation signal by the vocal tract filter....

10.1109/icassp.2011.5947370 article EN 2011-05-01

Over the last century, researchers have collected a considerable amount of data reflecting the properties of Lombard speech, i.e., speech in a noisy environment. The documented phenomena predominately report effects on the speech signal produced in ambient noise. In comparison, relatively little is known about the underlying articulatory patterns, in particular for lingual articulation. Here the authors present an analysis of articulatory recordings of speech material in babble noise of different intensity levels and in hypoarticulated speech, and report quantitative differences...

10.1121/1.4939495 article EN The Journal of the Acoustical Society of America 2016-01-01

Humans modify their voice in interfering noise in order to maintain the intelligibility of their speech – this is called the Lombard effect. This ability, however, has not been extensively modeled in speech synthesis. Here we compare several methods of synthesizing Lombard speech using a physiologically based statistical speech synthesis system (GlottHMM). The results show that in a realistic street noise situation the synthetic Lombard speech is judged by listeners both as appropriate for the situation and as intelligible as natural Lombard speech. Of the different types of models, one using adaptation...

10.21437/interspeech.2011-696 article EN Interspeech 2011 2011-08-27

Quality and intelligibility of narrowband telephone speech can be improved by artificial bandwidth extension (ABE), which extends the speech bandwidth using only information available in the narrowband speech signal. This paper reports a three-language evaluation of an ABE method that has recently been launched in several of Nokia's mobile telephone models. The method extends the speech bandwidth to frequencies above the telephone band by first utilizing spectral folding and then modifying the magnitude spectrum with spline curves. The performance of the method was evaluated by formal listening tests in American English, Russian,...

10.1109/tasl.2008.925149 article EN IEEE Transactions on Audio Speech and Language Processing 2008-07-24