Tuomas Virtanen

ORCID: 0000-0002-4604-9729
Research Areas
  • Speech and Audio Processing
  • Music and Audio Processing
  • Speech Recognition and Synthesis
  • Blind Source Separation Techniques
  • Music Technology and Sound Studies
  • Allergic Rhinitis and Sensitization
  • Advanced Adaptive Filtering Techniques
  • Diverse Musicological Studies
  • Food Allergy and Anaphylaxis Research
  • Asthma and respiratory diseases
  • Underwater Acoustics Research
  • Contact Dermatitis and Allergies
  • Hearing Loss and Rehabilitation
  • Acoustic Wave Phenomena Research
  • Video Analysis and Summarization
  • Animal Vocal Communication and Behavior
  • Noise Effects and Management
  • Occupational exposure and asthma
  • Natural Language Processing Techniques
  • Geophysical Methods and Applications
  • Arctic and Antarctic ice dynamics
  • Advanced Data Compression Techniques
  • Structural Health Monitoring Techniques
  • Monoclonal and Polyclonal Antibodies Research
  • Anomaly Detection Techniques and Applications

Tampere University
2016-2025

Nokia (Finland)
2025

University of Surrey
2023

University of Eastern Finland
2010-2022

Signal Processing (United States)
2015-2021

Institute of Electrical and Electronics Engineers
2021

Tampere University of Applied Sciences
2007-2018

Tampere University
2008-2018

Shenyang Institute of Automation
2016

Chinese Academy of Sciences
2016

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse...

10.1109/tasl.2006.885253 article EN IEEE Transactions on Audio Speech and Language Processing 2007-03-01
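The factorization this abstract describes can be sketched in a few lines of NumPy: multiplicative updates keep the spectra and gains nonnegative, and an L1 penalty on the gains stands in for the paper's sparseness criterion. The temporal-continuity term is omitted here, and `alpha` is an illustrative weight, not a value from the paper.

```python
import numpy as np

def nmf_separate(V, n_components=20, n_iter=200, alpha=0.1):
    """Factorize a magnitude spectrogram V (freq x time) as V ~ B @ G,
    with nonnegative fixed spectra B and time-varying gains G."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    B = rng.random((F, n_components)) + 1e-9   # component spectra
    G = rng.random((n_components, T)) + 1e-9   # time-varying gains
    for _ in range(n_iter):
        # multiplicative updates for the Euclidean reconstruction error;
        # alpha penalizes the L1 norm of the gains (sparseness)
        G *= (B.T @ V) / (B.T @ (B @ G) + alpha + 1e-9)
        B *= (V @ G.T) / (B @ (G @ G.T) + 1e-9)
    return B, G
```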

Given the recent surge in developments of deep learning, this paper provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential areas for cross fertilization. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including...

10.1109/jstsp.2019.2908700 article EN IEEE Journal of Selected Topics in Signal Processing 2019-04-01
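As a concrete example of the dominant input representation the review discusses, a log-mel spectrogram can be computed with librosa; the sample rate, FFT size, hop length and mel-band count below are typical choices, not values prescribed by the paper.

```python
import numpy as np
import librosa

def logmel(path, sr=16000, n_fft=1024, hop=512, n_mels=64):
    """Compute a log-mel spectrogram from an audio file."""
    y, sr = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop, n_mels=n_mels)
    return np.log(S + 1e-10)  # shape: (n_mels, frames)
```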

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher level features that are invariant to local spectral variations. Recurrent neural networks (RNNs) are powerful in learning the longer term context of audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural...

10.1109/taslp.2017.2690575 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2017-05-23
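A minimal PyTorch sketch of such a convolutional recurrent architecture: the convolutional layers pool only along frequency so the frame rate is preserved, a bidirectional GRU models longer-term context, and frame-wise sigmoids give multi-label activities. Layer sizes here are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=40, n_classes=6, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency, keep time
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.GRU(32 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                    # x: (batch, 1, n_mels, frames)
        z = self.conv(x)                     # (batch, 32, n_mels//4, frames)
        z = z.permute(0, 3, 1, 2).flatten(2) # (batch, frames, features)
        z, _ = self.rnn(z)
        return torch.sigmoid(self.out(z))    # per-frame class activities
```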

This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. This requires a suitable procedure for comparison against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined...

10.3390/app6060162 article EN cc-by Applied Sciences 2016-05-25
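A sketch of segment-based evaluation in the spirit of these metrics: the error rate decomposes per-segment mismatches into substitutions, deletions and insertions, alongside a micro-averaged F-score. This is a simplified reading of the definitions, not a replacement for the reference sed_eval implementation.

```python
import numpy as np

def segment_metrics(ref, est):
    """Segment-based error rate and F-score for polyphonic SED.
    ref, est: binary (n_segments x n_classes) activity matrices."""
    ref = ref.astype(bool); est = est.astype(bool)
    tp = (ref & est).sum()
    fp = (~ref & est).sum()
    fn = (ref & ~est).sum()
    # per-segment decomposition into substitutions/deletions/insertions
    fn_seg = (ref & ~est).sum(axis=1)
    fp_seg = (~ref & est).sum(axis=1)
    S = np.minimum(fn_seg, fp_seg).sum()
    D = (fn_seg - np.minimum(fn_seg, fp_seg)).sum()
    I = (fp_seg - np.minimum(fn_seg, fp_seg)).sum()
    er = (S + D + I) / max(ref.sum(), 1)     # error rate
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)   # micro-averaged F-score
    return er, f1
```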

We introduce the TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations of individual sound events, specifically created for sound event detection. It consists of residential area and home environments, and is manually annotated to mark the onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, and a recommended cross-validation setup...

10.1109/eusipco.2016.7760424 article EN 2016 24th European Signal Processing Conference (EUSIPCO) 2016-08-01
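Annotations of the kind described (onset, offset and label per event) are convenient to rasterize into a binary activity matrix for training or evaluation. A small sketch, using a hypothetical in-memory annotation format rather than the dataset's actual file layout:

```python
import numpy as np

def annotations_to_roll(events, labels, duration, hop=1.0):
    """Convert (onset, offset, label) annotations into a binary
    segment-by-class activity matrix with segment length `hop` seconds."""
    n_seg = int(np.ceil(duration / hop))
    roll = np.zeros((n_seg, len(labels)), dtype=int)
    for onset, offset, label in events:
        a = int(onset // hop)
        b = int(np.ceil(offset / hop))
        roll[a:b, labels.index(label)] = 1
    return roll

# hypothetical usage:
# roll = annotations_to_roll([(0.5, 2.3, "speech")], ["speech", "car"], 30.0)
```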

Type I allergy is an immunoglobulin E (IgE)-mediated hypersensitivity disease affecting more than 25% of the population. Currently, diagnosis is performed by provocation testing and IgE serology using allergen extracts. This process defines allergen-containing sources but cannot identify the disease-eliciting allergenic molecules. We have applied microarray technology to develop a miniaturized allergy test containing 94 purified allergen molecules that represent the most common allergen sources. The microarray allows the determination...

10.1096/fj.01-0711fje article EN The FASEB Journal 2002-01-14

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space. The proposed network takes a sequence of consecutive spectrogram time-frames as input and maps it to two outputs in parallel. As the first output, sound event detection (SED) is performed as a multi-label classification task on each time-frame, producing the temporal activity of all sound event classes. The second output is obtained by estimating the 3D Cartesian coordinates of the direction-of-arrival...

10.1109/jstsp.2018.2885636 article EN IEEE Journal of Selected Topics in Signal Processing 2018-12-07
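The two parallel outputs can be sketched as two linear heads on a shared per-frame feature sequence: sigmoid activities for SED and tanh-bounded Cartesian DOA coordinates per class. The dimensions are illustrative, and the real network also includes the convolutional recurrent trunk that produces the shared features.

```python
import torch
import torch.nn as nn

class SELDHeads(nn.Module):
    """Parallel SED and DOA outputs on shared frame features."""
    def __init__(self, feat=128, n_classes=11):
        super().__init__()
        self.sed = nn.Linear(feat, n_classes)        # event activity
        self.doa = nn.Linear(feat, 3 * n_classes)    # x, y, z per class

    def forward(self, z):                # z: (batch, frames, feat)
        act = torch.sigmoid(self.sed(z))
        xyz = torch.tanh(self.doa(z)).view(*z.shape[:2], -1, 3)
        return act, xyz                  # (B, T, C) and (B, T, C, 3)
```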

This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of speech and noise exemplars, and derive an algorithm for recovering this sparse representation from the observed noisy speech. The framework can be used for doing hybrid exemplar-based/HMM recognition...

10.1109/tasl.2011.2112350 article EN IEEE Transactions on Audio Speech and Language Processing 2011-02-09
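The recovery step can be illustrated with nonnegative sparse coding over a dictionary that stacks speech and noise exemplars column-wise. The multiplicative update below minimizes an L1-penalized Euclidean cost; it is a stand-in for the paper's actual solver, which operates on mel-spectrogram patches.

```python
import numpy as np

def sparse_activations(y, A, n_iter=300, lam=0.1):
    """Recover nonnegative, sparse activations x with y ~ A @ x.
    A stacks speech exemplars and noise exemplars as columns."""
    x = np.full(A.shape[1], 1e-3)
    for _ in range(n_iter):
        # multiplicative update; lam penalizes the L1 norm of x
        x *= (A.T @ y) / (A.T @ (A @ x) + lam + 1e-12)
    return x

# the speech estimate then uses only the speech part of the dictionary:
# s_hat = A[:, :n_speech] @ x[:n_speech]
```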

In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal consisting of sounds from multiple classes to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed...

10.1109/icassp.2016.7472917 preprint EN 2016-03-01
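A minimal sketch of the single multilabel BLSTM the abstract describes, assuming 40-dimensional acoustic features; binary cross-entropy over per-frame, per-class logits matches the binary activity-indicator targets. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultilabelBLSTM(nn.Module):
    """Map acoustic features of a mixture to per-class activity logits."""
    def __init__(self, n_feat=40, hidden=128, n_classes=61):
        super().__init__()
        self.lstm = nn.LSTM(n_feat, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):               # x: (batch, frames, n_feat)
        h, _ = self.lstm(x)
        return self.out(h)              # per-frame, per-class logits

model = MultilabelBLSTM()
loss = nn.BCEWithLogitsLoss()(model(torch.randn(2, 100, 40)),
                              torch.randint(0, 2, (2, 100, 61)).float())
loss.backward()
```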

Public evaluation campaigns and datasets promote active development in target research areas, allowing direct comparison of algorithms. The second edition of the challenge on detection and classification of acoustic scenes and events (DCASE 2016) has offered such an opportunity for development of state-of-the-art methods, and succeeded in drawing together a large number of participants from academic and industrial backgrounds. In this paper, we report on the tasks and outcomes of the DCASE 2016 challenge. The challenge comprised four tasks: acoustic scene classification,...

10.1109/taslp.2017.2778423 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2017-11-30

In this paper, the use of multi label neural networks is proposed for detection of temporally overlapping sound events in realistic environments. Real-life recordings typically have many overlapping sound events, making it hard to recognize each event with standard methods. Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi label classification in this work. The model is evaluated with recordings from realistic everyday environments, and the obtained overall accuracy is 63.8%. The method is compared against a state-of-the-art method using non-negative...

10.1109/ijcnn.2015.7280624 article EN 2015 International Joint Conference on Neural Networks (IJCNN) 2015-07-01

This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources. The proposed stacked convolutional and recurrent neural network (DOAnet) generates a spatial pseudo-spectrum (SPS) along with the DOA estimates in both azimuth and elevation. We avoid any explicit feature extraction step by using the magnitudes and phases of the spectrograms of all the channels as input to the network. The DOAnet is evaluated by estimating the DOAs of multiple concurrently present sources in anechoic, matched and unmatched reverberant conditions. The results...

10.23919/eusipco.2018.8553182 article EN 2018 26th European Signal Processing Conference (EUSIPCO) 2018-09-01

The work presented in this article studies how context information can be used in the automatic sound event detection process, and how the detection system can benefit from such information. Humans use context information to make more accurate predictions about sound events and to rule out unlikely events given the context. We propose a similar utilization of context information in the detection process. The proposed approach is composed of two stages: a context recognition stage and a sound event detection stage. Contexts are modeled using Gaussian mixture models and sound events using three-state left-to-right hidden Markov models. In the first stage, the context of the tested audio...

10.1186/1687-4722-2013-1 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2013-01-09
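The first stage, context recognition, can be sketched with one Gaussian mixture model per context and classification by maximum average log-likelihood; scikit-learn is used here for brevity, and the component count is illustrative rather than the paper's setting.

```python
from sklearn.mixture import GaussianMixture

def train_context_models(features_by_context, n_components=8):
    """Fit one GMM per context from (context -> frame-feature array)."""
    return {c: GaussianMixture(n_components=n_components).fit(f)
            for c, f in features_by_context.items()}

def recognize_context(models, features):
    """Pick the context whose GMM gives the highest mean log-likelihood."""
    return max(models, key=lambda c: models[c].score(features))
```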

This paper introduces the acoustic scene classification task of the DCASE 2018 Challenge and the TUT Urban Acoustic Scenes dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined as classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set setup. The newly recorded dataset consists of ten different acoustic scenes and was recorded in six large European cities, therefore it has higher acoustic variability than the datasets used for this task in previous years. In addition to the high-quality binaural...

10.48550/arxiv.1807.09840 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Audio captioning is the novel task of general audio content description using free text. It is an intermodal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs a textual description (i.e. a caption) of that signal. In this paper we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds duration and 24 905 captions of eight to 20 words length, and a baseline method to provide initial results. Clotho is built with focus on caption diversity, and the splits of the data are such that they do not hamper the training or...

10.1109/icassp40776.2020.9052990 article EN ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020-04-09

To investigate the production of antioxidant activity during fermentation with commonly used dairy starter cultures, and, moreover, to study the development of antioxidant activity during fermentation and its connection to proteolysis and bacterial growth. Antioxidant activity was measured by analysing radical scavenging activity using a spectrophotometric decolorization assay, and lipid peroxidation inhibition was assayed in a liposomal model system using a fluorescence method. Milk was fermented with 25 lactic acid bacteria (LAB) strains, and from these, the six exhibiting the highest activity were selected for further...

10.1111/j.1365-2672.2006.03072.x article EN Journal of Applied Microbiology 2006-08-01

Voice conversion can be formulated as finding a mapping function which transforms the features of the source speaker to those of the target speaker. Gaussian mixture model (GMM)-based conversion is commonly used, but it is subject to overfitting. In this paper, we propose to use partial least squares (PLS)-based transformation in voice conversion. To prevent overfitting, the degrees of freedom in the mapping are controlled by choosing a suitable number of components. We propose a technique to combine PLS with GMMs, enabling multiple local linear mappings. To further improve...

10.1109/tasl.2010.2041699 article EN IEEE Transactions on Audio Speech and Language Processing 2010-04-09
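The PLS mapping itself is straightforward to sketch with scikit-learn on time-aligned source/target feature frames. The data below are synthetic and the component count is illustrative, since choosing it is precisely how the method controls overfitting.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 24))    # dummy aligned source-speaker frames
Y = X @ rng.standard_normal((24, 24)) \
    + 0.1 * rng.standard_normal((500, 24))  # dummy target-speaker frames

# fewer components -> fewer degrees of freedom -> less overfitting
pls = PLSRegression(n_components=8).fit(X, Y)
Y_hat = pls.predict(X)                # converted features
```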

We propose a nonparametric framework for voice conversion, that is, exemplar-based sparse representation with residual compensation. In this framework, a spectrogram is reconstructed as a weighted linear combination of speech segments, called exemplars, which span multiple consecutive frames. The weights are constrained to be sparse to avoid over-smoothing, and high-resolution spectra are employed in the exemplars directly, without dimensionality reduction, to maintain spectral details. In addition, a compression...

10.1109/taslp.2014.2333242 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2014-06-25

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events better when they are presented as separate layers of a volume. Using the proposed features over monaural features on the same network gives an absolute F-score improvement of 6.1% on a publicly available...

10.1109/icassp.2017.7952260 article EN 2017-03-01

A drawback of many voice conversion algorithms is that they rely on linear models and/or require a lot of tuning. In addition, many of them ignore the inherent time-dependency between speech features. To address these issues, we propose to use the dynamic kernel partial least squares (DKPLS) technique to model nonlinearities as well as to capture the dynamics in the data. The method is based on a kernel transformation of the source features to allow non-linear modeling, and concatenation of previous and next frames to model the dynamics. Partial least squares regression is used to find...

10.1109/tasl.2011.2165944 article EN IEEE Transactions on Audio Speech and Language Processing 2011-08-25
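The feature construction behind DKPLS can be sketched as a kernel transformation of the source frames followed by concatenation of the neighbouring frames. The RBF kernel, its width, and the reference centers below are illustrative choices, not the paper's exact recipe.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics.pairwise import rbf_kernel

def dkpls_features(X, centers, gamma=0.1):
    """Kernelize source frames X against reference centers, then stack
    previous/next-frame context to capture the dynamics."""
    K = rbf_kernel(X, centers, gamma=gamma)   # non-linear transformation
    prev = np.vstack([K[:1], K[:-1]])         # frame t-1 (edge repeated)
    nxt = np.vstack([K[1:], K[-1:]])          # frame t+1 (edge repeated)
    return np.hstack([prev, K, nxt])

# regression then proceeds with ordinary PLS on the kernel features:
# pls = PLSRegression(n_components=50).fit(dkpls_features(X, C), Y)
```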