Jan Černocký

ORCID: 0000-0002-8800-0210
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Topic Modeling
  • Speech and Dialogue Systems
  • Advanced Data Compression Techniques
  • Neural Networks and Applications
  • Time Series Analysis and Forecasting
  • Video Analysis and Summarization
  • Advanced Adaptive Filtering Techniques
  • Advanced Text Analysis Techniques
  • Digital Media Forensic Detection
  • Phonetics and Phonology Research
  • Algorithms and Data Compression
  • Anomaly Detection Techniques and Applications
  • Image and Signal Denoising Methods
  • Dispute Resolution and Class Actions
  • AI in Service Interactions
  • Wireless Communication Networks Research
  • Handwritten Text Recognition Techniques
  • Bayesian Methods and Mixture Models
  • Wireless Signal Modulation Classification
  • Emotion and Mood Recognition
  • Web Data Mining and Analysis

Brno University of Technology
2016-2025

Edip (Czechia)
2022

UniLaSalle Amiens (ESIEE-Amiens)
2002

Université Gustave Eiffel
2002

A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduction of perplexity by using a mixture of several RNN LMs, compared to a state of the art backoff language model. Speech recognition experiments show around 18% reduction of word error rate on the Wall Street Journal task when comparing models trained on the same amount of data, and around 5% on the much harder NIST RT05 task, even when the backoff model is trained on much more data than the RNN LM. We provide ample empirical evidence to suggest...

10.21437/interspeech.2010-343 article EN Interspeech 2010 2010-09-26
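The core of the model described above can be illustrated with a minimal NumPy sketch of an Elman-style recurrent language model: a one-hot word input, a recurrent hidden state carrying the sentence history, and a softmax over the vocabulary. The class name, layer sizes, and random initialization are invented for illustration; this is not the paper's implementation, which also covers training with backpropagation through time and interpolation with backoff models.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyRNNLM:
    """Minimal Elman-style RNN language model (illustrative sketch only):
    one-hot input word, recurrent hidden state, softmax over vocabulary."""
    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(0, 0.1, (hidden_size, vocab_size))   # input -> hidden
        self.W = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden (recurrence)
        self.V = rng.normal(0, 0.1, (vocab_size, hidden_size))   # hidden -> output
        self.h = np.zeros(hidden_size)

    def step(self, word_id):
        # consume one word, update the hidden state, predict the next word
        x = np.zeros(self.U.shape[1])
        x[word_id] = 1.0
        self.h = np.tanh(self.U @ x + self.W @ self.h)
        return softmax(self.V @ self.h)  # P(next word | history)

lm = TinyRNNLM(vocab_size=10, hidden_size=8)
p = lm.step(3)
print(round(float(p.sum()), 6))  # 1.0 - a proper distribution over the vocabulary
```

Unlike an n-gram model, the hidden state is not limited to a fixed context window, which is what enables the perplexity gains reported in the abstract.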

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than 15 times speedup for both training and testing phases. Next, we show the importance of using a backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end,...

10.1109/icassp.2011.5947611 article EN 2011-05-01

We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a 4-gram model trained on 400M tokens.

10.1109/asru.2011.6163930 article EN 2011-12-01
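The hash-based maximum entropy idea can be made concrete with a toy sketch: n-gram features are hashed into a fixed-size weight table, so memory stays constant no matter how many distinct n-grams occur. The hash function, table size, and SGD training step below are invented for illustration and are not the paper's system.

```python
import numpy as np

def ngram_hash(ngram, table_size):
    # toy multiplicative hash of an n-gram feature (illustrative choice)
    h = 0
    for w in ngram:
        h = (h * 1000003 + w) % table_size
    return h

class HashedMaxEnt:
    """Toy hash-based maximum entropy LM component: features share a
    fixed-size weight table via hashing (sketch, not the paper's system)."""
    def __init__(self, vocab_size, table_size):
        self.vocab_size = vocab_size
        self.table_size = table_size
        self.weights = np.zeros(table_size)

    def probs(self, context):
        # score each candidate next word by summing its hashed feature weights
        z = np.zeros(self.vocab_size)
        for w in range(self.vocab_size):
            for order in range(1, len(context) + 1):
                feat = tuple(context[-order:]) + (w,)
                z[w] += self.weights[ngram_hash(feat, self.table_size)]
        p = np.exp(z - z.max())
        return p / p.sum()

    def update(self, context, target, lr=0.5):
        # one SGD step on the log-likelihood of the observed next word
        p = self.probs(context)
        for w in range(self.vocab_size):
            g = (1.0 if w == target else 0.0) - p[w]
            for order in range(1, len(context) + 1):
                feat = tuple(context[-order:]) + (w,)
                self.weights[ngram_hash(feat, self.table_size)] += lr * g

me = HashedMaxEnt(vocab_size=5, table_size=1000)
ctx, tgt = [1, 2], 3
before = me.probs(ctx)[tgt]
for _ in range(5):
    me.update(ctx, tgt)
after = me.probs(ctx)[tgt]
print(after > before)  # observed word becomes more probable
```

Hash collisions trade a little accuracy for bounded memory, which is what makes the model trainable jointly with the neural network on large data.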

In recent years, probabilistic features became an integral part of state-of-the-art LVCSR systems. In this work, we are exploring the possibility of obtaining the features directly from the neural net, without the necessity of converting output probabilities to features suitable for the subsequent GMM-HMM system. We experimented with a 5-layer MLP with a bottle-neck in the middle layer. After training such a net, we used the bottle-neck outputs as features for the recognition. The benefits are twofold: first, an improvement was gained when using these features instead of probabilistic features, and second, the size of the system...

10.1109/icassp.2007.367023 article EN 2007-04-01
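The bottle-neck idea reduces to a few lines: a 5-layer MLP whose narrow middle layer is read out as the feature vector, instead of the output posteriors. The weights below are random and untrained, and the layer sizes are invented; the sketch only shows the data flow, not a trained system.

```python
import numpy as np

def mlp_bottleneck(frames, layer_sizes=(39, 100, 13, 100, 40), seed=0):
    """Forward a batch of frames through a 5-layer MLP and return the
    activations of the narrow middle (bottle-neck) layer as features."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0, 0.1, (layer_sizes[i], layer_sizes[i + 1]))
          for i in range(len(layer_sizes) - 1)]
    h, bn = frames, None
    for i, W in enumerate(Ws):
        h = np.tanh(h @ W)
        if i == 1:      # output of the 13-dim bottle-neck layer
            bn = h
    return bn           # (n_frames, 13) features for the GMM-HMM back-end

feats = mlp_bottleneck(np.random.default_rng(1).normal(size=(5, 39)))
print(feats.shape)  # (5, 13)
```

Because the bottle-neck forces the net to compress everything useful for its training target into a low-dimensional vector, the resulting features are compact enough to model with Gaussians.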

We present results obtained with several advanced language modeling techniques, including a class based model, cache model, maximum entropy model, structured language model, random forest language model and several types of neural network based language models. We show results obtained after combining all these models by using linear interpolation. We conclude that for both small and moderately sized tasks, we obtain a new state of the art with the combination of models, which is significantly better than the performance of any individual model. Obtained perplexity reductions against a Good-Turing trigram...

10.21437/interspeech.2011-242 article EN Interspeech 2011 2011-08-27
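Linear interpolation, the combination method used above, is simple enough to sketch end to end. The toy per-token probabilities are invented; the property illustrated at the end (the interpolated model's perplexity never exceeds that of the worst component, given per-token averaging) follows from Jensen's inequality.

```python
import numpy as np

def perplexity(probs):
    # perplexity of a token sequence from its per-token probabilities
    return float(np.exp(-np.mean(np.log(probs))))

def interpolate(model_probs, weights):
    # linear interpolation: weighted average of per-token probabilities
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return np.sum(np.asarray(model_probs) * weights[:, None], axis=0)

# two toy "models" scoring the same 4-token sequence
p_a = np.array([0.2, 0.1, 0.3, 0.2])
p_b = np.array([0.1, 0.4, 0.1, 0.3])
p_mix = interpolate([p_a, p_b], [0.5, 0.5])
print(perplexity(p_mix) <= max(perplexity(p_a), perplexity(p_b)))  # True
```

In practice the interpolation weights are tuned on held-out data rather than fixed at 0.5.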

The processing of speech corrupted by interfering overlapping speakers is one of the challenging problems with regards to today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may however be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for...

10.1109/jstsp.2019.2922820 article EN IEEE Journal of Selected Topics in Signal Processing 2019-06-13
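Mask-based extraction, the mechanism underlying SpeakerBeam and most neural separation systems, can be illustrated with an oracle ratio mask. In SpeakerBeam the mask is predicted by a network conditioned on a target-speaker embedding rather than computed from the (unknown) sources; everything below, including the additive magnitude mixture, is a simplifying assumption for illustration.

```python
import numpy as np

def ideal_ratio_mask(target_mag, interference_mag):
    # oracle time-frequency ratio mask; a real system must predict this
    # from the mixture plus target-speaker cues
    return target_mag / (target_mag + interference_mag + 1e-8)

rng = np.random.default_rng(0)
tgt = np.abs(rng.normal(size=(100, 257)))   # |STFT| of the target speaker
intf = np.abs(rng.normal(size=(100, 257)))  # |STFT| of an interfering speaker
mix = tgt + intf                            # simplistic additive magnitude mixture
est = mix * ideal_ratio_mask(tgt, intf)     # apply mask to the mixture
print(np.allclose(est, tgt, atol=1e-5))     # oracle mask recovers the target
```

The hard part the paper addresses is estimating such a mask for one chosen speaker when only the mixture and a short enrollment of the target are available.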

Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail party effect. For decades, researchers have focused on approaching the listening ability of humans. One critical issue is handling interfering speakers, because the nontarget speech signals share similar characteristics, complicating their...

10.1109/msp.2023.3240008 article EN IEEE Signal Processing Magazine 2023-05-01

This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features,...

10.1109/tasl.2007.902870 article EN IEEE Transactions on Audio Speech and Language Processing 2007-08-22

This paper deals with phoneme recognition based on neural networks (NN). First, several approaches to improve the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on TempoRAl Patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate into tandem NN architectures. The results of the final system, reported on the standard TIMIT database, compare favorably to the best published results.

10.1109/icassp.2006.1660023 article EN 2006-08-02

In this paper, we investigate alternative ways of processing MFCC-based features to use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that involves splicing the 13-dimensional front-end MFCCs across 9 frames, followed by applying LDA to reduce the dimension to 40 and then further decorrelation using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR is helpful. The fact that the number of parameters of a DNN is not strongly sensitive...

10.21437/interspeech.2013-48 article EN Interspeech 2013 2013-08-25
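The splicing step of the baseline pipeline above is easy to make concrete: each 13-dimensional MFCC frame is stacked with its ±4 neighbours into a 117-dimensional vector before the LDA/MLLT projection. Padding the utterance edges by repeating the first and last frames is an assumption of this sketch.

```python
import numpy as np

def splice(feats, context=4):
    """Stack each frame with +/-context neighbours (9 frames total for
    context=4), padding edges by repeating the first/last frame."""
    n, d = feats.shape
    padded = np.vstack([feats[:1]] * context + [feats] + [feats[-1:]] * context)
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

mfcc = np.random.default_rng(0).normal(size=(20, 13))  # 20 frames of 13-dim MFCC
spliced = splice(mfcc)
print(spliced.shape)  # (20, 117) - ready for the LDA projection to 40 dims
```

The 117-dimensional spliced vectors give the DNN (or the LDA transform) access to local temporal context that single frames lack.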

In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance and advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and the heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate into dimensionality reduction of i-vectors before entering the PLDA-HT modeling. The results are very competitive: on...

10.1109/icassp.2011.5947436 article EN 2011-05-01
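The simple cosine-distance scoring mentioned above amounts to length-normalizing two i-vectors and taking their dot product; the toy vectors below are invented for illustration.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between an enrollment and a test i-vector -
    the simple scoring baseline that PLDA variants improve upon."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

enroll = np.array([1.0, 2.0, 3.0])      # toy enrollment i-vector
test_same = np.array([2.0, 4.0, 6.0])   # same direction -> same speaker
test_diff = np.array([3.0, -1.0, 0.5])  # different direction -> different speaker
print(round(cosine_score(enroll, test_same), 6))  # 1.0 for collinear i-vectors
```

A verification decision is then just a threshold on this score; PLDA replaces the fixed geometry with a learned probabilistic model of speaker and channel variability.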

In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's...

10.1109/tasl.2007.902499 article EN IEEE Transactions on Audio Speech and Language Processing 2007-08-22

This paper presents BUT ReverbDB - a dataset of real room impulse responses (RIR), background noises and re-transmitted speech data. The re-transmitted data includes the LibriSpeech test-clean, 2000 HUB5 English evaluation and part of the 2010 NIST Speaker Recognition Evaluation datasets. We provide a detailed description of the RIR collection (hardware, software, post-processing) that can serve as a "cook-book" for similar efforts. We also validate the dataset in two sets of automatic speech recognition (ASR) experiments and draw conclusions...

10.1109/jstsp.2019.2917582 article EN IEEE Journal of Selected Topics in Signal Processing 2019-05-17

Recently, several nonparametric Bayesian models have been proposed to automatically discover acoustic units in unlabeled data. Most of them are trained using various versions of the Gibbs Sampling (GS) method. In this work, we consider Variational Bayes (VB) as an alternative inference process. Even though VB yields only an approximate solution of the posterior distribution, it can be easily parallelized, which makes it more suitable for large databases. Results show that, notwithstanding VB inference is an order of magnitude faster,...

10.1016/j.procs.2016.04.033 article EN Procedia Computer Science 2016-01-01

We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in Speaker Recognition, is used to extract information about the speaker or acoustic environment from a speech segment. The iVector is a low-dimensional fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information and compensate...

10.1109/asru.2011.6163922 article EN 2011-12-01

This work studies the usage of Deep Neural Network (DNN) Bottleneck (BN) features together with traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the sufficient statistics extraction by using separate GMM models for frame alignment and for statistics normalization, and we analyze the usage of BN and MFCC features (and their concatenation) in the two stages. We also show the effect of full-covariance models and, as a contrast, compare the result to the recent DNN-alignment approach. On the NIST SRE2010, telephone condition, we reach a 60% relative gain over...

10.1109/icassp.2016.7472649 article EN 2016-03-01

In this paper, we propose a DNN adaptation technique, where the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to the i-vector extractor, the SSNN produces a "summary vector", representing an acoustic summary of the utterance. Such a vector is then appended to the input of the main network, while both networks are trained together optimizing a single loss function. Both the i-vector and speaker adaptation methods are compared on the AMI meeting data. The results show comparable performance of both techniques on an FBANK system with...

10.1109/icassp.2016.7472692 article EN 2016-03-01
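The summary-vector mechanism can be sketched as mean pooling over the per-frame outputs of a small network, with the pooled vector appended to every frame of the main network's input. The sizes and random weights below are illustrative only; in the paper both networks are trained jointly with a single loss.

```python
import numpy as np

def summary_vector(frames, W, b):
    """Sequence-summarizing sketch: a per-frame linear layer with tanh,
    then mean pooling over the utterance, yields one 'summary vector'."""
    return np.tanh(frames @ W + b).mean(axis=0)

def append_summary(frames, s):
    # append the utterance-level summary to every frame of the main input
    return np.hstack([frames, np.tile(s, (frames.shape[0], 1))])

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 40))            # 50 frames of 40-dim FBANK
W, b = rng.normal(size=(40, 10)), np.zeros(10)
s = summary_vector(frames, W, b)              # 10-dim acoustic summary
aug = append_summary(frames, s)               # main-network input
print(aug.shape)  # (50, 50)
```

Because the summary network is differentiable, its parameters can be optimized end to end with the main acoustic model, unlike a separately trained i-vector extractor.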

Sequence-to-sequence automatic speech recognition (ASR) models require large quantities of data to attain high performance. For this reason, there has been a recent surge in interest for unsupervised and semi-supervised training of such models. This work builds upon results showing notable improvements using cycle-consistency related techniques. Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models. In particular, this work proposes a new...

10.21437/interspeech.2019-3167 article EN Interspeech 2019 2019-09-13

10.1109/icassp49660.2025.10887683 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12