- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Topic Modeling
- Speech and Dialogue Systems
- Advanced Data Compression Techniques
- Neural Networks and Applications
- Time Series Analysis and Forecasting
- Video Analysis and Summarization
- Advanced Adaptive Filtering Techniques
- Advanced Text Analysis Techniques
- Digital Media Forensic Detection
- Phonetics and Phonology Research
- Algorithms and Data Compression
- Anomaly Detection Techniques and Applications
- Image and Signal Denoising Methods
- Dispute Resolution and Class Actions
- AI in Service Interactions
- Wireless Communication Networks Research
- Handwritten Text Recognition Techniques
- Bayesian Methods and Mixture Models
- Wireless Signal Modulation Classification
- Emotion and Mood Recognition
- Web Data Mining and Analysis
Brno University of Technology
2016-2025
Edip (Czechia)
2022
UniLaSalle Amiens (ESIEE-Amiens)
2002
Université Gustave Eiffel
2002
A new recurrent neural network based language model (RNN LM) with applications to speech recognition is presented. Results indicate that it is possible to obtain around 50% reduction of perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model. Speech recognition experiments show around 18% reduction of word error rate on the Wall Street Journal task when comparing models trained on the same amount of data, and around 5% on the much harder NIST RT05 task, even when the backoff model is trained on much more data than the RNN LM. We provide ample empirical evidence to suggest...
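The model described above can be illustrated with a minimal Elman-style RNN LM forward step. This is a toy sketch, not the authors' implementation: all sizes, weight initializations, and names (`rnnlm_step`, `U`, `W`, `V`) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnnlm_step(w_idx, h_prev, U, W, V):
    """One step of an Elman-style RNN LM: the hidden state combines the current
    word (a column of U, i.e. a one-hot input projection) with the previous
    hidden state; the output is a distribution over the vocabulary."""
    h = 1.0 / (1.0 + np.exp(-(U[:, w_idx] + W @ h_prev)))  # sigmoid hidden layer
    y = softmax(V @ h)                                      # next-word distribution
    return h, y

rng = np.random.default_rng(0)
vocab, hidden = 10, 4
U = rng.normal(scale=0.1, size=(hidden, vocab))
W = rng.normal(scale=0.1, size=(hidden, hidden))
V = rng.normal(scale=0.1, size=(vocab, hidden))

h = np.zeros(hidden)
for w in [3, 7, 1]:          # toy word-index sequence
    h, y = rnnlm_step(w, h, U, W, V)
```

In a real system the vocabulary is tens of thousands of words, which is exactly why the follow-up work below focuses on reducing the cost of the output layer.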
We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than 15 times speedup for both training and testing phases. Next, we show the importance of using a backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end,...
We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task, against a 4-gram model trained on 400M tokens.
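The hash-based maximum entropy idea above can be sketched in a few lines: n-gram features are mapped into a fixed-size weight table by hashing, so the feature set never has to be enumerated. This is a simplified illustration under assumed names (`ngram_bucket`, `maxent_logit`); the actual model ties these weights into the neural network training.

```python
import numpy as np

TABLE_SIZE = 1 << 20  # fixed-size table; hash collisions are tolerated by design

def ngram_bucket(history, word, table_size=TABLE_SIZE):
    """Map one (history, word) n-gram feature to a weight index via hashing."""
    return hash((tuple(history), word)) % table_size

def maxent_logit(weights, history, word):
    """Score a word as the sum of hashed feature weights over all history
    suffixes (unigram, bigram, ... up to the full history)."""
    s = 0.0
    for n in range(len(history) + 1):
        s += weights[ngram_bucket(history[len(history) - n:], word)]
    return s

weights = np.zeros(TABLE_SIZE)
weights[ngram_bucket((5, 9), 2)] = 1.5   # pretend training set this weight
```

Because the table size is fixed, memory does not grow with the number of distinct n-grams, at the cost of occasional collisions.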
In recent years, probabilistic features became an integral part of state-of-the-art LVCSR systems. In this work, we are exploring the possibility of obtaining the features directly from a neural net without the necessity of converting output probabilities to features suitable for a subsequent GMM-HMM system. We experimented with a 5-layer MLP with a bottle-neck in the middle layer. After training such a net, we used the bottle-neck outputs as features for recognition. The benefits are twofold: first, an improvement was gained when using these features instead of probabilistic features; second, the size of the system...
We present results obtained with several advanced language modeling techniques, including a class based model, cache model, maximum entropy model, structured language model, random forest model and several types of neural network models. We show results after combining all these models by using linear interpolation. We conclude that for both small and moderately sized tasks, we obtain a new state of the art with the combination of models, which is significantly better than the performance of any individual model. Obtained perplexity reductions against a Good-Turing trigram...
The processing of speech corrupted by interfering overlapping speakers is one of the most challenging problems with regards to today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may however be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for...
Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail party effect. For decades, researchers have focused on approaching the listening ability of humans. One critical issue is handling interfering speakers, because nontarget speech signals share similar characteristics with the target speech, complicating their...
This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features,...
This paper deals with phoneme recognition based on neural networks (NN). First, several approaches to improve the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on TempoRAl Patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate into tandem NN architectures. The results of the final system, reported on the standard TIMIT database, compare favorably to the best published results.
In this paper, we investigate alternative ways of processing MFCC-based features to use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that involves splicing the 13-dimensional front-end MFCCs across 9 frames, followed by applying LDA to reduce the dimension to 40 and then further decorrelation using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR is helpful. The fact that the number of parameters of a DNN is not strongly sensitive...
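The baseline splicing step above (13-dimensional MFCCs stacked across 9 frames, i.e. ±4 frames of context, before LDA) can be sketched as follows. This is a generic illustration, not the paper's code; edge handling by frame repetition is an assumption.

```python
import numpy as np

def splice(feats, context=4):
    """Stack each frame with its +/-context neighbours (edge frames padded by
    repetition), turning (T, 13) MFCCs into (T, (2*context+1)*13) = (T, 117)
    vectors -- the input that LDA then reduces to 40 dimensions."""
    T, d = feats.shape
    padded = np.vstack([feats[:1]] * context + [feats] + [feats[-1:]] * context)
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

mfcc = np.random.default_rng(1).normal(size=(100, 13))  # toy 100-frame utterance
spliced = splice(mfcc)
```

The center 13 columns of each spliced vector are the original frame, with the temporal context laid out on either side.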
In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance and advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and the heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate into dimensionality reduction of i-vectors before entering the PLDA-HT modeling. The results are very competitive: on...
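Of the scoring methods mentioned, cosine distance is the simplest and is easy to sketch: two i-vectors are compared by the cosine of the angle between them, and a threshold on that score gives the verification decision. The 400-dimensional toy vectors below are illustrative only.

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine-distance scoring of two i-vectors: higher = more likely the same
    speaker. PLDA replaces this with a proper probabilistic model."""
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

rng = np.random.default_rng(2)
enroll = rng.normal(size=400)               # toy enrollment i-vector
same = enroll + 0.1 * rng.normal(size=400)  # small perturbation ~ same speaker
diff = rng.normal(size=400)                 # unrelated vector ~ different speaker
```

In this toy setup `cosine_score(enroll, same)` comes out far higher than `cosine_score(enroll, diff)`, which is all the verification decision needs.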
In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's...
This paper presents BUT ReverbDB - a dataset of real room impulse responses (RIR), background noises and re-transmitted speech data. The re-transmitted data includes LibriSpeech test-clean, 2000 HUB5 English evaluation and part of the 2010 NIST Speaker Recognition Evaluation datasets. We provide a detailed description of the RIR collection (hardware, software, post-processing) that can serve as a "cook-book" for similar efforts. We also validate the dataset in two sets of automatic speech recognition (ASR) experiments and draw conclusions...
Recently, several nonparametric Bayesian models have been proposed to automatically discover acoustic units in unlabeled data. Most of them are trained using various versions of the Gibbs Sampling (GS) method. In this work, we consider Variational Bayes (VB) as an alternative inference process. Even though VB yields only an approximate solution of the posterior distribution, it can be easily parallelized, which makes it more suitable for large databases. Results show that, notwithstanding that VB is an order of magnitude faster,...
We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in Speaker Recognition, is used to extract information about the speaker or acoustic environment from a speech segment. The iVector is a low-dimensional fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information and compensate...
This work studies the usage of Deep Neural Network (DNN) Bottleneck (BN) features together with traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the sufficient statistics extraction by using separate GMM models for frame alignment and for normalization, and we analyze the use of BN and MFCC features (and their concatenation) in the two stages. We also show the effect of full-covariance models and, as a contrast, compare the result to a recent DNN-alignment approach. On NIST SRE2010, telephone condition, we achieve a 60% relative gain over...
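The decoupling described above hinges on how sufficient statistics are collected: per-frame posteriors (from an alignment model run on one feature stream) weight the accumulation of statistics over a possibly different feature stream. A minimal sketch of zeroth- and first-order statistics collection, with all shapes and names (`collect_stats`) assumed for illustration:

```python
import numpy as np

def collect_stats(post, feats):
    """Baum-Welch sufficient statistics for C components over a T-frame
    utterance: post is (T, C) frame posteriors from the alignment model,
    feats is (T, d) features being modeled -- the two streams need not match,
    which is the decoupling explored in the paper."""
    N = post.sum(axis=0)   # (C,)  zeroth-order stats: soft counts per component
    F = post.T @ feats     # (C, d) first-order stats: posterior-weighted sums
    return N, F

rng = np.random.default_rng(3)
T, C, d = 50, 8, 13
logits = rng.normal(size=(T, C))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # toy posteriors
N, F = collect_stats(post, rng.normal(size=(T, d)))
```

Here the posteriors could come from a BN-feature GMM while `feats` are MFCCs (or their concatenation); the statistics feed the i-vector extractor either way.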
In this paper, we propose a DNN adaptation technique, where the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to the i-vector extractor, the SSNN produces a "summary vector" representing an acoustic summary of an utterance. Such a vector is then appended to the input of the main network, while both networks are trained together by optimizing a single loss function. Both the i-vector and speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of both techniques on the FBANK system with...
Sequence-to-sequence automatic speech recognition (ASR) models require large quantities of data to attain high performance. For this reason, there has been a recent surge in interest in unsupervised and semi-supervised training of such models. This work builds upon recent results showing notable improvements from cycle-consistency and related techniques. Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with Text-to-Speech (TTS) models. In particular, this work proposes a new...