Konstantin Markov

ORCID: 0000-0003-1838-4789
Research Areas
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Music and Audio Processing
  • Natural Language Processing Techniques
  • Neural Networks and Applications
  • Gaussian Processes and Bayesian Inference
  • Speech and Dialogue Systems
  • Phonetics and Phonology Research
  • Time Series Analysis and Forecasting
  • Topic Modeling
  • Advanced Data Compression Techniques
  • Personality Traits and Psychology
  • Voice and Speech Disorders
  • Emotion and Mood Recognition
  • Sentiment Analysis and Opinion Mining
  • Hearing Loss and Rehabilitation
  • Neuroscience and Music Perception
  • Blind Source Separation Techniques
  • Advanced Text Analysis Techniques
  • Fault Detection and Control Systems
  • Mental Health via Writing
  • Bayesian Modeling and Causal Inference
  • Handwritten Text Recognition Techniques
  • Cognitive Science and Education Research
  • Software Engineering Research

University of Aizu
2013-2024

The Institute of Statistical Mathematics
2016

Data61
2009

National Institute of Information and Communications Technology
2007-2008

Advanced Telecommunications Research Institute International
2004-2008

KRI
2008

Language Science (South Korea)
2006

Toyohashi University of Technology
1996-2002

In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules of our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them are designed using state-of-the-art technologies developed at ATR. A corpus-based statistical learning framework forms the basis of the system design. We use a...

10.1109/tsa.2005.860774 article EN IEEE Transactions on Audio Speech and Language Processing 2006-02-21

Gaussian Processes (GPs) are Bayesian nonparametric models that are becoming more and more popular for their superior capabilities to capture highly nonlinear data relationships in various tasks, such as dimensionality reduction, time series analysis, and novelty detection, as well as classical regression and classification tasks. In this paper, we investigate the feasibility and applicability of GP models for music genre classification and emotion estimation. These are two of the main tasks in the music information retrieval (MIR) field. So far, the support vector machine...

10.1109/access.2014.2333095 article EN cc-by-nc-nd IEEE Access 2014-01-01

Many approaches have been proposed to automatically infer users' personality from their social network activities. However, the performance of these approaches depends heavily on the data representation. In this work, we apply deep learning methods to learn a suitable data representation for the personality recognition task. In our experiments, we used Facebook status updates data. We investigated several neural network architectures, such as fully-connected (FC) networks, convolutional networks (CNN), and recurrent networks (RNN), on the myPersonality shared task...

10.1109/icawst.2017.8256484 article EN 2017-11-01

Music as a form of art is intentionally composed to be emotionally expressive. The emotional features of music are invaluable for music indexing and recommendation. In this paper, we present a cross-comparison of automatic emotional analysis of music. We created a public dataset of Creative Commons licensed songs. Using the valence and arousal model, the songs were annotated both in terms of the emotions expressed by the whole excerpt and dynamically with 1 Hz temporal resolution. Each song received 10 annotations on Amazon Mechanical Turk...

10.1145/2647868.2655019 article EN 2014-11-03

In this paper, we describe a new high-performance on-line speaker diarization system which works faster than real-time and has very low latency. It consists of several modules including voice activity detection, novel speaker detection, and speaker gender and identity classification. All modules share a set of Gaussian mixture models (GMM) representing pause, male and female speakers, and each individual speaker. Initially, there are only three GMMs, for pause and the two genders, trained in advance from some data. During the diarization process, for each speech segment...

10.1109/asru.2007.4430197 article EN 2007-01-01

Many studies have shown that articulatory features can significantly improve the performance of automatic speech recognition systems. Unfortunately, such features are not available at recognition time. There are two main approaches to solve this problem: a feature-based approach, the most popular example of which is acoustic-to-articulatory inversion, where the missing articulatory features are generated from the speech signal, and a model-based approach, where articulatory information is embedded in the model structure and parameters in a way that allows recognition using only acoustic features. In this paper, we propose a new...

10.1109/taslp.2019.2894554 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2019-01-21

Detection of speakers who have not been seen before is an essential part of every online speaker diarization system. New speaker detection accuracy has a direct impact on the overall system performance. In our previous system, for novelty detection we used a global GMM likelihood ratio (LR) threshold. However, as the system analysis showed, the optimal threshold depends on the speaker gender as well as on the number of registered speakers. In this paper, we present the results of this analysis and the approach taken to solve the problem. First, we use different thresholds for male and female speakers,...

10.21437/interspeech.2008-149 article EN Interspeech 2022 2008-09-22

Despite the progress of deep neural networks over the last decade, state-of-the-art speech recognizers in noisy environment conditions are still far from reaching satisfactory performance. Methods to improve noise robustness usually include adding components to the recognition system that often need optimization. For this reason, data augmentation of the input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. However, for many speech processing tasks, there is evidence...

10.3390/electronics9071157 article EN Electronics 2020-07-17

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach, where unlabeled data can be different, but nevertheless have similar structure. First, a representation is learned from the unlabeled data via sparse coding and is then applied to the labeled data used for classification. In this work, we implemented this method for the music genre classification task using two different...

10.1109/icassp.2012.6288282 article EN 2012-03-01

In this paper, we describe a method for phoneme set selection based on a combination of phonological and statistical information and its application to Russian speech recognition. For the Russian language, currently used phoneme sets are mostly rule-based or heuristically derived from the standard SAMPA or IPA phonetic alphabets. However, for some other languages, statistical methods have been found useful for phoneme set optimization. In Russian, almost all phonemes come in pairs: consonants can be hard or soft, and vowels stressed or unstressed. First, we start with a big phoneme set and then...

10.1109/nlpke.2011.6138246 article EN 2011-11-01

Availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning algorithm, where unlabeled data can be different, but nevertheless have similar structure. First, a representation is learned from the unlabeled samples by decomposing their data matrix into two matrices, called bases and activations respectively. This procedure is justified by the assumption that each sample is a linear...

10.1186/1687-4722-2013-6 article EN cc-by EURASIP Journal on Audio Speech and Music Processing 2013-04-09

In speaker recognition, when cepstral coefficients are calculated from LPC analysis parameters, the prediction error, or LPC residual signal, is usually ignored. However, there is evidence that it contains speaker-specific information. The fundamental frequency of the speech signal, or pitch, which is usually extracted from the LPC residual, has been used for speaker recognition purposes, but because of the high intraspeaker variability of the pitch, it is also often ignored. This paper describes our approach to integrating the pitch and LPC-residual with the LPC-cepstrum in a Gaussian...

10.1250/ast.20.281 article EN Journal of the Acoustical Society of Japan (E) 1999-01-01

In this paper, we present a review of the latest developments in Russian speech recognition research. Although the underlying speech technology is mostly language-independent, differences between languages with respect to their structure and grammar have a substantial effect on the performance of recognition systems. The Russian language has a complicated word formation system, which is characterized by a high degree of inflection and a lack of rigid word order. This greatly reduces the predictive power of conventional language models and consequently increases the error rate....

10.1145/2160749.2160763 article EN 2012-03-08

Automatic emotion recognition from speech has been focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a recognition system for dynamic speech emotion based on state-space models (SSMs). The prediction of the unknown emotion trajectory in the affect space spanned by Arousal, Valence, and Dominance (A-V-D) descriptors is cast as a time series filtering task. The state-space models investigated include a standard linear model (Kalman filter) as well as novel non-linear, non-parametric...

10.1109/eusipco.2015.7362750 preprint EN 2015-08-01

It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple acoustic models, one model for each different condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to the noise type, SNR, speaker gender and speaking style. Our system consists of two recognition channels based on MFCC and Differential MFCC (DMFCC) features. Each channel has several models depending on speaking style, adapted with fast adaptation. From...

10.21437/interspeech.2004-726 article EN Interspeech 2022 2004-10-04

In this paper, we introduce Gaussian Process (GP) models for music genre classification. Gaussian Processes are widely used for various regression and classification tasks, but there are relatively few studies where GPs are applied in audio signal processing systems. The GP models are non-parametric discriminative classifiers similar to the well-known SVMs in terms of usage. In contrast to SVMs, however, GPs produce truly probabilistic output and allow kernel function parameters to be learned from the training data. In this work, we compare performance as...

10.1109/mlsp.2013.6661991 article EN 2013-09-01