- Speech Recognition and Synthesis
- Speech and Audio Processing
- Music and Audio Processing
- Natural Language Processing Techniques
- Neural Networks and Applications
- Gaussian Processes and Bayesian Inference
- Speech and dialogue systems
- Phonetics and Phonology Research
- Time Series Analysis and Forecasting
- Topic Modeling
- Advanced Data Compression Techniques
- Personality Traits and Psychology
- Voice and Speech Disorders
- Emotion and Mood Recognition
- Sentiment Analysis and Opinion Mining
- Hearing Loss and Rehabilitation
- Neuroscience and Music Perception
- Blind Source Separation Techniques
- Advanced Text Analysis Techniques
- Fault Detection and Control Systems
- Mental Health via Writing
- Bayesian Modeling and Causal Inference
- Handwritten Text Recognition Techniques
- Cognitive Science and Education Research
- Software Engineering Research
University of Aizu
2013-2024
The Institute of Statistical Mathematics
2016
Data61
2009
National Institute of Information and Communications Technology
2007-2008
Advanced Telecommunications Research Institute International
2004-2008
KRI
2008
Language Science (South Korea)
2006
Toyohashi University of Technology
1996-2002
In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules in our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them were designed using state-of-the-art technologies developed at ATR. A corpus-based statistical learning framework forms the basis of the system design. We use a...
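The abstract describes a three-stage cascade (speech recognition, text-to-text translation, speech synthesis). The sketch below is only a minimal illustration of wiring such a cascade together; the class and function names are hypothetical placeholders, not the ATR implementation.

```python
from dataclasses import dataclass

# Hypothetical interfaces for a cascaded S2ST system
# (ASR -> text-to-text MT -> TTS). Illustrative only.

@dataclass
class S2STPipeline:
    asr: callable         # waveform -> source-language text
    translator: callable  # source text -> target text
    tts: callable         # target text -> waveform

    def translate_speech(self, waveform, src="ja", tgt="en"):
        source_text = self.asr(waveform)
        target_text = self.translator(source_text, src=src, tgt=tgt)
        return self.tts(target_text)

# Toy stand-ins so the sketch runs end to end.
pipeline = S2STPipeline(
    asr=lambda wav: "konnichiwa",
    translator=lambda text, src, tgt: "hello",
    tts=lambda text: [0.0] * 16000,  # 1 s of silence as a dummy waveform
)

audio_out = pipeline.translate_speech(waveform=[0.0] * 16000)
print(len(audio_out))
```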
Gaussian Processes (GPs) are Bayesian nonparametric models that are becoming more and more popular for their superior ability to capture highly nonlinear data relationships in various tasks, such as dimensionality reduction, time series analysis, and novelty detection, as well as classical regression and classification tasks. In this paper, we investigate the feasibility and applicability of GP models for music genre classification and emotion estimation. These are two of the main tasks in the music information retrieval (MIR) field. So far, the support vector machine...
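As a minimal sketch of GP-based emotion estimation, the example below fits a GP regressor with an RBF kernel to predict a continuous emotion value (e.g. arousal) from audio features; the features and labels are random stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-ins for song-level audio features and arousal labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))        # 100 songs x 20 audio features
y_train = rng.uniform(-1.0, 1.0, size=100)  # arousal labels in [-1, 1]

# Kernel hyperparameters are optimized during fitting.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

X_test = rng.normal(size=(5, 20))
mean, std = gp.predict(X_test, return_std=True)  # predictive mean and uncertainty
print(mean, std)
```

The predictive standard deviation is the main practical difference from an SVM-style regressor: each emotion estimate comes with an uncertainty measure.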
Many approaches have been proposed to automatically infer users' personality from their social network activities. However, the performance of these approaches depends heavily on the data representation. In this work, we apply deep learning methods to learn a suitable data representation for the personality recognition task. In our experiments, we used Facebook status update data. We investigated several neural network architectures, such as fully-connected (FC), convolutional (CNN) and recurrent (RNN) networks, on the myPersonality shared task...
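A minimal sketch of one of the architectures mentioned (a CNN text classifier predicting Big Five traits as a multi-label problem) is shown below; the vocabulary size, sequence length and data are invented, and the layer sizes are not those of the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic tokenized status updates and one binary label per Big Five trait.
vocab_size, max_len, n_traits = 5000, 100, 5
X = np.random.randint(1, vocab_size, size=(200, max_len))
y = np.random.randint(0, 2, size=(200, n_traits))

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64),              # learned word representation
    layers.Conv1D(128, 5, activation="relu"),      # convolution over word windows
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_traits, activation="sigmoid"),  # multi-label trait output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```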
Music as a form of art is intentionally composed to be emotionally expressive. The emotional features of music are invaluable for indexing and recommendation. In this paper we present a cross-comparison of automatic emotion analysis of music. We created a public dataset of Creative Commons licensed songs. Using the valence-arousal model, the songs were annotated both in terms of the emotions expressed by the whole excerpt and dynamically with 1 Hz temporal resolution. Each song received 10 annotations on Amazon Mechanical Turk...
In this paper, we describe a new high-performance on-line speaker diarization system which works faster than real-time and has very low latency. It consists of several modules including voice activity detection, novelty detection, and gender and speaker identity classification. All modules share a set of Gaussian mixture models (GMMs) representing pause, male and female speakers, and each individual speaker. Initially, there are only three GMMs, for pause and the two genders, trained in advance from some data. During the diarization process, each speech segment...
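The shared-GMM idea can be illustrated with a small sketch: pre-trained models for pause and the two genders, and each incoming segment labelled by the best-scoring model. The feature frames below are random stand-ins for e.g. MFCCs, and the model sizes are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def train_gmm(frames, n_components=8):
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(frames)

# Pre-trained background models; speaker models would be added on the fly.
models = {
    "pause": train_gmm(rng.normal(0.0, 1.0, size=(500, 13))),
    "male": train_gmm(rng.normal(1.0, 1.0, size=(500, 13))),
    "female": train_gmm(rng.normal(-1.0, 1.0, size=(500, 13))),
}

def label_segment(segment_frames):
    # Average per-frame log-likelihood under each GMM; pick the best model.
    scores = {name: gmm.score(segment_frames) for name, gmm in models.items()}
    return max(scores, key=scores.get)

print(label_segment(rng.normal(1.0, 1.0, size=(50, 13))))
```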
Many studies have shown that articulatory features can significantly improve the performance of automatic speech recognition systems. Unfortunately, such features are not available at recognition time. There are two main approaches to solve this problem: a feature-based approach, the most popular example of which is acoustic-to-articulatory inversion, where the missing features are generated from the acoustic signal, and a model-based approach, where articulatory information is embedded in the model structure and parameters in a way that allows using only acoustic features. In this paper, we propose a new...
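The feature-based approach can be pictured as a plain regression problem: learn a mapping from acoustic frames to articulatory parameters, then use it at recognition time when only acoustics are available. The sketch below uses a generic MLP regressor and fully synthetic data; it is not the method proposed in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(1000, 39))      # e.g. MFCC + deltas per frame (synthetic)
articulatory = rng.normal(size=(1000, 14))  # e.g. articulator positions per frame (synthetic)

# Acoustic-to-articulatory inversion as multi-output regression.
inverter = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=50)
inverter.fit(acoustic, articulatory)

# At recognition time only acoustic features are observed;
# the inverter supplies the "missing" articulatory stream.
predicted_articulatory = inverter.predict(rng.normal(size=(10, 39)))
print(predicted_articulatory.shape)
```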
Detection of speakers which have not been seen before is an essential part of every online speaker diarization system. New speaker detection accuracy has a direct impact on the overall system performance. In our previous system, for novelty detection we used a global GMM likelihood ratio (LR) threshold. However, as system analysis showed, the optimal threshold depends on the gender as well as the number of registered speakers. In this paper, we present the results of this analysis and the approach taken to solve the problem. First, we use different thresholds for male and female speakers,...
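The decision rule itself is simple enough to sketch: compare a log-likelihood ratio against a threshold that depends on gender and on how many speakers are already registered. The threshold values and the adaptation rule below are illustrative placeholders, not the paper's tuned settings.

```python
# Illustrative gender-specific base thresholds (not values from the paper).
BASE_THRESHOLD = {"male": 0.5, "female": 0.7}

def is_new_speaker(best_speaker_llh, gender_llh, gender, n_registered):
    """Declare a new speaker when the best registered-speaker model does not
    beat the gender background model by a sufficient margin. The margin grows
    with the number of registered speakers (illustrative adaptation)."""
    lr = best_speaker_llh - gender_llh                  # log-likelihood ratio
    threshold = BASE_THRESHOLD[gender] + 0.05 * n_registered
    return lr < threshold

print(is_new_speaker(best_speaker_llh=-41.2, gender_llh=-41.5,
                     gender="male", n_registered=3))
```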
Despite the progress of deep neural networks over the last decade, state-of-the-art speech recognizers in noisy environment conditions are still far from reaching satisfactory performance. Methods to improve noise robustness usually include adding components to the recognition system that often need further optimization. For this reason, data augmentation of input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. However, for many signal processing tasks, there is evidence...
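A minimal example of augmenting STFT-derived features is given below: additive noise on the log-magnitude spectrogram plus a frequency mask. The waveform is synthetic and the augmentation parameters are arbitrary; this is only meant to illustrate the general recipe, not the paper's specific scheme.

```python
import numpy as np
import librosa

rng = np.random.default_rng(0)
y = rng.normal(scale=0.1, size=16000).astype(np.float32)  # stand-in for 1 s of speech

spec = np.abs(librosa.stft(y, n_fft=512, hop_length=160))  # magnitude spectrogram
log_spec = np.log1p(spec)

def augment(feat, noise_std=0.05, mask_width=20):
    out = feat + rng.normal(scale=noise_std, size=feat.shape)  # additive noise
    f0 = rng.integers(0, out.shape[0] - mask_width)
    out[f0:f0 + mask_width, :] = 0.0                           # frequency mask
    return out

augmented = augment(log_spec.copy())
print(log_spec.shape, augmented.shape)
```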
The availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that the labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning approach, where the distributions can be different but nevertheless have similar structure. First, a representation is learned from the unlabeled data via sparse coding and is then applied to the labeled data used for classification. In this work, we implemented this method for the music genre classification task using two different...
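The self-taught recipe described here maps directly onto standard tools: learn a sparse-coding dictionary on unlabeled data, re-encode the labeled set with it, and train an ordinary classifier on the codes. The sketch below uses random data in place of audio features and arbitrary dictionary sizes.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(300, 40))  # unlabeled audio features (synthetic)
X_labeled = rng.normal(size=(100, 40))    # labeled genre examples (synthetic)
y_labeled = rng.integers(0, 4, size=100)  # 4 genre classes

# Step 1: learn the representation (dictionary) from unlabeled data only.
coder = DictionaryLearning(n_components=64, alpha=1.0, max_iter=20)
coder.fit(X_unlabeled)

# Step 2: encode labeled data as sparse activations and classify.
Z_labeled = coder.transform(X_labeled)
clf = LinearSVC().fit(Z_labeled, y_labeled)
print(clf.score(Z_labeled, y_labeled))
```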
In this paper, we describe a method for phoneme set selection based on a combination of phonological and statistical information and its application to Russian speech recognition. For this language, the currently used phoneme sets are mostly rule-based or heuristically derived from the standard SAMPA or IPA phonetic alphabets. However, for some other languages, statistical methods have been found useful for phoneme set optimization. In Russian, almost all phonemes come in pairs: consonants can be hard or soft, and vowels stressed or unstressed. First, we start with a big phoneme set and then...
The availability of large amounts of raw unlabeled data has sparked the recent surge in semi-supervised learning research. In most works, however, it is assumed that the labeled and unlabeled data come from the same distribution. This restriction is removed in the self-taught learning algorithm, where the distributions can be different but nevertheless have similar structure. First, a representation is learned from the unlabeled samples by decomposing their feature matrix into two matrices called bases and activations, respectively. This procedure is justified by the assumption that each sample is a linear...
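The bases/activations factorization described here is, in one common form, a non-negative matrix factorization: each sample is modelled as a linear combination of shared bases with non-negative activations. The sketch below uses NMF on a synthetic non-negative feature matrix as a stand-in for the decomposition step.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X_unlabeled = rng.random((300, 40))  # non-negative feature matrix (rows = samples)

# Decompose into activations (per sample) and bases (shared).
nmf = NMF(n_components=32, init="nndsvda", max_iter=500)
activations_unlabeled = nmf.fit_transform(X_unlabeled)
bases = nmf.components_

# Labeled samples are then encoded against the same bases and fed
# to an ordinary classifier.
X_labeled = rng.random((100, 40))
activations_labeled = nmf.transform(X_labeled)
print(bases.shape, activations_labeled.shape)
```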
In speaker recognition, when cepstral coefficients are calculated from LPC analysis parameters, the prediction error, or residual signal, is usually ignored. However, there is evidence that it contains speaker-specific information. The fundamental frequency of the speech signal, or pitch, which can be extracted from the residual, has been used for speaker recognition purposes, but because of its high intraspeaker variability pitch is also often ignored. This paper describes our approach to integrating the LPC-residual with the LPC-cepstrum in a Gaussian...
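For reference, the two information sources discussed here, the LPC coefficients (spectral envelope) and the LPC residual (prediction error), can be extracted as sketched below. The frame is synthetic noise rather than real speech, and the analysis order is arbitrary.

```python
import numpy as np
import librosa
import scipy.signal

rng = np.random.default_rng(0)
frame = rng.normal(scale=0.1, size=400).astype(np.float64)  # 25 ms at 16 kHz (synthetic)

order = 12
a = librosa.lpc(frame, order=order)               # LPC coefficients [1, a1, ..., a_order]
residual = scipy.signal.lfilter(a, [1.0], frame)  # prediction-error (residual) signal

# Residual-derived features (energy, pitch, etc.) can then be combined with
# LPC-cepstral features in a GMM-based speaker model.
print(a.shape, float(np.sum(residual ** 2)))
```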
In this paper, we present a review of the latest developments in Russian speech recognition research. Although the underlying technology is mostly language-independent, differences between languages with respect to their structure and grammar have a substantial effect on recognition system performance. The Russian language has a complicated word formation system, which is characterized by a high degree of inflection and unrigidness of the word order. This greatly reduces the predictive power of conventional language models and consequently increases the error rate....
Automatic emotion recognition from speech has focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a system for dynamic emotion recognition based on state-space models (SSMs). The prediction of the unknown emotion trajectory in the space spanned by the Arousal, Valence and Dominance (A-V-D) descriptors is cast as a time series filtering task. The state-space models we investigated include a standard linear model (Kalman filter) as well as a novel non-linear, non-parametric...
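The linear baseline mentioned above (a Kalman filter over the 3-D A-V-D trajectory) is easy to sketch directly: treat noisy per-frame emotion estimates as observations of a random-walk state. The noise covariances and the observation sequence below are synthetic, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim = 100, 3  # frames x (Arousal, Valence, Dominance)
observations = (np.cumsum(rng.normal(scale=0.05, size=(T, dim)), axis=0)
                + rng.normal(scale=0.2, size=(T, dim)))  # noisy per-frame estimates

F = np.eye(dim)         # state transition (random walk)
H = np.eye(dim)         # observation model
Q = 0.01 * np.eye(dim)  # process noise covariance
R = 0.04 * np.eye(dim)  # observation noise covariance

x, P = np.zeros(dim), np.eye(dim)  # initial state estimate and covariance
filtered = []
for z in observations:
    # Predict step
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (z - H @ x)
    P = (np.eye(dim) - K @ H) @ P
    filtered.append(x.copy())

print(np.array(filtered).shape)
```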
It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple acoustic models, one model for each different condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to the noise type, SNR, speaker gender and speaking style. Our system consists of two recognition channels based on MFCC and Differential MFCC (DMFCC) features. Each channel has several acoustic models depending on the speaking style, adapted by fast adaptation. From...
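As a small illustration of the two feature streams, the sketch below extracts MFCCs and their time derivatives (used here as a stand-in for the paper's DMFCC features) from a synthetic waveform; each stream would then feed its own decoding channel with condition-dependent acoustic models.

```python
import numpy as np
import librosa

sr = 16000
y = np.random.default_rng(0).normal(scale=0.1, size=sr).astype(np.float32)  # synthetic audio

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # channel 1 features
dmfcc = librosa.feature.delta(mfcc)                 # channel 2 features (delta stand-in)

print(mfcc.shape, dmfcc.shape)
```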
In this paper we introduce Gaussian Process (GP) models for music genre classification. Gaussian Processes are widely used in various regression and classification tasks, but there are relatively few studies where GPs are applied in audio signal processing systems. GP models are non-parametric discriminative classifiers similar to the well known SVMs in terms of usage. In contrast to SVMs, however, they produce a truly probabilistic output and allow the kernel function parameters to be learned from the training data. In this work we compare the performance as...
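A minimal sketch of GP classification for genre labels is shown below; the features and labels are random placeholders, but it illustrates the two properties highlighted above: a probabilistic output and kernel hyperparameters learned from the training data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 20))    # 120 tracks x 20 audio features (synthetic)
y_train = rng.integers(0, 4, size=120)  # 4 genre classes (synthetic)

# The RBF length scale is optimized during fitting.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gpc.fit(X_train, y_train)

proba = gpc.predict_proba(rng.normal(size=(3, 20)))  # per-class probabilities
print(proba)
```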