- Speech Recognition and Synthesis
- Speech and Audio Processing
- Emotion and Mood Recognition
- Music and Audio Processing
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Speech and dialogue systems
- Face recognition and analysis
- Robotics and Automated Systems
- Sentiment Analysis and Opinion Mining
- Face and Expression Recognition
- Topic Modeling
- Gaze Tracking and Assistive Technology
- Tactile and Sensory Interactions
- Phonetics and Phonology Research
- Technology and Human Factors in Education and Health
- Hearing Loss and Rehabilitation
- Hearing Impairment and Communication
- Social Robot Interaction and HRI
- Deception detection and forensic psychology
- Video Surveillance and Tracking Methods
- Neural Networks and Applications
- Simulation and Modeling Applications
- Human Pose and Action Recognition
- Infant Health and Development
ITMO University
2014-2025
Photochemistry Center
2021-2024
Russian Academy of Sciences
2008-2024
State Research Center of the Russian Federation
2021-2024
St. Petersburg Institute for Informatics and Automation
2013-2022
Moscow State University
2021
Lomonosov Moscow State University
2021
Moscow State Linguistic University
2021
Gazprom (Russia)
2021
St Petersburg University
2006-2020
Computational Paralinguistics has several unresolved issues, one of which is coping with the large variability due to speakers, spoken content and corpora. In this paper, we address the compensation issue by proposing a novel method composed of i) Fisher vector encoding of low-level descriptors extracted from the signal, ii) speaker z-normalization applied after clustering, iii) non-linear normalization of features, and iv) classification based on Kernel Extreme Learning Machines and Partial Least Squares regression....
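Steps (i) and (iii) of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it assumes a diagonal-covariance GMM fitted with scikit-learn, keeps only the mean-gradient terms of the Fisher vector, and applies the common power plus L2 normalization as the non-linear normalization step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Simplified Fisher vector: mean-gradient terms only, for a diagonal GMM."""
    n = descriptors.shape[0]
    post = gmm.predict_proba(descriptors)        # (n, K) soft assignments
    std = np.sqrt(gmm.covariances_)              # (K, d) diagonal std devs
    parts = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / std[k]
        grad = (post[:, k, None] * diff).sum(axis=0)
        parts.append(grad / (n * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    # non-linear (power) normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# usage: fit the GMM on training low-level descriptors, then encode each utterance
# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_lld)
# utterance_fv = fisher_vector(utterance_lld, gmm)
```

The resulting encoding has dimensionality K × d (number of Gaussians times descriptor dimension) and serves as a fixed-length utterance-level representation for the downstream classifier.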
Call center operators communicate with callers in different emotional states (anger, anxiety, fear, stress, joy, etc.). Sometimes a number of calls arriving within a short period of time have to be answered and processed. At moments when all operators are busy, the system puts a call on hold regardless of its urgency. This research aims to improve the functionality of call centers by recognizing call urgency and redistributing the queue. It could be beneficial for providing health care support to elderly people through emergency call centers. The proposed...
As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention over the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild”, data. This work investigates audiovisual deep learning approaches to the in-the-wild problem. Inspired by the outstanding performance of end-to-end and transfer learning techniques, we explored...
Smart city operation assumes a dynamic infrastructure in various aspects. However, organization and process modelling require domain expertise and significant effort from modelers. As a result, such processes are still not well supported by IT systems and mostly remain manual tasks. Today, machine learning technologies are capable of performing tasks including those that have normally been associated with people, for example, tasks requiring creativity or expertise. Generative adversarial networks (GANs) are a good example...
This paper introduces a new methodology, aimed at driver comfort, for in-the-wild multimodal corpus creation for audio-visual speech recognition in driver monitoring systems. The presented methodology is universal and can be used for recording different languages. We present an analysis of systems with voice interfaces based on both audio and video data. The multimodal approach allows using audio data when video data are useless (e.g., at nighttime), as well as applying video data in acoustically noisy conditions (e.g., on highways). Our methodology identifies the main steps and requirements...
Cross-language, cross-cultural emotion recognition and accurate prediction of affective disorders are two of the major challenges in affective computing today. In this work, we compare several systems for the Detecting Depression with AI Sub-challenge (DDS) and the Cross-cultural Emotion Sub-challenge (CES), published as part of the Audio-Visual Emotion Challenge (AVEC) 2019. For both sub-challenges, we benefit from the baselines while introducing our own features and regression models. In the DDS challenge, where ASR transcripts are provided by the organizers,...
This paper presents the research and development of a prototype assistive mobile information robot (AMIR). The main features presented are voice and gesture-based interfaces with Russian speech and sign language recognition and synthesis techniques, and a high degree of autonomy. The AMIR prototype is intended to be used as a robotic cart for shopping in grocery stores and/or supermarkets. Among the topics covered in this presentation are the interface (three modalities) and a single-handed gesture system (based on a collected database...
Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between lip motion as well as facial expressions and speech have been studied, relatively little work has been done to investigate gesture. Detection of head, hand and arm gestures has been studied extensively, and such gestures were shown to carry linguistic information. A typical example is the head gesture while saying "yes/no". In this study, the correlation between gesture and speech is investigated. Using signal analysis,...