- Speech Recognition and Synthesis
- Speech and Audio Processing
- Emotion and Mood Recognition
- Music and Audio Processing
- Natural Language Processing Techniques
- Hand Gesture Recognition Systems
- Speech and dialogue systems
- Face recognition and analysis
- Robotics and Automated Systems
- Sentiment Analysis and Opinion Mining
- Face and Expression Recognition
- Topic Modeling
- Gaze Tracking and Assistive Technology
- Tactile and Sensory Interactions
- Phonetics and Phonology Research
- Technology and Human Factors in Education and Health
- Hearing Loss and Rehabilitation
- Hearing Impairment and Communication
- Social Robot Interaction and HRI
- Deception detection and forensic psychology
- Video Surveillance and Tracking Methods
- Neural Networks and Applications
- Simulation and Modeling Applications
- Human Pose and Action Recognition
- Infant Health and Development
ITMO University
2014-2025
Photochemistry Center
2021-2024
Russian Academy of Sciences
2008-2024
State Research Center of the Russian Federation
2021-2024
St. Petersburg Institute for Informatics and Automation
2013-2022
Moscow State University
2021
Lomonosov Moscow State University
2021
Moscow State Linguistic University
2021
Gazprom (Russia)
2021
St Petersburg University
2006-2020
Computational Paralinguistics has several unresolved issues, one of which is coping with the large variability due to speakers, spoken content and corpora. In this paper, we address the compensation issue by proposing a novel method composed of i) Fisher vector encoding of low-level descriptors extracted from the signal, ii) speaker z-normalization applied after clustering, iii) non-linear normalization of features, and iv) classification based on Kernel Extreme Learning Machines and Partial Least Squares regression....
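Steps (i) and (iii) of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it assumes a diagonal-covariance GMM fitted with scikit-learn, keeps only the mean-gradient terms of the Fisher vector, and applies the common power plus L2 normalization as the non-linear normalization step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Simplified Fisher vector: mean-gradient terms only, for a diagonal GMM."""
    n = descriptors.shape[0]
    post = gmm.predict_proba(descriptors)        # (n, K) soft assignments
    std = np.sqrt(gmm.covariances_)              # (K, d) diagonal std devs
    parts = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / std[k]
        grad = (post[:, k, None] * diff).sum(axis=0)
        parts.append(grad / (n * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    # non-linear (power) normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# usage: fit the GMM on training low-level descriptors, then encode each utterance
# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_lld)
# utterance_fv = fisher_vector(utterance_lld, gmm)
```

The resulting encoding has dimensionality K × d (number of Gaussians times descriptor dimension) and serves as a fixed-length utterance-level representation for the downstream classifier.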
Call center operators communicate with callers in different emotional states (anger, anxiety, fear, stress, joy, etc.). Sometimes a number of calls arriving within a short period of time have to be answered and processed. At moments when all operators are busy, the system puts a call on hold regardless of its urgency. This research aims to improve the functionality of call centers by recognizing call urgency and redistributing the queue. It could be beneficial for providing health care support to elderly people through emergency call centers. The proposed...
As emotions play a central role in human communication, automatic emotion recognition has attracted increasing attention over the last two decades. While multimodal systems achieve high performance on lab-controlled data, they are still far from providing ecological validity on non-lab-controlled, namely “in-the-wild”, data. This work investigates audiovisual deep learning approaches to the in-the-wild problem. Inspired by the outstanding performance of end-to-end and transfer learning techniques, we explored...
Smart city operation assumes a dynamic infrastructure in various aspects. However, organization and process modelling require domain expertise and significant effort from modelers. As a result, such processes are still not well supported by IT systems and mostly remain manual tasks. Today, machine learning technologies are capable of performing tasks including those that have normally been associated with people, for example, tasks requiring creativity or expertise. Generative adversarial networks (GANs) are a good example...
This paper introduces a new methodology, aimed at driver comfort, for in-the-wild multimodal corpus creation for audio-visual speech recognition in driver monitoring systems. The presented methodology is universal and can be used for recording different languages. We present an analysis of systems with voice interfaces based on both audio and video data. The multimodal approach allows using audio data when video data are useless (e.g., at nighttime), as well as applying video data in acoustically noisy conditions (e.g., on highways). Our methodology identifies the main steps and requirements...
Cross-language, cross-cultural emotion recognition and accurate prediction of affective disorders are two of the major challenges in affective computing today. In this work, we compare several systems for the Detecting Depression with AI Sub-challenge (DDS) and the Cross-cultural Emotion Sub-challenge (CES), published as part of the Audio-Visual Emotion Challenge (AVEC) 2019. For both sub-challenges, we benefit from the baselines while introducing our own features and regression models. In the DDS challenge, where ASR transcripts are provided by the organizers,...
This paper presents the research and development of a prototype assistive mobile information robot (AMIR). The main features presented are voice and gesture-based interfaces with Russian speech and sign language recognition and synthesis techniques, and a high degree of autonomy. The AMIR prototype is intended to be used as a robotic cart for shopping in grocery stores and/or supermarkets. Among the topics covered in this presentation are the interface (three modalities) and a single-handed gesture system (based on a collected database...
Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between lip motion as well as facial expressions and speech have been studied, relatively little work has been done to investigate gesture. Detection of head, hand and arm gestures has been studied extensively, and such gestures were shown to carry linguistic information. A typical example is the head gesture while saying "yes/no". In this study, the correlation between gesture and speech is investigated. Using signal analysis,...