- Emotion and Mood Recognition
- Speech and Audio Processing
- Mental Health via Writing
- Face and Expression Recognition
- Sentiment Analysis and Opinion Mining
- Face recognition and analysis
- Color perception and design
- EEG and Brain-Computer Interfaces
- Music and Audio Processing
- Human Pose and Action Recognition
- Blind Source Separation Techniques
- Multisensory perception and integration
- Olfactory and Sensory Function Studies
- Neural Networks and Applications
- Visual Attention and Saliency Detection
- Neural dynamics and brain function
- Hand Gesture Recognition Systems
- Advanced Neural Network Applications
- Infant Health and Development
- Gaussian Processes and Bayesian Inference
- Video Surveillance and Tracking Methods
- Machine Learning and ELM
- Adversarial Robustness in Machine Learning
- ECG Monitoring and Analysis
- Generative Adversarial Networks and Image Synthesis
Xi’an University of Posts and Telecommunications
2022-2025
Taizhou University
2024
Halmstad University
2024
Shenzhen University
2024
Shaanxi University of Science and Technology
2024
Northwestern Polytechnical University
2014-2022
Rochester Institute of Technology
2020
Vrije Universiteit Brussel
2019
This paper addresses multi-modal depression analysis. We propose a fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information; the learned features are then fed into a DNN to predict PHQ-8 scores. For fusion, the estimated PHQ-8 scores from the three modalities are integrated in a DNN to obtain the final score. Moreover, in this work, we propose a new...
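The final fusion step described above can be sketched as a score-level combination of per-modality PHQ-8 estimates. The weights below are illustrative placeholders, not the weights the paper's fusion DNN would learn:

```python
# Hypothetical sketch of score-level fusion across modalities.
# The modality weights are illustrative, not learned values from the paper.
def fuse_phq8(scores, weights):
    """Combine per-modality PHQ-8 estimates into one final score.

    scores  : dict, e.g. {"audio": 9.2, "video": 11.0, "text": 8.5}
    weights : dict with the same keys, summing to 1.0
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    fused = sum(scores[m] * weights[m] for m in scores)
    # PHQ-8 totals lie in [0, 24]; clamp the fused estimate to that range.
    return max(0.0, min(24.0, fused))

final_score = fuse_phq8({"audio": 9.0, "video": 12.0, "text": 6.0},
                        {"audio": 0.3, "video": 0.5, "text": 0.2})
```

In the paper the integration is done by a trained DNN rather than fixed weights; the sketch only shows the shape of the score-level fusion interface.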
This paper presents our system design for the Audio-Visual Emotion Challenge ($AV^{+}EC$ 2015). Besides the baseline features, we extract from audio the functionals on low-level descriptors (LLDs) obtained via the YAAFE toolbox, and from video the Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP) features. From physiological signals, we extract 52 electro-cardiogram (ECG) features and 22 electro-dermal activity (EDA) features from various analysis domains. The extracted features, along with the $AV^{+}EC$ 2015 baseline features of audio, ECG or EDA, are...
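"Functionals on LLDs" means collapsing a per-frame descriptor contour into fixed-length statistics. A minimal sketch with four common functionals (the exact functional set used in the paper may differ):

```python
import math

def functionals(lld_frames):
    """Apply statistical functionals (mean, std, min, max) to one
    low-level-descriptor contour, i.e. a list of per-frame values.
    Illustrative subset; challenge systems typically use many more."""
    n = len(lld_frames)
    mean = sum(lld_frames) / n
    var = sum((x - mean) ** 2 for x in lld_frames) / n
    return [mean, math.sqrt(var), min(lld_frames), max(lld_frames)]

feat = functionals([1.0, 2.0, 3.0, 4.0])  # one LLD contour -> 4 statistics
```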
In order to improve the recognition accuracy of the Depression Classification Sub-Challenge (DCC) of AVEC 2016, in this paper we propose a decision tree for depression classification. The tree is constructed according to the distribution of multimodal PHQ-8 score predictions and participants' characteristics (PTSD/depression diagnosis, sleep status, feeling and personality) obtained via analysis of the participants' transcript files. The proposed gender-specific decision tree provides a way of fusing upper-level language information with the results...
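The idea of a gender-specific tree that combines a predicted score with interview-derived cues can be illustrated with a toy decision rule. The thresholds and branch ordering below are invented for illustration, not the published tree:

```python
def classify_depressed(gender, phq8_pred, ptsd_diagnosed, poor_sleep):
    """Toy gender-specific decision rule in the spirit of the paper's tree.
    Thresholds and branches are illustrative, not the published values."""
    # Hypothetical per-gender operating point.
    threshold = 10.0 if gender == "female" else 12.0
    if phq8_pred >= threshold:
        return True
    # Borderline scores fall back on transcript-derived characteristics.
    if phq8_pred >= threshold - 3 and (ptsd_diagnosed or poor_sleep):
        return True
    return False
```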
In this paper, we design a hybrid depression classification and estimation framework from audio, video and text descriptors. It contains three main components: 1) Deep Convolutional Neural Network (DCNN) and Deep Neural Network (DNN) based audio and visual multi-modal recognition frameworks, trained with depressed and not-depressed participants, respectively; 2) Paragraph Vector (PV), Support Vector Machine (SVM) and Random Forest based analysis of the interview transcripts; 3) a multivariate regression model fusing the PHQ-8 estimations of the DCNN-DNN models,...
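For the transcript component, several classifiers vote on the depressed/not-depressed decision. A minimal sketch of such a decision-level combination (the classifiers named in the lead are stand-ins for the PV/SVM/Random-Forest outputs; the paper's actual combination may differ):

```python
def majority_vote(predictions):
    """Combine binary depressed/not-depressed decisions from several
    transcript classifiers (e.g. PV+SVM, PV+RF outputs) by majority vote.
    Illustrative fusion rule, not necessarily the paper's."""
    votes = sum(1 for p in predictions if p)
    return votes * 2 > len(predictions)

decision = majority_vote([True, False, True])  # two of three say depressed
```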
Facial expression recognition (FER) is significant in many application scenarios, such as driving scenarios with very different lighting conditions between day and night. Existing methods primarily focus on eliminating the negative effects of pose and identity information on FER, but overlook the challenges posed by illumination variations. Therefore, this work proposes an efficient illumination-invariant dynamic FER method. To augment robustness to illumination variance, contrast normalisation is introduced to form a...
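Contrast normalisation for illumination robustness is commonly implemented as zero-mean, unit-variance scaling of pixel intensities. A generic sketch (not necessarily the exact operator used in the paper):

```python
import math

def contrast_normalise(pixels, eps=1e-6):
    """Zero-mean, unit-variance normalisation of a grayscale patch, a common
    way to reduce sensitivity to global illumination changes.
    Generic sketch; the paper's normalisation may differ in detail."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    return [(p - mean) / (std + eps) for p in pixels]
```

Because the output is invariant to adding a constant offset or rescaling all intensities, a globally brighter or darker version of the same patch normalises to (almost) the same values.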
In this paper we propose deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) based single-modal and multi-modal affect recognition frameworks. In the DBLSTM with moving average (MA) framework, audio or visual features are input into a DBLSTM-RNN model, whose output estimations of an affective dimension are smoothed by an MA filter. After being expanded to the frame rate of the ground truth labels, another MA filter is adopted for smoothing the final results. In DBLSTM-DBLSTM-MA, initial estimations from the modalities via the first layer...
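The moving-average smoothing applied to the frame-level predictions can be sketched as follows (a simple trailing window for illustration; the paper's window length and centering are not specified here):

```python
def moving_average(seq, window):
    """Smooth frame-level affect predictions with a trailing moving-average
    window. Illustrative smoother; window size is a tunable assumption."""
    out = []
    for i in range(len(seq)):
        lo = max(0, i - window + 1)          # shrink window at the start
        out.append(sum(seq[lo:i + 1]) / (i + 1 - lo))
    return out

smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```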
In this paper, we propose a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and multiple kernel learning (MKL) based multi-modal affect recognition scheme (LSTM-MKL). It takes advantage of the LSTM-RNN to model long-range dependencies between successive observations, and uses the power of MKL to model non-linear correlations between inputs and outputs. For each of the affective dimensions (arousal, valence, expectancy, power), two models are trained, one for each modality. In the testing phase, audio and visual features are input into the corresponding learned LSTM...
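The MKL part rests on combining several base kernels into one. A minimal sketch of such a convex kernel combination (the base kernels, their parameters and the weights here are illustrative, not the learned ones):

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) base kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_kernel(x, y):
    """Linear base kernel."""
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(x, y, beta):
    """MKL-style convex combination of base kernels; kernel choices,
    gamma and the weights beta are illustrative assumptions."""
    return beta[0] * rbf_kernel(x, y, gamma=0.5) + beta[1] * linear_kernel(x, y)
```

In MKL the weights `beta` are learned jointly with the predictor; here they are fixed only to show the combination rule.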
Automated facial expression analysis from image sequences for continuous emotion recognition is a very challenging task due to the loss of three-dimensional information during the image formation process. State-of-the-art methods have relied on estimating dynamic texture features and convolutional neural network features to derive spatio-temporal features. Despite their great success, such features are insensitive to micro muscle deformations and are affected by identity, face pose, illumination variation and self-occlusion. In this work, we argue...
Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli. In this paper, we propose a novel Shallow-Deep Attention-based Network (SDANet) to classify the correct auditory stimulus evoking the EEG signal. It adopts an Attention Correlation Module (ACM) to discover the connection between speech and EEG from a global aspect, and a Shallow-Deep Similarity Classification Module (SDSCM) to decide the classification result via the embeddings learned from the shallow and deep layers. Moreover, various training strategies and data...
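At decision time, this kind of match-mismatch task reduces to picking the candidate speech embedding most similar to the EEG embedding. A minimal sketch using cosine similarity (the paper's learned similarity module is more elaborate; the function names are mine):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pick_attended(eeg_emb, candidate_speech_embs):
    """Return the index of the speech embedding most similar to the EEG
    embedding -- the decision the classification module must make."""
    sims = [cosine_similarity(eeg_emb, s) for s in candidate_speech_embs]
    return max(range(len(sims)), key=sims.__getitem__)
```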
Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection concerns, data in this area is still scarce, presenting a challenge for the discriminative models used in detecting depression. To...
Continuous affective state estimation from facial information is a task which requires the prediction of time series emotional outputs from an image sequence. Modeling the spatial-temporal evolution plays an important role in this estimation. One of the most widely used methods is Recurrent Neural Networks (RNN). RNNs provide an attractive framework for propagating information over a sequence using a continuous-valued hidden layer representation. In this work, we propose to instead learn rich dynamics. We model human affect as a dynamical...
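Treating affect as a dynamical system means the latent state evolves by a transition rule. A minimal linear-dynamical-system step, x_{t+1} = A x_t + B u_t, kept in plain nested lists (a generic sketch, not the paper's learned dynamics):

```python
def lds_step(state, A, B, control):
    """One step of a linear dynamical system x_{t+1} = A x_t + B u_t.
    Matrices are nested lists; dimensions are illustrative."""
    n = len(state)
    return [sum(A[i][j] * state[j] for j in range(n)) +
            sum(B[i][k] * control[k] for k in range(len(control)))
            for i in range(n)]

# Constant-velocity toy dynamics: position integrates velocity,
# velocity driven by the control input.
next_state = lds_step([1.0, 0.0], [[1.0, 1.0], [0.0, 1.0]],
                      [[0.0], [1.0]], [2.0])
```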
Continuous affect estimation from facial expressions has attracted increased attention in the affective computing research community. This paper presents a principled framework for estimating continuous affect from video sequences. Based on recent developments, we address the problem by leveraging the Bayesian filtering paradigm, i.e., considering affect as a latent dynamical system corresponding to a general feeling of pleasure with a degree of arousal, and recursively estimating its state using a sequence of visual observations. To this...
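The recursive state estimation in the Bayesian filtering paradigm can be illustrated with the simplest instance, a scalar Kalman filter with random-walk dynamics (a generic sketch, not the paper's full latent model):

```python
def kalman_1d(mu, var, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    mu, var : current posterior mean and variance of the latent affect state
    z       : new visual observation
    q, r    : process and observation noise variances (assumed known)
    """
    # Predict: random-walk dynamics leave the mean, uncertainty grows.
    var_pred = var + q
    # Update: blend prediction and observation by the Kalman gain.
    k = var_pred / (var_pred + r)
    mu_new = mu + k * (z - mu)
    var_new = (1.0 - k) * var_pred
    return mu_new, var_new
```

Each video frame supplies a new observation `z`, and the posterior `(mu, var)` is carried forward, which is exactly the "recursively estimating its state" step described above.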
Recent advances in generative adversarial networks (GANs) have shown tremendous success in facial expression generation tasks. However, generating vivid and expressive expressions at the Action Unit (AU) level is still challenging, due to the fact that automatic analysis of AU intensity is itself an unsolved and difficult task. In this paper, we propose a novel synthesis-by-analysis approach by leveraging the power of the GAN framework and a state-of-the-art AU detection model to achieve better results in AU-driven...
In this paper, we address the problem of neural architecture search (NAS) in a context where the optimality policy is driven by a black-box Oracle $\mathcal{O}$ with unknown form and derivatives. In this scenario, $\mathcal{O}(A_{C})$ typically provides readings from a set of sensors on how the network $A_{C}$ fares...
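With no form or derivatives available, the oracle can only be queried. A standard black-box baseline for this setting is random search over candidate architectures (a baseline sketch, not the paper's search policy; the candidate encoding is arbitrary):

```python
import random

def random_search(candidates, oracle, trials, seed=0):
    """Query a black-box oracle O(A_C) and keep the best-scoring candidate.
    Baseline sketch for gradient-free NAS; not the paper's method."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = rng.choice(candidates)
        score = oracle(arch)        # the only access we have to O
        if score > best_score:
            best, best_score = arch, score
    return best, best_score
```

Any smarter search policy (evolutionary, Bayesian, reinforcement-learning based) plugs into the same interface: it may only call `oracle(arch)`.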
In this paper, we propose the deep neural network - switching Kalman filter (DNN-SKF) based frameworks for both single-modal and multi-modal continuous affective dimension estimation. The DNN-SKF framework firstly models the complex nonlinear relationship between the input (audio, visual, or lexical) features and the affective dimensions via a non-recurrent DNN, and then models the temporal dynamics embedded in emotions via a segmental linear SKF. Affective estimation experiments are carried out on the Audio Visual Emotion Challenge (AVEC2012)...
In this paper, the flexibility, versatility and predictive power of kernel regression are combined with now lavishly available network data to create models with even greater predictive performances. Building from previous work featuring generalized linear models built in the presence of network cohesion data, we construct a kernelized extension that captures subtler nonlinearities in extremely high dimensional spaces and also produces far better performances. Applications with seamless yet substantial adaptation to simulated and real-life data demonstrate the appeal...
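The kernel-regression backbone of such a model can be sketched as plain kernel ridge regression: fit dual coefficients alpha = (K + lambda I)^{-1} y, predict with weighted kernel evaluations. This sketch omits the network-cohesion term that is the paper's actual contribution:

```python
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def solve(A, b):
    """Gauss-Jordan elimination for a small dense system A a = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def kernel_ridge_fit(xs, ys, lam=0.1):
    """Dual coefficients alpha = (K + lam I)^{-1} y for 1-D inputs.
    Plain kernel ridge; the paper's network-cohesion penalty is omitted."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, ys)

def kernel_ridge_predict(xs, alpha, x):
    return sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))
```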
The Dynamic Saliency Prediction (DSP) task simulates the human selective attention mechanism to perceive the dynamic scene, which is significant and imperative in many vision tasks. Most existing methods only consider visual cues, while neglecting the accompanying audio information, which can provide complementary information for scene understanding. In fact, there exists a strong relation between auditory and visual cues, and humans generally perceive the surrounding scene by collaboratively sensing these cues. Motivated by this, an audio-visual...
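The complementarity of the two cues can be illustrated with the simplest possible audio-visual combination, a pixel-wise convex fusion of two saliency maps (a late-fusion sketch; the paper's model fuses the modalities inside the network, and the weight alpha is an assumption):

```python
def fuse_saliency(visual_map, audio_map, alpha=0.7):
    """Pixel-wise convex combination of visual and audio saliency maps,
    renormalised so the peak is 1.0. Illustrative late fusion only."""
    fused = [[alpha * v + (1.0 - alpha) * a
              for v, a in zip(vrow, arow)]
             for vrow, arow in zip(visual_map, audio_map)]
    peak = max(max(row) for row in fused) or 1.0  # avoid division by zero
    return [[x / peak for x in row] for row in fused]
```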