- Emotion and Mood Recognition
- Face and Expression Recognition
- Human Pose and Action Recognition
- Face Recognition and Analysis
- Speech and Audio Processing
- Sentiment Analysis and Opinion Mining
- Video Surveillance and Tracking Methods
- Mental Health via Writing
- Sparse and Compressive Sensing Techniques
- Anomaly Detection Techniques and Applications
- Advanced SAR Imaging Techniques
- Machine Learning and ELM
- Robotics and Automated Systems
- Social Robot Interaction and HRI
- Music and Audio Processing
- Radar Systems and Signal Processing
- Advanced Image and Video Retrieval Techniques
- Identification and Quantification in Food
- Time Series Analysis and Forecasting
- Generative Adversarial Networks and Image Synthesis
- Blind Source Separation Techniques
- Gait Recognition and Analysis
- Autonomous Vehicle Technology and Safety
- Image Retrieval and Classification Techniques
- Complex Systems and Time Series Analysis
Vrije Universiteit Brussel
2012-2022
Northwestern Polytechnical University
2015
This paper addresses multi-modal depression analysis. We propose a fusion framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. Our framework considers audio, video and text streams. For each modality, handcrafted feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information; the learned features are then fed into a DNN to predict PHQ-8 scores. For fusion, the estimated scores from the three modalities are integrated in a DNN to obtain the final score. Moreover, in this work, we propose a new...
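The late-fusion step described above can be sketched as follows. This is a minimal stand-in: a linear least-squares combiner replaces the paper's DNN fusion stage, and the function name and dev-set setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fuse_phq8(dev_scores, dev_labels, test_scores):
    """Late fusion of per-modality PHQ-8 estimates.

    dev_scores:  (n, 3) audio/video/text predictions on a dev set
    dev_labels:  (n,)   ground-truth PHQ-8 totals on the dev set
    test_scores: (m, 3) predictions to fuse

    A linear least-squares stand-in for the paper's DNN fusion stage.
    """
    X = np.hstack([dev_scores, np.ones((len(dev_scores), 1))])   # add bias column
    w, *_ = np.linalg.lstsq(X, dev_labels, rcond=None)           # fit fusion weights
    Xt = np.hstack([test_scores, np.ones((len(test_scores), 1))])
    return np.clip(Xt @ w, 0.0, 24.0)                            # PHQ-8 totals lie in [0, 24]
```

Learning the fusion weights on held-out data, rather than averaging, lets a more reliable modality dominate the final score.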
In order to improve the recognition accuracy on the Depression Classification Sub-Challenge (DCC) of AVEC 2016, in this paper we propose a decision tree for depression classification. The tree is constructed according to the distribution of the multimodal predictions of PHQ-8 scores and the participants' characteristics (PTSD/depression diagnostic, sleep status, feeling and personality) obtained via analysis of the transcript files of the participants. The proposed gender-specific decision tree provides a way of fusing upper-level language information with the results...
In this paper, we design a hybrid depression classification and estimation framework from audio, video and text descriptors. It contains three main components: 1) deep convolutional neural network (DCNN) and deep neural network (DNN) based audio and visual multi-modal recognition frameworks, trained with depressed and not-depressed participants, respectively; 2) Paragraph Vector (PV), Support Vector Machine (SVM) and Random Forest models applied to the interview transcripts; 3) a multivariate regression model fusing the PHQ-8 estimations of the DCNN-DNN models,...
In this paper we propose a novel framework to process Doppler-radar signals for hand gesture recognition. Doppler-radar sensors provide many advantages over other emerging sensing modalities, including low development costs and a high sensitivity to capture subtle gestures with precision. Furthermore, they have attractive properties for ubiquitous deployment and can be conveniently embedded into different devices. In this scope, current recognition methods still rely on deep CNN-LSTM or 3D structures that require sufficient...
This paper targets the Bipolar Disorder Challenge (BDC) task of the Audio/Visual Emotion Challenge (AVEC) 2018. Firstly, two novel features are proposed: 1) a histogram-based arousal feature, in which the continuous arousal values are estimated from audio cues by a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) model; 2) a Histogram of Displacement Range (HDR) based upper body posture feature, which characterizes the displacement and velocity of key points in a video segment. In addition, we propose a multi-stream bipolar disorder classification framework...
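The histogram-based arousal feature can be sketched as a normalized histogram of the frame-level arousal track over a recording. The bin count and the [-1, 1] value range below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def arousal_histogram(arousal, n_bins=10, lo=-1.0, hi=1.0):
    """Histogram-based arousal feature: the distribution of frame-level
    continuous arousal values over a recording, normalized to sum to 1.
    n_bins and the [lo, hi] range are illustrative choices."""
    hist, _ = np.histogram(np.clip(arousal, lo, hi), bins=n_bins, range=(lo, hi))
    return hist / max(hist.sum(), 1)
```

Collapsing the per-frame regression output into a distribution yields a fixed-length descriptor regardless of recording length, which is what a downstream classifier needs.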
Radar is expected to go beyond the traditional functionality of range and speed estimation towards target classification. The complementary use of radar and video is becoming increasingly popular for applications such as autonomous cars, smart home automation, etc. Target classification based on radar depends on characteristic motion patterns and nonrigidities. The Micro-Doppler (MD) signal captures such motions and has been used to extract reliable distinguishing features for various classes of targets. Popular MD analysis techniques include the Cadence...
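The micro-Doppler analysis pipeline alluded to above can be sketched numerically: an STFT of the radar return gives the micro-Doppler spectrogram, and a second FFT along the slow-time axis of each Doppler bin yields a Cadence Velocity Diagram, exposing the repetition rate of the micro-motions. Window and hop sizes below are illustrative, not from the paper:

```python
import numpy as np

def cadence_velocity_diagram(x, win=64, hop=16):
    """Sketch of micro-Doppler analysis: STFT magnitude -> spectrogram,
    then an FFT over slow time per Doppler bin -> cadence velocity diagram.
    Window/hop lengths are illustrative choices."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    spec = np.abs(np.fft.fft(np.asarray(frames), axis=1))  # (slow time, Doppler)
    cvd = np.abs(np.fft.fft(spec, axis=0))                 # (cadence, Doppler)
    return spec, cvd
```

Periodic limb or gait motion shows up as peaks along the cadence axis, which is why CVD-style features separate target classes with different motion rhythms.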
Understanding human-contextual interaction to predict human trajectories is a challenging problem. Most previous trajectory prediction approaches focused on modeling the human-human interactions located in the near neighborhood, and neglected the influence of individuals which are farther away in the scene as well as the scene layout. To alleviate these limitations, in this article we propose a model to address pedestrian trajectory prediction using a latent variable aware of such interaction. Our proposal relies on the contextual information that influences pedestrians to encode... We...
Automated facial expression analysis from image sequences for continuous emotion recognition is a very challenging task due to the loss of three-dimensional information during the image formation process. State-of-the-art methods relied on estimating dynamic texture features and convolutional neural network features to derive spatio-temporal features. Despite their great success, such features are insensitive to micro muscle deformations and are affected by identity, face pose, illumination variation and self-occlusion. In this work, we argue...
In this paper, we present a novel strategy to combine a set of compact descriptors to leverage an associated recognition task. We formulate the problem from a multiple kernel learning (MKL) perspective and solve it following a stochastic variance reduced gradient (SVRG) approach to address its scalability, a currently open issue. MKL models are ideal candidates to jointly learn the optimal combination of features along with its predictor. However, they are unable to scale beyond a dozen thousand samples due to the high computational...
Our study aims to investigate the interdependence between international stock markets and sentiments from financial news in stock forecasting. We adopt Temporal Fusion Transformers (TFT) to incorporate intra- and inter-market correlations and the interaction between the information flow, i.e. causality, of the sentiment dynamics and the market. The current study distinguishes itself from existing research by adopting Dynamic Transfer Entropy (DTE) to establish an accurate information flow propagation of sentiments. DTE has the advantage of providing time series that mine...
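The transfer-entropy idea underlying DTE can be sketched with a plain-histogram estimator for discrete symbol sequences. This is a minimal stand-in only: the paper's Dynamic Transfer Entropy adds dynamic windowing, whereas this sketch fixes the lag to 1 and assumes discretized inputs:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Transfer entropy TE(X -> Y) in bits for discrete sequences:
    how much knowing x_t reduces uncertainty about y_{t+1} beyond
    what y_t already tells us. Lag fixed to 1; no dynamic windowing."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    singles = Counter(y[:-1])
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_y1_given_yx = c / pairs_yx[(y0, x0)]
        p_y1_given_y = pairs_yy[(y1, y0)] / singles[y0]
        te += p_joint * np.log2(p_y1_given_yx / p_y1_given_y)
    return te
```

Unlike correlation, this quantity is directional: TE(X→Y) and TE(Y→X) generally differ, which is what makes it usable as a causality proxy between sentiment and market series.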
Estimating a person's affective state from facial information is an essential capability for social interaction. Automatizing such a capability has therefore increasingly driven multidisciplinary research over the past decades. At the heart of this issue are very challenging signal processing and artificial intelligence problems posed by the inherent complexity of human affect. We propose a principled framework for designing automated systems capable of continuously estimating affect from an incoming stream of images. First, we model affect as a...
Continuous affective state estimation from facial information is a task which requires the prediction of a time series of emotional outputs from an image sequence. Modeling the spatial-temporal evolution plays an important role in such estimation. One of the most widely used methods is Recurrent Neural Networks (RNN). RNNs provide an attractive framework for propagating information over the sequence using a continuous-valued hidden layer representation. In this work, we propose instead to learn rich dynamics. We model human affect as a dynamical...
Continuous affect estimation from facial expressions has attracted increased attention in the affective computing research community. This paper presents a principled framework for estimating continuous affect from video sequences. Based on recent developments, we address the problem by leveraging the Bayesian filtering paradigm, i.e., considering affect as a latent dynamical system corresponding to a general feeling of pleasure with a certain degree of arousal, and recursively estimating its state using a sequence of visual observations. To this...
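The Bayesian-filtering view above can be sketched with a linear Kalman filter over a 2-D latent (valence, arousal) state observed through noisy per-frame estimates. This is a minimal sketch, not the paper's model; the random-walk dynamics and the noise variances q and r are assumptions:

```python
import numpy as np

def kalman_affect(obs, q=0.01, r=0.1):
    """Recursive affect estimation sketch: a 2-D latent state
    (valence, arousal) follows a random walk and is observed through
    noisy per-frame visual estimates. q, r are assumed process and
    observation noise variances, not values from the paper."""
    x = np.zeros(2)                    # state estimate
    P = np.eye(2)                      # state covariance
    Q, R = q * np.eye(2), r * np.eye(2)
    out = []
    for z in obs:
        P = P + Q                                  # predict (random-walk dynamics)
        K = P @ np.linalg.inv(P + R)               # Kalman gain
        x = x + K @ (np.asarray(z) - x)            # update with the observation
        P = (np.eye(2) - K) @ P
        out.append(x.copy())
    return np.array(out)
```

The filter trades off frame-level evidence against temporal smoothness, which is exactly what makes the recursive formulation attractive over per-frame regression.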
This work proposes to solve the problem of few-shot biometric authentication by computing the Mahalanobis distance between testing embeddings and a multivariate Gaussian distribution of training embeddings obtained using pre-trained CNNs. Experimental results show that models pre-trained on the ImageNet dataset significantly outperform models pre-trained on human faces. With a VGG16 model, we obtain an FRR of 1.25% for an FAR of 1.18% on 20 cattle identities.
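The distance computation above can be sketched directly: fit a Gaussian to an identity's few training embeddings, then score test embeddings by Mahalanobis distance. The regularization term eps and the acceptance-threshold logic are illustrative assumptions; embeddings would come from a pre-trained CNN such as VGG16, with random vectors standing in here:

```python
import numpy as np

def fit_gaussian(train_emb, eps=1e-6):
    """Fit a multivariate Gaussian to one identity's training embeddings.
    eps regularizes the covariance so it stays invertible with few shots."""
    mu = train_emb.mean(axis=0)
    cov = np.cov(train_emb, rowvar=False) + eps * np.eye(train_emb.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of a test embedding to the fitted Gaussian."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Authentication sketch: accept a claim if the distance falls below a
# threshold tuned for the desired FAR/FRR trade-off on a validation set.
```

Because the distance is normalized by the per-identity covariance, it accounts for how tightly that identity's embeddings cluster, which plain Euclidean distance does not.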
The omnipresence of deep learning architectures such as convolutional neural networks (CNNs) is fueled by the synergistic combination of ever-increasing labeled datasets and specialized hardware. Despite their indisputable success, the reliance on huge amounts of labeled data and hardware can be a limiting factor when approaching new applications. To help alleviate these limitations, we propose an efficient strategy for layer-wise unsupervised training of CNNs on conventional hardware in acceptable time. Our proposed strategy consists...
Understanding social signals is a very important aspect of human communication and interaction and has therefore attracted increased attention from various research areas. Among the different types of social signals, particular attention has been paid to the facial expression of emotions and its automated analysis from image sequences. Automated analysis is a challenging task due to the complex three-dimensional deformation and motion of the face associated with expressions, and the loss of 3D information during the image formation process. As a consequence, retrieving spatio-temporal information from image sequences...
We present a framework for combination-aware AU intensity recognition. It includes a feature extraction approach that can handle small head movements and does not require face alignment. A three-layered structure is used for the classification. The first layer is dedicated to independent AU recognition, and the second layer incorporates combination knowledge. At the third layer, temporal dynamics are handled based on a variable duration semi-Markov model. The first two layers are modeled using extreme learning machines (ELMs). ELMs have equal...
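The ELM layers mentioned above admit a compact sketch: input weights are drawn at random, and only the output weights are solved in closed form by least squares, which is what makes ELM training fast. The hidden size, the sigmoid activation, and the regression setup below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Extreme Learning Machine sketch: random input weights, sigmoid
    hidden layer, output weights solved in one least-squares step."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))    # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # hidden activations
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Since training reduces to a single linear solve, retraining per AU or per layer is cheap, which suits a stacked structure like the one described.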
Recent advances in generative adversarial networks (GANs) have shown tremendous success in facial expression generation tasks. However, generating vivid and expressive expressions at the Action Unit (AU) level is still challenging, due to the fact that automatic analysis of AU intensity is itself an unsolved and difficult task. In this paper, we propose a novel synthesis-by-analysis approach by leveraging the power of a GAN framework and a state-of-the-art AU detection model to achieve better results in AU-driven...
In this paper, we address the problem of neural architecture search (NAS) in a context where the optimality policy is driven by a black-box Oracle O with unknown form and derivatives. In this scenario, O(A_C) typically provides readings from a set of sensors on how the network A_C fares...