- Emotion and Mood Recognition
- Face and Expression Recognition
- Face Recognition and Analysis
- Video Analysis and Summarization
- Music and Audio Processing
- Color Perception and Design
- Image Retrieval and Classification Techniques
- EEG and Brain-Computer Interfaces
- Gaze Tracking and Assistive Technology
- Generative Adversarial Networks and Image Synthesis
- Biometric Identification and Security
- Human Pose and Action Recognition
- Text and Document Classification Technologies
- Speech and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Music Technology and Sound Studies
- Anomaly Detection Techniques and Applications
- Sentiment Analysis and Opinion Mining
- Neuroscience and Music Perception
- Hand Gesture Recognition Systems
- Visual Attention and Saliency Detection
- Spam and Phishing Detection
- Image and Video Quality Assessment
- Speech and Dialogue Systems
- Video Surveillance and Tracking Methods
University of Science and Technology of China
2016-2025
National Taiwan University of Science and Technology
2022-2023
National Science Center
2022
Institute of Art
2022
China National Heavy Duty Truck Group (China)
2022
Northeastern University
2022
Wuhu Hit Robot Technology Research Institute
2021
Dalian University of Technology
2018-2019
Beijing Normal University
2011
Kyushu University
2005-2007
To date, most facial expression analysis has been based on visible and posed expression databases. Visible images, however, are easily affected by illumination variations, and posed expressions differ in appearance and timing from natural ones. In this paper, we propose to establish a natural visible and infrared facial expression database, which contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and a thermal camera, with illumination provided from three different directions. The database includes the apex expressional images with and without glasses....
The tracking and recognition of facial activities from images or videos have attracted great attention in the computer vision field. Facial activities are characterized by three levels. First, at the bottom level, feature points around each facial component, i.e., eyebrow, mouth, etc., capture detailed face shape information. Second, at the middle level, facial action units, defined in the Facial Action Coding System, represent the contraction of a specific set of facial muscles, e.g., lid tightener, eyebrow raiser, etc. Finally, at the top level, six prototypical expressions represent global facial muscle...
Video affective content analysis has been an active research area in recent decades, since emotion is an important component in the classification and retrieval of videos. Video affective content analysis can be divided into two approaches: direct and implicit. Direct approaches infer the affective content of videos directly from related audiovisual features. Implicit approaches, on the other hand, detect affective content based on automatic analysis of a user's spontaneous response while consuming the videos. This paper first proposes a general framework for video affective content analysis, which includes video content, emotional...
Spatial-temporal relations among facial muscles carry crucial information about facial expressions, yet they have not been thoroughly exploited. One contributing factor is the limited ability of current dynamic models to capture complex spatial and temporal relations. Existing dynamic models can only capture simple local temporal relations among sequential events, or lack the ability to incorporate uncertainties. To overcome these limitations and take full advantage of the spatio-temporal information, we propose to model a facial expression as a complex activity that...
In this paper, we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Toward this goal, we build a hierarchical model that combines bottom-level image features and top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is...
Previous studies on facial expression analysis have focused on recognizing basic expression categories. There is only a limited amount of work on continuous expression intensity estimation, which is important for detecting and tracking emotion change. Part of the reason is the lack of labeled data with annotated intensity, since intensity annotation requires expertise and is time consuming. In this work, we treat intensity estimation as a regression problem. By taking advantage of the natural onset-apex-offset evolution pattern of facial expression, the proposed method can handle different...
In multi-label learning, each sample can be assigned to multiple class labels simultaneously. In this work, we focus on the problem of multi-label learning with missing labels (MLML), where instead of assuming that a complete label assignment is provided for each sample, only partial labels are assigned with values, while the rest are missing or not provided. The positive (presence), negative (absence), and missing labels are explicitly distinguished in MLML. We formulate MLML as a transductive learning problem, where the goal is to recover the full label assignment by enforcing consistency with the available label assignments and smoothness...
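The transductive idea above — recover missing entries of a label matrix by enforcing consistency with observed labels and smoothness over a sample-similarity graph — can be sketched with a simple label-propagation iteration. This is a minimal illustrative toy, not the paper's exact formulation; the RBF similarity, `alpha`, and iteration count are assumptions.

```python
import numpy as np

# Toy transductive completion of a partially observed label matrix.
# Entries of Y: +1 = positive, -1 = negative, 0 = missing (the MLML setting).

def complete_labels(X, Y, alpha=0.5, iters=50):
    # Row-normalized similarity graph over samples (RBF kernel, no self-loops).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2)
    np.fill_diagonal(W, 0.0)
    S = W / W.sum(axis=1, keepdims=True)
    F = Y.astype(float).copy()
    for _ in range(iters):
        # Smoothness over the graph + consistency with observed labels.
        F = alpha * S @ F + (1 - alpha) * Y
    return np.sign(F)

X = np.array([[0.0], [0.1], [5.0], [5.1]])   # two tight clusters
Y = np.array([[1, -1],
              [0,  0],    # sample 1: both labels missing
              [-1, 1],
              [0,  0]])   # sample 3: both labels missing
F = complete_labels(X, Y)
```

Each missing row inherits the signs of its near neighbor, so samples 1 and 3 recover `[1, -1]` and `[-1, 1]` respectively.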
Existing facial expression recognition methods focus on either pose variations or identity bias, but not both simultaneously. This paper proposes an adversarial feature learning method to address both of these issues. Specifically, the proposed method consists of five components: an encoder, a classifier, a pose discriminator, a subject discriminator, and a generator. The encoder extracts feature representations, and the classifier tries to perform expression recognition using the extracted representations. The encoder and classifier are trained collaboratively, so that the representations are discriminative...
In this paper, we propose a novel approach for occluded facial expression recognition with the help of non-occluded facial images. The non-occluded images are used as privileged information, which is required only during training, but not during testing. Specifically, two deep neural networks are first trained on the non-occluded and occluded images, respectively. Then the non-occluded network is fixed and used to guide the fine-tuning of the occluded network in both the label space and the feature space. A similarity constraint loss and an inequality regularization are imposed to make the output of the occluded network converge to that of the non-occluded network. Adversarial learning is adopted...
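The core of such privileged-information training is a combined objective: the student (occluded-image) network minimizes its own task loss plus a term pulling its features toward the fixed teacher (non-occluded) network. The sketch below shows one plausible form of that combined loss; the L2 similarity term, shapes, and `lam` weight are illustrative assumptions, not the paper's exact losses.

```python
import numpy as np

# Minimal sketch of LUPI-style training: cross-entropy on the student's
# predictions plus a feature-space similarity constraint to the teacher.

def student_loss(student_feat, teacher_feat, logits, labels, lam=0.1):
    # Softmax cross-entropy on the student's own predictions.
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    # L2 similarity constraint in feature space (teacher is fixed).
    sim = ((student_feat - teacher_feat) ** 2).mean()
    return ce + lam * sim

# When student and teacher features already match, only the CE term remains;
# uniform logits over two classes give CE = ln 2.
feat = np.zeros((2, 3))
loss_matched = student_loss(feat, feat, np.zeros((2, 2)), np.array([0, 1]))
```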
We propose a multi-modal method with a hierarchical recurrent neural structure to integrate vision, audio, and text features for depression detection. The method contains two hierarchies of bidirectional long short-term memories to fuse the modalities and predict the severity of depression. An adaptive sample weighting mechanism is introduced to adapt to the diversity of training samples. Experiments on the testing set of a depression detection challenge demonstrate the effectiveness of the proposed method.
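One simple way to realize an adaptive sample weighting mechanism is inverse-frequency weighting: rarer severity levels receive proportionally larger training weights so the model is not dominated by the most common scores. The scheme below is an illustrative sketch, not the paper's specific mechanism.

```python
from collections import Counter

# Inverse-frequency sample weights: each severity level contributes
# equally in aggregate, and the weights sum to the number of samples.

def sample_weights(severities):
    counts = Counter(severities)
    n = len(severities)
    return [n / (len(counts) * counts[s]) for s in severities]

# Three samples at severity 0 and one at severity 1: the rare sample
# gets three times the weight of each common one.
w = sample_weights([0, 0, 0, 1])
```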
The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., images, music, and videos), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable many applications. In this article, we comprehensively survey the state-of-the-art AC technologies for large-scale heterogeneous multimedia data. We begin by introducing typical emotion representation models from psychology...
As one of the most important forms of psychological behaviors, micro-expression can reveal genuine emotion. However, the existing labeled samples are too limited to train a high-performance classifier. Since micro-expression and macro-expression share some similarities in facial muscle movements and texture changes, in this paper we propose a micro-expression recognition framework that leverages macro-expression samples as guidance. Specifically, we first introduce two Expression-Identity Disentangle Networks, named MicroNet and MacroNet, as feature extractors to disentangle...
Creating a large and natural facial expression database is a prerequisite for facial expression analysis and classification. It is, however, not only time consuming but also difficult to capture an adequately large number of spontaneous expression images and their meanings, because no standard, uniform, and exact measurements are available for database collection and annotation. Thus, comprehensive first-hand data analyses may provide insight for future research on database construction, expression recognition, and emotion inference. This paper presents our multimodal natural visible and infrared...
Current works on facial action unit (AU) recognition typically require fully AU-annotated facial images for supervised AU classifier training. AU annotation is a time-consuming, expensive, and error-prone process. While AUs are hard to annotate, facial expression is relatively easy to label. Furthermore, there exist strong probabilistic dependencies between expressions and AUs, as well as among AUs. Such dependencies are referred to as domain knowledge. In this paper, we propose a novel AU recognition method that learns AU classifiers from domain knowledge...
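The expression-to-AU dependencies described above can be made concrete as a prior table: given an easy-to-obtain expression label, AU pseudo-labels are derived from the probability that each AU is present under that expression. The table entries and AU choices below are illustrative placeholders, not the paper's learned dependencies.

```python
# P(AU present | expression) for a few AUs (FACS numbering).
# Values are illustrative, loosely following common FACS descriptions
# (e.g., happiness involves AU6 cheek raiser + AU12 lip corner puller).
PRIOR = {
    "happiness": {"AU6": 0.9, "AU12": 0.95, "AU4": 0.05},
    "sadness":   {"AU6": 0.1, "AU12": 0.05, "AU4": 0.8},
}

def pseudo_au_labels(expression, threshold=0.5):
    """Binarize the expression-conditional prior into weak AU labels."""
    return {au: int(p >= threshold) for au, p in PRIOR[expression].items()}
```

These weak labels could then supervise an AU classifier without frame-level AU annotation.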
Visible facial images provide the geometric and appearance patterns of expressions but are sensitive to illumination changes. Thermal images record facial temperature distributions and are robust to lighting conditions. Therefore, expression recognition can be enhanced by visible and thermal image fusion. In most cases, however, only visible images are available, due to the widespread popularity of visible-light cameras and the high cost of thermal cameras. Thus, we propose a novel expression recognition method using thermal infrared (IR) data as privileged information, which is available only during training. Specifically, we first learn a deep model for...
The inherent connections among aesthetic attributes and overall aesthetics are crucial for image aesthetics assessment, but they have not been thoroughly explored yet. In this paper, we propose a novel aesthetics assessment method assisted by attributes at both the representation level and the label level. The attributes are used as privileged information, which is required only during training. Specifically, we first build a multitask deep convolutional rating network to learn the attributes and the aesthetics score simultaneously, and to construct better feature representations through multi-task learning. After that,...
Facial action unit (AU) recognition has been formulated as a supervised learning problem in recent works. However, the complex labeling process makes it challenging to provide AU annotations for large amounts of facial images. To remedy this, we utilize the rules defined in the Facial Action Coding System (FACS) to design a novel knowledge-driven self-supervised representation learning framework for AU recognition. The encoder is trained using facial images without AU annotations. Rules summarized from FACS define facial partition manners and determine...
Facial expression recognition from thermal infrared images has attracted more and more attention in recent years. However, the features adopted in current work are either temperature statistical parameters extracted from facial regions of interest or several hand-crafted features commonly used in the visible spectrum. Till now, there is no feature specially defined for thermal infrared images. In this paper, we are the first to propose using a Deep Boltzmann Machine to learn features from long-wavelength thermal infrared images. First, the face is located and normalized in the thermal images. Then, a model composed of...
In this article, we propose a novel approach to recognize emotions with the help of privileged information, which is available only during training, but not during testing. Such additional information can be exploited during training to construct a better classifier. Specifically, we recognize the audience's emotion from EEG signals with the aid of the stimulus videos, and tag the videos' emotions with the aid of electroencephalogram (EEG) signals. First, frequency features are extracted from the EEG signals, and audio/visual features are extracted from the video stimulus. Second, features are selected by statistical tests. Third, a new...
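The "features selected by statistical tests" step can be sketched as a univariate filter: keep only features whose class-conditional means differ significantly, e.g., by a two-sample t statistic against a fixed threshold. This is a minimal illustrative version; the threshold and the unpooled-variance t statistic are assumptions, not necessarily the tests used in the paper.

```python
import statistics

# Keep feature dimensions whose means differ between two emotion classes.

def t_stat(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

def select_features(class_a, class_b, thresh=2.0):
    # class_a / class_b: lists of feature vectors, one per sample.
    keep = []
    for j in range(len(class_a[0])):
        a = [x[j] for x in class_a]
        b = [x[j] for x in class_b]
        if abs(t_stat(a, b)) >= thresh:
            keep.append(j)
    return keep

# Feature 0 clearly separates the classes; feature 1 is noise.
happy = [[1.0, 0.1], [1.1, -0.1], [0.9, 0.0]]
sad = [[5.0, 0.05], [5.1, -0.05], [4.9, 0.0]]
kept = select_features(happy, sad)
```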