- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Advanced Graph Neural Networks
- Video Surveillance and Tracking Methods
- Recommender Systems and Techniques
- Multimodal Machine Learning Applications
- Gait Recognition and Analysis
- Face and Expression Recognition
- Topic Modeling
- Context-Aware Activity Recognition Systems
- Advanced Neural Network Applications
- Sentiment Analysis and Opinion Mining
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Domain Adaptation and Few-Shot Learning
- Image and Video Stabilization
- Plasmonic and Surface Plasmon Research
- Thermal Radiation and Cooling Technologies
- Computer Graphics and Visualization Techniques
- 3D Shape Modeling and Analysis
- Semantic Web and Ontologies
- Expert Finding and Q&A Systems
- Text and Document Classification Technologies
- Fire Detection and Safety Systems
Beijing University of Technology
2021-2024
Beijing University of Posts and Telecommunications
2020-2023
Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective gap between pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that...
Weakly supervised group activity recognition addresses the dependence on individual-level annotations when understanding scenes involving multiple individuals, which is a challenging task. Existing methods either use trained detectors to extract individual features or utilize attention mechanisms for partial context encoding, followed by integration to form the final group-level representations. However, the detectors require a training phase and have a mis-detection issue, and the contexts extracted immediately from...
Image Emotion Classification (IEC) aims to extract the abstract emotions evoked in images. Language-supervised methods such as CLIP have recently shown superior power in image understanding. However, the underexplored IEC task presents three significant challenges: a tremendous training objective gap between pre-training and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this paper, we propose a general framework that shows how CLIP can be effectively exploited on the IEC task. We first introduce...
Image emotion classification (IEC) is an important computer vision task that extracts emotions from images. Methods for IEC are primarily based on a label or a distribution as the supervision signal, which has neither enough accessibility nor enough diversity, limiting the development of IEC research. Inspired by psychology research and the recent boom in large-scale pretrained language models, we devise a language-supervised paradigm, which can cleverly combine the features of language and vision to drive the model to gain stronger...
Previous works build interest learning by mining interactions deeply. However, the interactions are incomplete and insufficient to support interest modeling, and can even bring severe bias into recommendations. To address interaction sparsity and the consequent challenges, we propose a graph contrastive complementary embedding (GCCE), which introduces negative interests to assist the positive interests for modeling. To embed interest, we design a perturbed convolution that prevents the distribution from bias. Since samples are not...
Group activity recognition aims to recognize behaviors characterized by multiple individuals within a scene. Existing schemes rely on individual relation inference and usually take the individuals as tokens. Essentially, they select the most relevant region of the group from the entire image while filtering out irrelevant background noise. However, these schemes require bounding box labeling in both the training and testing stages. Since the individuals are presented at one scale, multi-scale information cannot be combined in an effective way. In this...
Motion information has been widely exploited for group activity recognition in sports video. However, in order to model and extract the various motion information between adjacent frames, existing algorithms only use coarse video-level labels as supervision cues. This may lead to ambiguity in the extracted features and omission of the changing rules of motion patterns, which are also important for sports video recognition. In this paper, a latent label mining strategy for group activity recognition in basketball videos is proposed. The authors' novel strategy allows them to obtain...
Group activity recognition, which infers the activity of a group of people, is a challenging task and has received a great deal of interest in recent years. Different from individual action recognition, group activity recognition needs to model not only the visual cues of individuals but also the relationships between them. The existing approaches inferred relations based on the holistic features of each individual. However, parts of the human body, such as the head, hands, and legs, and their relationships, are critical for most activities. In this paper, we establish part-based...
In this work, we study the problem of separating the global camera motion and the local dynamic motion from an optical flow. Previous methods either estimate global motions by a parametric model, such as a homography, or estimate both of them by an optical flow field. However, none of these methods can directly estimate them in an end-to-end manner. In addition, separating the two motions accurately from a hybrid flow field is challenging, because one motion can easily be confused with the other when they are compounded together. To this end, we propose an estimation network, GLM-Net. We design encoder-decoder structures for separation...
Group activity recognition is a challenging task that involves multiple moving actors within a cluttered scene. Existing methods often rely on an object detector to avoid individual bounding box labeling during testing, but are prone to false detections due to factors such as occlusion and background clutter. In addition, the attention map of the existing detector-free method based on the Transformer is too sparse, resulting in the loss of some important foreground information. In this paper, we introduce...
Group activity recognition is a subject with broad applications, and its main challenge is to model the interactions between individuals. Existing algorithms are mostly based merely on the holistic features of persons, which completely ignores local details that could be significant for recognition. In this paper, we propose a novel part interaction learning algorithm for group activity recognition. Our proposed algorithm introduces both physical structural information and fine-grained contextual information into the representations, through exploring intra- and...
The L<sub>1</sub> loss function and Intersection over Union (IoU) are commonly used in object detection. However, minimizing the L<sub>1</sub> loss through the training process does not necessarily amount to maximizing IoUs. The L<sub>1</sub> loss simply assigns equal weights to the differences in width, height, and center point between a prediction box and the ground truth, but pays less attention to the contribution of each shape property. Observing this, we propose a scaling approach, which can be easily embedded in convolutional neural networks, for mitigating the gap between the L<sub>1</sub> loss and the IoU function. The...
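The mismatch between the L<sub>1</sub> loss and IoU described in this abstract can be illustrated with a small numeric sketch. This is not the authors' scaling method. It only shows, under assumed toy boxes in a hypothetical `(cx, cy, w, h)` format, that two predictions with the same L<sub>1</sub> distance to the ground truth can have different IoUs. The helper names `l1_distance`, `to_corners`, and `iou` are illustrative.

```python
from typing import Tuple

# Assumed box format for this sketch: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def l1_distance(pred: Box, target: Box) -> float:
    """Unweighted L1 distance over the four box parameters."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Toy ground truth and two predictions, each off by 2 in a single parameter.
target: Box = (10.0, 10.0, 4.0, 4.0)
pred_shift: Box = (12.0, 10.0, 4.0, 4.0)  # center shifted by 2 along x
pred_wide: Box = (10.0, 10.0, 6.0, 4.0)   # width enlarged by 2

for name, pred in [("shifted center", pred_shift), ("wider box", pred_wide)]:
    print(f"{name}: L1 = {l1_distance(pred, target):.2f}, "
          f"IoU = {iou(pred, target):.3f}")
```

Both predictions have an L<sub>1</sub> distance of 2 to the ground truth, yet the shifted box reaches an IoU of 1/3 while the wider box reaches 2/3, so an error in the center costs more overlap than an equal error in the width in this toy case.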
Recommender systems suffer severely from both interaction bias and interaction sparsity. Conventional unified embedding learning policies fail to consider the imbalance issue and produce suboptimal representations of users and items for recommendation. To this end, this work is dedicated to bias-aware embedding learning in a decomposed manner and proposes Counterfactual Embedding Learning (CEL) for debiased recommendation. Instead of debiasing by sampling uniform interactions, we follow and capitalize on the natural distribution to model counterfactual...