Xinji Mai

ORCID: 0009-0003-4596-5391
Research Areas
  • Emotion and Mood Recognition
  • Face and Expression Recognition
  • Image Processing Techniques and Applications
  • Advanced Image Processing Techniques
  • Face Recognition and Analysis
  • Advanced Image Fusion Techniques
  • Network Security and Intrusion Detection
  • Cognitive Abilities and Testing
  • Brain Tumor Detection and Classification
  • Optical Measurement and Interference Techniques
  • Anomaly Detection Techniques and Applications
  • EEG and Brain-Computer Interfaces
  • Speech and Audio Processing
  • Blind Source Separation Techniques
  • Integrated Circuits and Semiconductor Failure Analysis
  • Advanced Vision and Imaging
  • Multi-Agent Systems and Negotiation
  • Neural Networks and Applications
  • Action Observation and Synchronization
  • Speech and Dialogue Systems
  • Semantic Web and Ontologies
  • Advanced Malware Detection Techniques
  • Image and Signal Denoising Methods

Fudan University
2023-2024

Anomaly detection is critical in industrial manufacturing for ensuring product quality and improving the efficiency of automated processes. The scarcity of anomalous samples limits traditional methods, making anomaly generation essential for expanding the data repository. However, recent generative models often produce unrealistic anomalies, increasing false positives, or require real-world training data. In this work, we treat anomaly generation as a compositional problem and propose ComGEN, a component-aware and unsupervised framework...

10.48550/arxiv.2502.11712 preprint EN arXiv (Cornell University) 2025-02-17
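
The compositional view of anomaly generation can be illustrated with a generic cut-paste baseline: composite a region of a normal image into another location to synthesize a defect and its mask. This is a minimal, hypothetical sketch of the general idea, not the ComGEN method itself.

```python
import numpy as np

def cutpaste_anomaly(image, rng, patch=8):
    """Synthesize a crude anomaly by copying a random patch of a normal
    image and pasting it elsewhere. A generic cut-paste baseline for
    illustration only; ComGEN's component-aware generation differs."""
    h, w = image.shape
    out = image.copy()
    sy, sx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # source corner
    dy, dx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # destination corner
    out[dy:dy+patch, dx:dx+patch] = image[sy:sy+patch, sx:sx+patch]
    # the mask marks the pasted (anomalous) region
    mask = np.zeros_like(image, dtype=bool)
    mask[dy:dy+patch, dx:dx+patch] = True
    return out, mask

rng = np.random.default_rng(0)
normal = rng.random((32, 32))
anomalous, mask = cutpaste_anomaly(normal, rng)
print(anomalous.shape, mask.sum())  # pasted region covers patch*patch pixels
```

The synthesized pair (image, mask) can then serve as pseudo-labeled training data for an anomaly detector.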

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing methods tend to treat this context as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. This problem can lead to discrepancies...

10.1609/aaai.v39i6.32647 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
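
The idea of using scene context as signal rather than noise can be sketched as a simple late fusion of a facial embedding with a context embedding. The weighting scheme below is a hypothetical illustration, not the model proposed in the paper.

```python
import numpy as np

def fuse_face_and_context(face_feat, ctx_feat, alpha=0.7):
    """Late fusion of a facial embedding with a scene-context embedding.
    alpha is an assumed mixing weight; real context-aware DFER models
    learn this interaction rather than fixing it."""
    face = face_feat / np.linalg.norm(face_feat)
    ctx = ctx_feat / np.linalg.norm(ctx_feat)
    fused = alpha * face + (1 - alpha) * ctx
    return fused / np.linalg.norm(fused)  # unit-norm fused embedding

rng = np.random.default_rng(1)
f, c = rng.random(128), rng.random(128)
z = fuse_face_and_context(f, c)
print(z.shape, round(float(np.linalg.norm(z)), 6))
```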

Multimodal Large Models (MLMs) are becoming a significant research focus, combining powerful large language models with multimodal learning to perform complex tasks across different data modalities. This review explores the latest developments and challenges in MLMs, emphasizing their potential for achieving artificial general intelligence and as a pathway to world models. We provide an overview of key techniques such as Multimodal Chain of Thought (M-COT), Multimodal Instruction Tuning (M-IT), and Multimodal In-Context Learning (M-ICL)....

10.48550/arxiv.2407.00118 preprint EN arXiv (Cornell University) 2024-06-27
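
Of the techniques surveyed, multimodal in-context learning is the most mechanical: demonstrations interleave images with text before the query. The prompt-assembly sketch below is a hypothetical illustration of that interleaving; segment names and structure are assumptions, not any specific MLM's API.

```python
def build_micl_prompt(examples, query_image, query_text):
    """Assemble an interleaved image/text prompt for multimodal
    in-context learning (M-ICL). Image entries are filename placeholders
    here; a real MLM would receive encoded image tensors instead."""
    segments = []
    for img, text, answer in examples:
        # each demonstration: image, question, and its answer
        segments += [("image", img), ("text", text), ("text", f"Answer: {answer}")]
    # the query repeats the pattern but leaves the answer open
    segments += [("image", query_image), ("text", query_text), ("text", "Answer:")]
    return segments

demo = [("img_0.png", "What emotion is shown?", "happy")]
prompt = build_micl_prompt(demo, "img_q.png", "What emotion is shown?")
print(len(prompt))  # 6 interleaved segments
```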

The performance of CLIP in the dynamic facial expression recognition (DFER) task doesn't yield exceptional results as observed in other CLIP-based classification tasks. While CLIP's primary objective is to achieve alignment between images and text in the feature space, DFER poses challenges due to the abstract nature of video, making label representation limited and perfect alignment difficult. To address this issue, we have designed A$^{3}$lign-DFER, which introduces a new labeling paradigm to comprehensively achieve alignment, thus...

10.48550/arxiv.2403.04294 preprint EN arXiv (Cornell University) 2024-03-07
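
The CLIP alignment objective the abstract refers to reduces, at inference time, to cosine similarity between one image embedding and one text embedding per class label. The sketch below illustrates that generic zero-shot scoring with random vectors; it is not A$^{3}$lign-DFER's alignment scheme.

```python
import numpy as np

def zero_shot_scores(image_feat, text_feats, temperature=0.01):
    """CLIP-style zero-shot classification: cosine similarity between an
    image embedding and one text embedding per label, scaled by a
    temperature and softmaxed into class probabilities."""
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = txt @ img / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(2)
image = rng.random(64)
labels = rng.random((7, 64))  # e.g. seven basic expression classes
probs = zero_shot_scores(image, labels)
print(probs.shape, round(float(probs.sum()), 6))
```

For video, a DFER system must additionally pool frame embeddings over time before scoring, which is where the "abstract nature of video" difficulty enters.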

Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this knowledge in the context of super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation information on the restoration process. Furthermore, these methods fail to consider the spatial variability inherent in the estimated blur kernel, stemming from factors...

10.48550/arxiv.2403.05808 preprint EN arXiv (Cornell University) 2024-03-09
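
The degradation information discussed here is usually formalized as the classic super-resolution model: blur with a kernel, downsample, add noise. The sketch below implements that model with a single global kernel; the paper's point is precisely that the estimated kernel varies spatially, which this simplification ignores.

```python
import numpy as np

def degrade(hr, kernel, scale=2, noise_std=0.01, rng=None):
    """Classic SR degradation: convolve with a blur kernel (edge-padded),
    downsample by `scale`, optionally add Gaussian noise. A single global
    kernel is assumed here for brevity."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(hr, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(hr)
    for i in range(kh):
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i+hr.shape[0], j:j+hr.shape[1]]
    lr = blurred[::scale, ::scale]
    if rng is not None:
        lr = lr + rng.normal(0, noise_std, lr.shape)
    return lr

hr = np.arange(64, dtype=float).reshape(8, 8)
k = np.ones((3, 3)) / 9.0  # box blur kernel
lr = degrade(hr, k, scale=2)
print(lr.shape)  # (4, 4)
```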

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing methods tend to treat this context as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. This problem can lead to discrepancies...

10.48550/arxiv.2405.18769 preprint EN arXiv (Cornell University) 2024-05-29

The contemporary state of the art in Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noisy data. Noise arises from low-quality captures that defy logical labeling, and instances suffer mislabeling due to annotation bias, engendering two principal types of uncertainty: uncertainty regarding data...

10.48550/arxiv.2406.16473 preprint EN arXiv (Cornell University) 2024-06-24
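
A common baseline for the mislabeling side of this uncertainty is the small-loss trick: samples with the lowest training loss are treated as likely clean and kept for the update. The sketch below shows that generic selection step; it is not the paper's exact scheme.

```python
import numpy as np

def small_loss_selection(losses, keep_ratio=0.7):
    """Small-loss trick for noisy-label training: keep the fraction of
    samples with the lowest per-sample loss, assuming the network fits
    clean labels before noisy ones. Returns sorted indices of kept samples."""
    k = max(1, int(len(losses) * keep_ratio))
    idx = np.argsort(losses)[:k]
    return np.sort(idx)

losses = np.array([0.1, 2.3, 0.4, 5.0, 0.2, 0.9])
clean = small_loss_selection(losses, keep_ratio=0.5)
print(clean.tolist())  # indices of the presumed-clean half
```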

The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of JPEG compression, blur, and noise. Implicit modeling of the degradation process can effectively overcome this issue, but a key challenge of implicit modeling is the lack of accurate ground-truth labels...

10.48550/arxiv.2406.16459 preprint EN arXiv (Cornell University) 2024-06-24
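
The "high-order combinations" mentioned above mean applying several degradation operators in sequence, possibly more than once. The sketch below chains blur and noise for a configurable number of rounds before downsampling; JPEG compression is omitted to keep it dependency-free, so this is an illustrative reduction, not a full real-world pipeline.

```python
import numpy as np

def high_order_degrade(img, rng, orders=2, scale=2):
    """Sketch of a high-order degradation chain (blur -> noise, repeated
    `orders` times), followed by downsampling. Real pipelines also
    include JPEG compression and resizing between rounds."""
    out = img.copy()
    for _ in range(orders):
        # 3x3 box blur via nine shifted sums on an edge-padded image
        p = np.pad(out, 1, mode="edge")
        out = sum(p[i:i+out.shape[0], j:j+out.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
        out = out + rng.normal(0, 0.01, out.shape)  # additive Gaussian noise
    return out[::scale, ::scale]

rng = np.random.default_rng(3)
lr = high_order_degrade(np.ones((8, 8)), rng)
print(lr.shape)  # (4, 4)
```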

Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new challenges and approaches are encountered that are not well addressed by existing reviews of FER. This paper offers a...

10.48550/arxiv.2408.15777 preprint EN arXiv (Cornell University) 2024-08-28

In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module, aimed at integrating different...

10.48550/arxiv.2407.15590 preprint EN arXiv (Cornell University) 2024-07-22
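
Sparse fusion of modality features can be illustrated with a simple top-k gate: keep only the strongest modality embeddings and combine them. The function below is a hypothetical stand-in for an SFF-style module; UMBEnet's actual design differs.

```python
import numpy as np

def sparse_feature_fusion(features, k=2):
    """Toy sparse fusion: rank modality embeddings by norm, keep the top
    k, and average them. Illustrates the sparsity idea only; the real
    SFF module in UMBEnet is a learned component."""
    feats = np.stack(features)            # (n_modalities, dim)
    norms = np.linalg.norm(feats, axis=1)
    top = np.argsort(norms)[-k:]          # indices of the k strongest modalities
    return feats[top].mean(axis=0)

rng = np.random.default_rng(4)
modalities = [rng.random(16), rng.random(16) * 3, rng.random(16) * 2]
fused = sparse_feature_fusion(modalities, k=2)
print(fused.shape)  # (16,)
```

A sparse gate of this kind also degrades gracefully when a modality is missing, which matters for in-the-wild emotion data.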