- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Speech Recognition and Synthesis
- Remote-Sensing Image Classification
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Topic Modeling
- Advanced Neural Network Applications
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- Gait Recognition and Analysis
- Emotion and Mood Recognition
- Image and Signal Denoising Methods
- Digital Media Forensic Detection
- Mind wandering and attention
- Advanced Vision and Imaging
- Cognitive Science and Mapping
- Human-Automation Interaction and Safety
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
University of Science and Technology of China
2022-2024
Engagement estimation in human conversations has been one of the most important research issues for natural human-robot interaction. However, previous datasets and studies mainly focus on video-level engagement estimation and therefore can hardly reflect humans' constantly changing engagement. Fortunately, the MultiMediate '23 challenge provides a frame-wise task. In this paper, we propose Sliding Window Seq2seq Modeling with a BiLSTM Transformer, which has powerful sequence modeling capabilities. Our...
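The following is a minimal sketch of the general sliding-window idea behind frame-wise prediction on long videos. It is not the authors' implementation. The `window` and `stride` parameters, the rule that shifts the last window back to the end of the sequence, and the averaging of overlapping frame scores are all assumptions. `seq2seq` is a hypothetical placeholder for any model (such as a BiLSTM Transformer) that maps a chunk of per-frame features to one engagement score per frame.

```python
from typing import Callable, List, Sequence


def sliding_windows(n_frames: int, window: int, stride: int) -> List[range]:
    """Return overlapping [start, end) frame ranges that cover all n_frames.

    Assumed convention: if the regular stride leaves a tail uncovered, one extra
    window is placed so that it ends exactly at n_frames.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        raise ValueError("stride > window would leave frames uncovered")
    if n_frames <= window:
        return [range(0, n_frames)]
    starts = list(range(0, n_frames - window + 1, stride))
    if starts[-1] + window < n_frames:
        starts.append(n_frames - window)  # cover the tail of the sequence
    return [range(s, s + window) for s in starts]


def predict_framewise(
    features: Sequence[Sequence[float]],
    seq2seq: Callable[[Sequence[Sequence[float]]], List[float]],
    window: int,
    stride: int,
) -> List[float]:
    """Run a seq2seq model on each window and average overlapping frame scores."""
    n = len(features)
    totals = [0.0] * n
    counts = [0] * n
    for win in sliding_windows(n, window, stride):
        chunk = features[win.start:win.stop]
        scores = seq2seq(chunk)  # hypothetical model: one score per input frame
        for offset, score in enumerate(scores):
            totals[win.start + offset] += score
            counts[win.start + offset] += 1
    # Every frame lies in at least one window, so counts are never zero here.
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    feats = [[float(i)] for i in range(10)]  # toy 1-D features per frame
    print(sliding_windows(10, window=4, stride=3))
    print(predict_framewise(feats, lambda c: [f[0] for f in c], window=4, stride=3))
```

Overlapping windows give every frame some context on both sides, and averaging reconciles the multiple predictions a frame receives; a weighted scheme that favors window centers would be an equally plausible design choice.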
Semi-supervised learning is a highly researched problem, but existing semi-supervised object detection frameworks are based on RGB images, and their pre-trained models cannot be used for hyperspectral images. To overcome these difficulties, this paper first selects a few data augmentation methods that suit the characteristics of hyperspectral images, improving the accuracy of the supervised model on the labeled training set. Next, in order to make full use of the unlabeled data, we generate pseudo-labels with the model trained in the first stage and mix them into the training set. Then,...
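A generic pseudo-labeling step of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the `Detection` and `Sample` types, the `detector` callable standing in for the first-stage model, the confidence filter, and the `score_threshold` default of 0.7 are all hypothetical choices, and "mixing" is modeled as simple concatenation of labeled and pseudo-labeled samples.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


@dataclass
class Detection:
    box: Box
    label: int
    score: float


@dataclass
class Sample:
    image_id: str
    boxes: List[Box] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)
    is_pseudo: bool = False


def make_pseudo_labels(
    unlabeled_ids: List[str],
    detector: Callable[[str], List[Detection]],
    score_threshold: float = 0.7,  # hypothetical confidence cut-off
) -> List[Sample]:
    """Keep only confident detections from the (hypothetical) stage-one detector."""
    pseudo = []
    for image_id in unlabeled_ids:
        kept = [d for d in detector(image_id) if d.score >= score_threshold]
        if not kept:  # drop images with no confident detections
            continue
        pseudo.append(
            Sample(image_id, [d.box for d in kept], [d.label for d in kept], is_pseudo=True)
        )
    return pseudo


def mix_training_set(labeled: List[Sample], pseudo: List[Sample]) -> List[Sample]:
    """Combine ground-truth and pseudo-labeled samples for the next training stage."""
    return labeled + pseudo


if __name__ == "__main__":
    def toy_detector(image_id: str) -> List[Detection]:
        # hypothetical stand-in for the trained first-stage model
        return [Detection((0, 0, 10, 10), 1, 0.9), Detection((5, 5, 8, 8), 2, 0.4)]

    pseudo = make_pseudo_labels(["u1", "u2"], toy_detector)
    mixed = mix_training_set([Sample("l1", [(0, 0, 1, 1)], [0])], pseudo)
    print(len(mixed), [s.is_pseudo for s in mixed])
```

Keeping an `is_pseudo` flag lets a later stage weight or filter pseudo-labeled samples differently from ground truth, which is a common but optional refinement.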
As a variant of visual question answering (VQA), visual text question answering (VTQA) provides a text-image pair for each question. The text uses named entities to describe the corresponding image. Consequently, the ability to perform multi-hop reasoning using the named entities shared between the text and the image becomes critically important. However, existing models pay relatively little attention to this aspect. Therefore, we propose the Answer-Based Entity Extraction and Alignment Model (AEEA) to enable a comprehensive understanding and support multi-hop reasoning. The core of AEEA lies in...
This paper summarizes the top contributions to the first semi-supervised hyperspectral object detection (SSHOD) challenge, which was organized as part of the Perception Beyond the Visible Spectrum (PBVS) 2022 workshop at the Computer Vision and Pattern Recognition (CVPR) conference. The SSHODC challenge provides a first-of-its-kind dataset with temporally contiguous frames collected from a university rooftop observing a 4-way vehicle intersection over a period of three days. The dataset contains a total of 2890 frames, captured at an average...
Recently, deep-learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks (RNNs) have become a standard modeling technique for sequential data in TTS and are widely used. However, training a model that includes RNN components requires powerful GPUs and takes a long time. In contrast, CNN-based sequence modeling techniques can significantly reduce the parameters and training time of a TTS model while guaranteeing a certain level of performance, owing to their high parallelism, and thereby alleviate these...
Real-time engagement estimation has been an important research topic in human-computer interaction in recent years. The emergence of the NOvice eXpert Interaction (NOXI) dataset, enriched with frame-wise annotations, catalyzed a surge of efforts in this domain. Existing feature sequence partitioning methods for ultra-long videos have encountered challenges including insufficient information utilization and repetitive inference. Moreover, those studies focus mainly on target participants' features...