Keda Lu

ORCID: 0009-0006-8974-3813
Research Areas
  • Human Pose and Action Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Speech Recognition and Synthesis
  • Remote-Sensing Image Classification
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Advanced Neural Network Applications
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Gait Recognition and Analysis
  • Emotion and Mood Recognition
  • Image and Signal Denoising Methods
  • Digital Media Forensic Detection
  • Mind wandering and attention
  • Advanced Vision and Imaging
  • Cognitive Science and Mapping
  • Human-Automation Interaction and Safety
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods

University of Science and Technology of China
2022-2024

Engagement estimation in human conversations has been one of the most important research issues for natural human-robot interaction. However, previous datasets and studies mainly focus on video-level engagement estimation and therefore can hardly reflect a person's constantly changing engagement. Fortunately, the MultiMediate '23 challenge provides a frame-wise task. In this paper, we propose Sliding Window Seq2seq Modeling by BiLSTM Transformer with powerful sequence modeling capabilities. Our...

10.1145/3581783.3612852 article EN 2023-10-26

Semi-supervised learning is a highly researched problem, but existing semi-supervised object detection frameworks are based on RGB images, and pre-trained models cannot be used for hyperspectral images. To overcome these difficulties, this paper first selects fewer suitable data augmentation methods to improve the accuracy of the supervised model on the labeled training set, which characteristics ... Next, in order to make full use of the unlabeled data, we generate pseudo-labels with trained stage mix obtained set. Then,...

10.1109/cvprw56347.2022.00045 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

As a variant of visual question answering (VQA), visual text question answering (VTQA) provides a text-image pair for each question. The text utilizes named entities to describe the corresponding image. Consequently, the ability to perform multi-hop reasoning between the text and the image becomes critically important. However, existing models pay relatively less attention to this aspect. Therefore, we propose the Answer-Based Entity Extraction and Alignment Model (AEEA) to enable comprehensive understanding and support multi-hop reasoning. The core of AEEA lies in...

10.1145/3581783.3612850 article EN 2023-10-26

This paper summarizes the top contributions to the first semi-supervised hyperspectral object detection (SSHOD) challenge, which was organized as a part of the Perception Beyond the Visible Spectrum (PBVS) 2022 workshop at the Computer Vision and Pattern Recognition (CVPR) conference. The SSHODC challenge is a first-of-its-kind dataset with temporally contiguous frames collected from a university rooftop observing a 4-way vehicle intersection over a period of three days. The dataset contains a total of 2890 frames, captured at an average...

10.1109/cvprw56347.2022.00054 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS and are widely used. However, training a model that includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence techniques can significantly reduce the parameters and training time of the model while guaranteeing a certain level of performance thanks to their high parallelism, and alleviate these...

10.48550/arxiv.2403.08164 preprint EN arXiv (Cornell University) 2024-03-12

Real-time engagement estimation has been an important research topic in human-computer interaction in recent years. The emergence of the NOvice eXpert Interaction (NOXI) dataset, enriched with frame-wise annotations, has catalyzed a surge of efforts in this domain. Existing feature sequence partitioning methods for ultra-long videos have encountered challenges including insufficient information utilization and repetitive inference. Moreover, those studies focus mainly on the target participants’ features...

10.24963/ijcai.2024/353 article EN 2024-07-26

10.1109/cvprw63382.2024.00312 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17