Kekai Sheng

ORCID: 0000-0002-5382-3241
Research Areas
  • Visual Attention and Saliency Detection
  • Domain Adaptation and Few-Shot Learning
  • Face Recognition and Analysis
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Biometric Identification and Security
  • Cancer-Related Molecular Mechanisms Research
  • Digital Media Forensic Detection
  • Olfactory and Sensory Function Studies
  • Image Processing Techniques and Applications
  • Advanced Memory and Neural Computing
  • Industrial Vision Systems and Defect Detection
  • Biochemical Analysis and Sensing Techniques
  • Neural Networks and Reservoir Computing
  • Advanced Image Processing Techniques
  • CCD and CMOS Imaging Sensors
  • Video Analysis and Summarization
  • Handwritten Text Recognition Techniques
  • Speech Recognition and Synthesis
  • Advanced Fluorescence Microscopy Techniques
  • Vehicle License Plate Recognition
  • Music and Audio Processing
  • Machine Learning and Data Classification
  • Evolutionary Algorithms and Applications

Shandong Institute of Automation
2024

Tencent (China)
2019-2023

Chinese Academy of Sciences
2015-2019

Institute of Automation
2017-2018

University of Chinese Academy of Sciences
2017-2018

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.1109/cvpr42600.2020.01122 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Vision transformers (ViTs) have recently received explosive popularity, but the huge computational cost is still a severe issue. Since the computation complexity of ViT is quadratic with respect to the input sequence length, a mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression that uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning that dynamically drops redundant tokens. However, the limitation of existing token pruning lies in...

10.1609/aaai.v36i3.20202 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems assessing the aesthetics of visual data. However, useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP aggregation network with various visual attributes, we train the model in an end-to-end manner...

10.1145/3240508.3240554 article EN Proceedings of the 30th ACM International Conference on Multimedia 2018-10-15

Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works....

10.1007/s41095-021-0247-3 article EN cc-by Computational Visual Media 2021-10-27

Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness for unseen scenarios. Existing DG methods assume that the domain label is known. However, in real-world applications, the collected dataset always contains a mixture of domains, where the domain label is unknown. In this case, most existing methods may not work. Further, even if we can obtain the domain label as existing methods do, we think this is just a sub-optimal partition. To overcome the limitation, we propose domain dynamic adjustment meta-learning (D$^2$AM)...

10.1609/aaai.v35i2.16199 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness for unseen scenarios. Previous methods treat each sample from multiple domains indiscriminately during the training process, and endeavor to extract a common feature space to improve generalization. However, due to the complex and biased data distribution, directly treating them equally will corrupt the generalization ability. To settle the issue, we propose a novel Dual Reweighting Domain Generalization (DRDG)...

10.24963/ijcai.2021/120 article EN 2021-08-01

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. The progress is highly relevant to the architecture design, so it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming, and existing zero-cost proxies in CNN do not generalize well to the ViT search space according to our experimental observations. In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise...

10.1109/cvpr52688.2022.01062 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial-and-error. Thus, it is worthwhile to investigate efficient methods for automatically searching high-performance Transformers via Transformer Architecture Search (TAS). In order to improve the search efficiency, training-free proxy based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to be inadequate in generalizing well to Transformer search spaces, as confirmed by several...

10.1109/tpami.2024.3378781 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-03-19

Visual aesthetic assessment has been an active research field for decades. Although the latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close...

10.1609/aaai.v34i04.6026 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Direct regression and anchor regression are the two main effective and prevailing mechanisms in the paradigm of scene text detection. However, the use of direct regression-based methods may be challenging during optimization without the help of anchors as references. Unfortunately, anchor-based methods always suffer from the careful design of anchors, degrading the robustness to complex scenes. To address the above-mentioned problems, we propose a novel hidden anchor mechanism (HAM) especially for scene text detection. The predictions are innovatively regarded as hidden layers, weighted...

10.1109/tip.2020.3008863 article EN IEEE Transactions on Image Processing 2020-01-01

Vision transformers (ViTs) have recently received explosive popularity, but the huge computational cost is still a severe issue. Since the computation complexity of ViT is quadratic with respect to the input sequence length, a mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression that uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning that dynamically drops redundant tokens. However, the limitation of existing token pruning lies in...

10.48550/arxiv.2108.01390 preprint EN public-domain arXiv (Cornell University) 2021-01-01

Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We...

10.1007/s41095-020-0193-5 article EN cc-by Computational Visual Media 2020-11-28

Great progress has been achieved in domain adaptation in recent decades. Existing works are always based on an ideal assumption that testing target domains are independent and identically distributed with training target domains. However, due to unpredictable corruptions (e.g., noise and blur) in real data, such as web images and real-world object detection, domain adaptation methods are increasingly required to be corruption robust on target domains. We investigate a new task, corruption-agnostic robust domain adaptation (CRDA), to be accurate on original data and robust against unavailable-for-training corruptions on target domains. This...

10.1145/3501800 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-03-04

In this study, we present the Gourmet Photography Dataset (GPD), which is the first large-scale dataset for aesthetic assessment of food photographs. We collect 12,000 food images together with human-annotated labels (i.e., aesthetically positive or negative) to build the dataset. We evaluate the performance of several popular machine learning algorithms for aesthetic assessment of food images to verify the effectiveness and importance of our GPD dataset. Experimental results show that deep convolutional neural networks trained on GPD can achieve performance comparable to human experts in...

10.1145/3283254.3283260 article EN 2018-11-30

Open-vocabulary object detection (OVD) aims to scale up the vocabulary size to detect objects of novel categories beyond the training vocabulary. Recent work resorts to the rich knowledge in pre-trained vision-language models. However, existing methods are ineffective in proposal-level vision-language alignment. Meanwhile, the models usually suffer from confidence bias toward base categories and perform worse on novel ones. To overcome the challenges, we present MEDet, a novel and effective OVD framework with proposal mining and prediction equalization. First,...

10.48550/arxiv.2206.11134 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Semi-supervised domain adaptation (SSDA) methods have demonstrated great potential in large-scale image classification tasks when massive labeled data are available in the source domain but very few labeled samples are provided in the target domain. Existing solutions usually focus on feature alignment between the two domains while paying little attention to the discrimination capability of learned representations in the target domain. In this paper, we present a novel and effective method, namely Effective Label Propagation (ELP), to tackle this problem by...

10.48550/arxiv.2012.02621 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Towards better unsupervised domain adaptation (UDA). Recently, researchers have proposed various domain-conditioned attention modules and made promising progress. However, considering that the configuration of attention, i.e., the type and the position of the attention module, affects the performance significantly, it is more generalized to optimize the attention configuration automatically to be specialized for an arbitrary UDA scenario. For the first time, this paper proposes EvoADA: a novel framework to evolve the attention configuration for a given UDA task without human intervention. In particular,...

10.48550/arxiv.2103.13561 preprint EN public-domain arXiv (Cornell University) 2021-01-01

The study of face alignment has been an area of intense research in computer vision, with its achievements widely used in computer graphics applications. The performance of various face alignment methods is often image-dependent or somewhat random because of their own strategy. This study aims to develop a method that can select an input image with good face alignment results from many results produced by a single method or multiple ones. The task is challenging because different face alignment results need to be evaluated without any ground truth. This study addresses this problem by designing a feasible feature extraction scheme...

10.1111/cgf.12760 article EN Computer Graphics Forum 2015-10-01

Purpose: To test the potential of early Glioblastoma Multiforme (GBM) recurrence detection utilizing image texture pattern analysis in serial MR images post primary treatment intervention. Methods: MR image-sets of six time points prior to the confirmed recurrence diagnosis of a GBM patient were included in this study, with each time point containing T1 pre-contrast, T1 post-contrast, T2-Flair, and T2-TSE images. Eight Gray-level co-occurrence matrix (GLCM) features including Contrast, Correlation, Dissimilarity, Energy,...

10.1118/1.4955789 article EN Medical Physics 2016-06-01

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.48550/arxiv.2005.09973 preprint EN other-oa arXiv (Cornell University) 2020-01-01