NFDI4DS | UHH-SEMS - Publication Details

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

OPENALEX - Publications

Xingjia Pan, Yuqiang Ren, Kekai Sheng, Weiming Dong, Haolei Yuan, and 3 more

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders development on this task. To resolve the first two issues, we...

10.1109/cvpr42600.2020.01122 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

OPENALEX - Publications

Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, and 4 more

Vision transformers (ViTs) have recently received explosive popularity, but their huge computational cost is still a severe issue. Since the computational complexity of ViT is quadratic with respect to the input sequence length, the mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression, which uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning, which dynamically drops redundant tokens. However, the limitation of existing token pruning lies in...

10.1609/aaai.v36i3.20202 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
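
The Evo-ViT abstract above motivates token reduction by the quadratic cost of self-attention in the sequence length. The back-of-the-envelope sketch below makes that arithmetic concrete. It is a generic illustration of the premise, not Evo-ViT's slow-fast token evolution. It counts only the two N × N matrix products of one attention layer and ignores projections, softmax and the MLP. The function names and the 196-token, 384-dimensional setting are hypothetical choices for illustration.

```python
def attention_matmul_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulate count of the two N x N products in one
    self-attention layer: scores = Q @ K^T and output = A @ V.
    Splitting the width into heads does not change this total;
    projections, softmax and the MLP are not counted."""
    scores = num_tokens * num_tokens * dim  # (N x d) @ (d x N)
    output = num_tokens * num_tokens * dim  # (N x N) @ (N x d)
    return scores + output


def relative_cost_after_reduction(num_tokens: int, keep_ratio: float, dim: int) -> float:
    """Attention cost after keeping a fraction of the tokens, relative to the full cost."""
    kept = max(1, int(num_tokens * keep_ratio))
    return attention_matmul_macs(kept, dim) / attention_matmul_macs(num_tokens, dim)


if __name__ == "__main__":
    # Hypothetical setting: 14 x 14 = 196 patch tokens of width 384.
    n, d = 196, 384
    print(attention_matmul_macs(n, d))               # 29503488
    print(relative_cost_after_reduction(n, 0.5, d))  # 0.25
```

Halving the number of tokens cuts this part of the cost to a quarter, while token-wise linear layers shrink only by half. This difference is why reducing the token count is attractive for long sequences.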

Attention-based Multi-Patch Aggregation for Image Aesthetic Assessment

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Chongyang Ma, Xing Mei, Feiyue Huang, and 1 more

Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems assessing the aesthetics of visual data. However, such useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP network with various attributes, we train the model in an end-to-end manner...

10.1145/3240508.3240554 article EN Proceedings of the 26th ACM International Conference on Multimedia 2018-10-15

Transformers in computational visual media: A survey

OPENALEX - Publications

Yifan Xu, Huapeng Wei, Minxuan Lin, Yingying Deng, Kekai Sheng, and 5 more

Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and their high performance. Transformers are sequence-to-sequence models that use a self-attention mechanism rather than the sequential structure of RNNs. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent transformer works...

10.1007/s41095-021-0247-3 article EN cc-by Computational Visual Media 2021-10-27

Generalizable Representation Learning for Mixture Domain Face Anti-Spoofing

OPENALEX - Publications

Zhihong Chen, Taiping Yao, Kekai Sheng, Shouhong Ding, Ying Tai, and 3 more

Face anti-spoofing based on domain generalization (DG) has drawn growing attention due to its robustness to unseen scenarios. Existing DG methods assume that the domain label is known. However, in real-world applications, the collected dataset always contains mixture domains, where the domain label is unknown. In this case, most existing methods may not work. Further, even if we could obtain the domain label as existing methods do, we think it is just a sub-optimal partition. To overcome this limitation, we propose domain dynamic adjustment meta-learning (D²AM)...

10.1609/aaai.v35i2.16199 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Dual Reweighting Domain Generalization for Face Presentation Attack Detection

OPENALEX - Publications

Shubao Liu, Ke-Yue Zhang, Taiping Yao, Kekai Sheng, Shouhong Ding, and 4 more

Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness to unseen scenarios. Previous methods treat each sample from multiple domains indiscriminately during training and endeavor to extract a common feature space to improve generalization. However, due to the complex and biased data distribution, directly treating all samples equally will corrupt the generalization ability. To settle this issue, we propose a novel Dual Reweighting Domain Generalization (DRDG)...

10.24963/ijcai.2021/120 article EN 2021-08-01

Training-free Transformer Architecture Search

OPENALEX - Publications

Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, and 3 more

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. This progress is highly relevant to architecture design, so it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming, and existing zero-cost proxies designed for CNNs do not generalize well to the ViT search space according to our experimental observations. In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise...

10.1109/cvpr52688.2022.01062 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Training-Free Transformer Architecture Search With Zero-Cost Proxy Guided Evolution

OPENALEX - Publications

Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Yonghong Tian, and 2 more

Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial-and-error. Thus, it is worthwhile to investigate efficient methods for automatically searching for high-performance Transformers via Transformer Architecture Search (TAS). To improve search efficiency, training-free proxy-based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to be inadequate at generalizing well to Transformer search spaces, as confirmed by several...

10.1109/tpami.2024.3378781 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-03-19

Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Menglei Chai, Guohui Wang, Peng Zhou, and 4 more

Visual aesthetic assessment has been an active research field for decades. Although the latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations, including both aesthetic labels and related image attributes. In this paper, we revisit the problem from a self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close...

10.1609/aaai.v34i04.6026 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

HAM: Hidden Anchor Mechanism for Scene Text Detection

OPENALEX - Publications

Jie-Bo Hou, Xiaobin Zhu, Chang Liu, Kekai Sheng, Long-Huang Wu, and 2 more

Direct regression and anchor regression are the two most effective and prevailing mechanisms in the paradigm of scene text detection. However, direct regression-based methods may be challenging to optimize without the help of anchors as references. Unfortunately, anchor-based methods always suffer from the need for careful anchor design, which degrades their robustness to complex scenes. To address the above-mentioned problems, we propose a novel hidden anchor mechanism (HAM) especially for scene text detection. The predictions of anchors are innovatively regarded as hidden layers, weighted...

10.1109/tip.2020.3008863 article EN IEEE Transactions on Image Processing 2020-01-01

Reciprocal normalization for domain adaptation

OPENALEX - Publications

Zhiyong Huang, Kekai Sheng, Ke Li, Jian Liang, Taiping Yao, and 3 more

10.1016/j.patcog.2023.109533 article EN Pattern Recognition 2023-03-14

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

OPENALEX - Publications

Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, and 4 more

Vision transformers (ViTs) have recently received explosive popularity, but their huge computational cost is still a severe issue. Since the computational complexity of ViT is quadratic with respect to the input sequence length, the mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression, which uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning, which dynamically drops redundant tokens. However, the limitation of existing token pruning lies in...

10.48550/arxiv.2108.01390 preprint EN public-domain arXiv (Cornell University) 2021-01-01

Learning to assess visual aesthetics of food images

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Haibin Huang, Menglei Chai, Yong Zhang, and 2 more

Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We...

10.1007/s41095-020-0193-5 article EN cc-by Computational Visual Media 2020-11-28

Towards Corruption-Agnostic Robust Domain Adaptation

OPENALEX - Publications

Yifan Xu, Kekai Sheng, Weiming Dong, Baoyuan Wu, Changsheng Xu, and 1 more

Great progress has been achieved in domain adaptation over the decades. Existing works are always based on the ideal assumption that testing target domains are independent and identically distributed with training target domains. However, due to unpredictable corruptions (e.g., noise and blur) in real data, such as web images and real-world object detection, domain adaptation methods are increasingly required to be corruption robust on target domains. We investigate a new task, corruption-agnostic robust domain adaptation (CRDA): to be accurate on original data and robust against unavailable-for-training corruptions on target domains. This...

10.1145/3501800 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-03-04

Gourmet photography dataset for aesthetic assessment of food images

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Haibin Huang, Chongyang Ma, Bao-Gang Hu

In this study, we present the Gourmet Photography Dataset (GPD), which is the first large-scale dataset for aesthetic assessment of food photographs. We collect 12,000 food images together with human-annotated labels (i.e., aesthetically positive or negative) to build this dataset. We evaluate the performance of several popular machine learning algorithms to verify the effectiveness and importance of our GPD dataset. Experimental results show that deep convolutional neural networks trained on GPD can achieve performance comparable to that of human experts in...

10.1145/3283254.3283260 article EN 2018-11-30

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

OPENALEX - Publications

Peixian Chen, Kekai Sheng, Mengdan Zhang, Yunhang Shen, Ke Li, and 1 more

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary. Recent work resorts to the rich knowledge in pre-trained vision-language models. However, existing methods are ineffective in proposal-level alignment. Meanwhile, the models usually suffer from confidence bias toward base categories and perform worse on novel ones. To overcome these challenges, we present MEDet, an effective OVD framework with proposal mining and prediction equalization. First,...

10.48550/arxiv.2206.11134 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Centroid-aware local discriminative metric learning in speaker verification

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Wei Li, Joseph Razik, Feiyue Huang, and 1 more

10.1016/j.patcog.2017.07.007 article EN Pattern Recognition 2017-07-15

Effective Label Propagation for Discriminative Semi-Supervised Domain Adaptation

OPENALEX - Publications

Zhiyong Huang, Kekai Sheng, Weiming Dong, Xing Mei, Chongyang Ma, and 3 more

Semi-supervised domain adaptation (SSDA) methods have demonstrated great potential in large-scale image classification tasks when massive labeled data are available in the source domain but very few labeled samples are provided in the target domain. Existing solutions usually focus on feature alignment between the two domains while paying little attention to the discrimination capability of the learned representations. In this paper, we present a novel and effective method, namely Effective Label Propagation (ELP), to tackle this problem by...

10.48550/arxiv.2012.02621 preprint EN other-oa arXiv (Cornell University) 2020-01-01

On Evolving Attention Towards Domain Adaptation

OPENALEX - Publications

Kekai Sheng, Ke Li, Xiawu Zheng, Jian Liang, Weiming Dong, and 3 more

Towards better unsupervised domain adaptation (UDA), researchers have recently proposed various domain-conditioned attention modules and made promising progress. However, since the configuration of attention, i.e., the type and position of the attention module, affects performance significantly, it is more general to optimize the attention configuration automatically so that it can be specialized for an arbitrary UDA scenario. For the first time, this paper proposes EvoADA: a novel framework to evolve the attention configuration for a given UDA task without human intervention. In particular,...

10.48550/arxiv.2103.13561 preprint EN public-domain arXiv (Cornell University) 2021-01-01

Evaluating the Quality of Face Alignment without Ground Truth

OPENALEX - Publications

Kekai Sheng, Weiming Dong, Yan Kong, Xing Mei, Jilin Li, and 3 more

The study of face alignment has been an area of intense research in computer vision, with its achievements widely used in graphics applications. The performance of various face alignment methods is often image-dependent or somewhat random because of their own strategies. This study aims to develop a method that can select, for an input image, good alignment results from the many results produced by a single method or by multiple ones. The task is challenging because different face alignment results need to be evaluated without any ground truth. This study addresses the problem by designing a feasible feature extraction scheme...

10.1111/cgf.12760 article EN Computer Graphics Forum 2015-10-01

SU-F-R-17: Advancing Glioblastoma Multiforme (GBM) Recurrence Detection with MRI Image Texture Feature Extraction and Machine Learning

OPENALEX - Publications

Victoria Yu, Dan Ruan, Duc V. Nguyen, Tania Kaprealian, R.T. Chin, and 1 more

Purpose: To test the potential of early Glioblastoma Multiforme (GBM) recurrence detection utilizing image texture pattern analysis in serial MR images after primary treatment intervention. Methods: Image sets from six time points prior to the confirmed diagnosis of a GBM patient were included in this study, with each time point containing T1 pre-contrast, post-contrast, T2-Flair, and T2-TSE images. Eight gray-level co-occurrence matrix (GLCM) features, including Contrast, Correlation, Dissimilarity, Energy,...

10.1118/1.4955789 article EN Medical Physics 2016-06-01
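
The GLCM texture features named in the GBM abstract above can be illustrated with a small pure-Python sketch. It is not the study's pipeline. The single offset (one pixel to the right), the symmetric counting, the 4-level toy image, the function names, and the energy convention (square root of the angular second moment, as in scikit-image) are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> Matrix:
    """Normalized gray-level co-occurrence matrix for a single offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which a pixel of gray level i
    has a neighbor of gray level j at the given offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # also count the pair in the reverse direction
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p: Matrix) -> Dict[str, float]:
    """Contrast, dissimilarity, energy and correlation of a normalized GLCM.

    Energy is sqrt(angular second moment) here; some texts use the ASM itself."""
    n = len(p)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in pairs)
    dissimilarity = sum(p[i][j] * abs(i - j) for i, j in pairs)
    energy = math.sqrt(sum(p[i][j] ** 2 for i, j in pairs))
    mu_i = sum(i * p[i][j] for i, j in pairs)
    mu_j = sum(j * p[i][j] for i, j in pairs)
    var_i = sum(p[i][j] * (i - mu_i) ** 2 for i, j in pairs)
    var_j = sum(p[i][j] * (j - mu_j) ** 2 for i, j in pairs)
    cov = sum(p[i][j] * (i - mu_i) * (j - mu_j) for i, j in pairs)
    # A constant image has zero variance; this sketch reports 1.0 by convention.
    correlation = cov / math.sqrt(var_i * var_j) if var_i * var_j > 0 else 1.0
    return {"contrast": contrast, "dissimilarity": dissimilarity,
            "energy": energy, "correlation": correlation}


if __name__ == "__main__":
    toy = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]  # hypothetical 4-level image, not MRI data
    feats = glcm_features(glcm(toy, levels=4))
    print({k: round(v, 4) for k, v in feats.items()})
```

In practice, such features are usually computed for several offsets and angles and then averaged or concatenated. The single offset here only keeps the arithmetic easy to check by hand.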

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

OPENALEX - Publications

Xingjia Pan, Yuqiang Ren, Kekai Sheng, Weiming Dong, Haolei Yuan, and 3 more

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders development on this task. To resolve the first two issues, we...

10.48550/arxiv.2005.09973 preprint EN other-oa arXiv (Cornell University) 2020-01-01