- Visual Attention and Saliency Detection
- Domain Adaptation and Few-Shot Learning
- Face recognition and analysis
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Biometric Identification and Security
- Cancer-related molecular mechanisms research
- Digital Media Forensic Detection
- Olfactory and Sensory Function Studies
- Image Processing Techniques and Applications
- Advanced Memory and Neural Computing
- Industrial Vision Systems and Defect Detection
- Biochemical Analysis and Sensing Techniques
- Neural Networks and Reservoir Computing
- Advanced Image Processing Techniques
- CCD and CMOS Imaging Sensors
- Video Analysis and Summarization
- Handwritten Text Recognition Techniques
- Speech Recognition and Synthesis
- Advanced Fluorescence Microscopy Techniques
- Vehicle License Plate Recognition
- Music and Audio Processing
- Machine Learning and Data Classification
- Evolutionary Algorithms and Applications
Shandong Institute of Automation
2024
Tencent (China)
2019-2023
Chinese Academy of Sciences
2015-2019
Institute of Automation
2017-2018
University of Chinese Academy of Sciences
2017-2018
Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development of this task. To resolve the first two issues, we...
Vision transformers (ViTs) have recently received explosive popularity, but their huge computational cost is still a severe issue. Since the computational complexity of ViT is quadratic with respect to the input sequence length, a mainstream paradigm for computation reduction is to reduce the number of tokens. Existing designs include structured spatial compression, which uses a progressive shrinking pyramid to reduce the computations of large feature maps, and unstructured token pruning, which dynamically drops redundant tokens. However, the limitation of existing token pruning lies in...
Aggregation structures with explicit information, such as image attributes and scene semantics, are effective and popular for intelligent systems that assess the aesthetics of visual data. However, useful information may not be available due to the high cost of manual annotation and expert design. In this paper, we present a novel multi-patch (MP) aggregation method for image aesthetic assessment. Different from state-of-the-art methods, which augment an MP aggregation network with various visual attributes, we train the model in an end-to-end manner...
Transformers, the dominant architecture for natural language processing, have also recently attracted much attention from computational visual media researchers due to their capacity for long-range representation and their high performance. Transformers are sequence-to-sequence models, which use a self-attention mechanism rather than the RNN sequential structure. Thus, such models can be trained in parallel and can represent global information. This study comprehensively surveys recent visual transformer works....
Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness in unseen scenarios. Existing DG methods assume that the domain label is known. However, in real-world applications, the collected dataset always contains mixed domains, where the domain label is unknown. In this case, most existing methods may not work. Further, even if we can obtain the domain label as existing methods do, we think this is just a sub-optimal partition. To overcome the limitation, we propose domain dynamic adjustment meta-learning (D$^2$AM)...
Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness in unseen scenarios. Previous methods treat each sample from multiple domains indiscriminately during the training process, and endeavor to extract a common feature space to improve generalization. However, due to the complex and biased data distribution, directly treating them equally will corrupt the generalization ability. To settle the issue, we propose a novel Dual Reweighting Domain Generalization (DRDG)...
Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks. This progress is closely tied to architecture design, so it is worthwhile to propose Transformer Architecture Search (TAS) to search for better ViTs automatically. However, current TAS methods are time-consuming, and existing zero-cost proxies for CNNs do not generalize well to the ViT search space according to our experimental observations. In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise...
Transformers have shown remarkable performance; however, their architecture design is a time-consuming process that demands expertise and trial-and-error. Thus, it is worthwhile to investigate efficient methods for automatically searching for high-performance Transformers via Transformer Architecture Search (TAS). To improve search efficiency, training-free proxy based methods have been widely adopted in Neural Architecture Search (NAS). However, these proxies have been found to be inadequate in generalizing well to Transformer search spaces, as confirmed by several...
Visual aesthetic assessment has been an active research field for decades. Although the latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations, including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close...
Direct regression and anchors are the two main effective and prevailing mechanisms in the paradigm of scene text detection. However, direct regression-based methods may be challenging to optimize without the help of anchors as references. Unfortunately, anchor-based methods always suffer from the need for careful anchor design, which degrades their robustness to complex scenes. To address the above-mentioned problems, we propose a novel hidden anchor mechanism (HAM) especially for scene text detection. The predictions are innovatively regarded as hidden layers, weighted...
Distinguishing aesthetically pleasing food photos from others is an important visual analysis task for social media and ranking systems related to food. Nevertheless, aesthetic assessment of food images remains a challenging and relatively unexplored task, largely due to the lack of related food image datasets and practical knowledge. Thus, we present the Gourmet Photography Dataset (GPD), the first large-scale dataset for aesthetic assessment of food photos. It contains 24,000 images with corresponding binary aesthetic labels, covering a large variety of foods and scenes. We...
Great progress has been achieved in domain adaptation over the decades. Existing works are always based on an ideal assumption that testing target domains are independent and identically distributed with training target domains. However, due to unpredictable corruptions (e.g., noise and blur) in real data, such as web images and real-world object detection, domain adaptation methods are increasingly required to be corruption robust on target domains. We investigate a new task, corruption-agnostic robust domain adaptation (CRDA), which aims to be accurate on original data and robust against unavailable-for-training corruptions on target domains. This...
In this study, we present the Gourmet Photography Dataset (GPD), which is the first large-scale dataset for aesthetic assessment of food photographs. We collect 12,000 food images together with human-annotated labels (i.e., aesthetically positive or negative) to build the dataset. We evaluate the performance of several popular machine learning algorithms for aesthetic assessment of food images to verify the effectiveness and importance of our GPD dataset. Experimental results show that deep convolutional neural networks trained on GPD can achieve performance comparable with human experts in...
Open-vocabulary object detection (OVD) aims to scale up the vocabulary size to detect objects of novel categories beyond the training vocabulary. Recent work resorts to the rich knowledge in pre-trained vision-language models. However, existing methods are ineffective in proposal-level vision-language alignment. Meanwhile, the models usually suffer from confidence bias toward base categories and perform worse on novel ones. To overcome the challenges, we present MEDet, a novel and effective OVD framework with proposal mining and prediction equalization. First,...
Semi-supervised domain adaptation (SSDA) methods have demonstrated great potential in large-scale image classification tasks when massive labeled data are available in the source domain but very few labeled samples are provided in the target domain. Existing solutions usually focus on feature alignment between the two domains while paying little attention to the discrimination capability of learned representations in the target domain. In this paper, we present a novel and effective method, namely Effective Label Propagation (ELP), to tackle this problem by...
Towards better unsupervised domain adaptation (UDA), researchers have recently proposed various domain-conditioned attention modules and made promising progress. However, considering that the configuration of attention, i.e., the type and position of the attention module, affects the performance significantly, it is more generalized to optimize the attention configuration automatically so that it is specialized for an arbitrary UDA scenario. For the first time, this paper proposes EvoADA: a novel framework to evolve the attention configuration for a given UDA task without human intervention. In particular,...
The study of face alignment has been an area of intense research in computer vision, with its achievements widely used in computer graphics applications. The performance of various face alignment methods is often image-dependent or somewhat random because of their own strategy. This study aims to develop a method that can select an input image with good face alignment results from many results produced by a single method or multiple ones. The task is challenging because different face alignment results need to be evaluated without any ground truth. This study addresses this problem by designing a feasible feature extraction scheme...
Purpose: To test the potential of early Glioblastoma Multiforme (GBM) recurrence detection utilizing image texture pattern analysis in serial MR images post primary treatment intervention. Methods: MR image-sets of six time points prior to the confirmed recurrence diagnosis of a GBM patient were included in this study, with each time point containing T1 pre-contrast, T1 post-contrast, T2-Flair, and T2-TSE images. Eight Gray-level co-occurrence matrix (GLCM) features including Contrast, Correlation, Dissimilarity, Energy,...