- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Adversarial Robustness in Machine Learning
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- Human Pose and Action Recognition
- Generative Adversarial Networks and Image Synthesis
- Video Analysis and Summarization
- Face recognition and analysis
- Medical Image Segmentation Techniques
- Natural Language Processing Techniques
- Advanced Image Processing Techniques
- Gait Recognition and Analysis
- AI in cancer detection
- Machine Learning and ELM
- Topic Modeling
- Advanced Data Compression Techniques
- Image Processing Techniques and Applications
- Radiomics and Machine Learning in Medical Imaging
- Machine Learning and Data Classification
- Image Enhancement Techniques
- Explainable Artificial Intelligence (XAI)
Tencent (China)
2015-2024
Shandong Academy of Sciences
2023
Qilu University of Technology
2023
Beihang University
2023
State Key Laboratory of Software Development Environment
2023
Tsinghua University
2023
City University of Hong Kong
2023
Shandong University
2019-2022
Xiamen University
2018-2021
Artificial Intelligence in Medicine (Canada)
2021
Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning. However, they are still criticized for the lack of contextual information and fine-grained details, which in contrast are the merits of traditional grid features. In this paper, we introduce a novel Dual-Level Collaborative Transformer (DLCT) network to realize the complementary advantages of the two features. Concretely, in DLCT, these two features are first processed by a Dual-way Self Attention (DWSA) to mine their...
Binary code learning has been an emerging topic in large-scale cross-modality retrieval recently. It aims to map features from multiple modalities into a common Hamming space, where the similarity can be approximated efficiently via Hamming distance. To this end, most existing works learn binary codes directly from data instances in multiple modalities, which preserve both intra- and inter-modal similarities respectively. Few methods consider the fusion among multi-modal instances instead, which can explicitly capture their heterogeneous...
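As a rough illustration of why a common Hamming space makes similarity search cheap, the sketch below compares binary codes with an XOR followed by a bit count. It is not the method of the abstract above: the function names, the 8-bit codes, and the item labels are all hypothetical, and the codes are assumed to have already been produced by some hashing model.

```python
from typing import Dict, List, Tuple


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those bits.
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query: int, database: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (item, distance) pairs sorted from nearest to farthest."""
    scored = [(name, hamming_distance(query, code)) for name, code in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 8-bit codes for items from two modalities.
    database = {
        "image_1": 0b10110010,
        "text_1": 0b10110011,
        "image_2": 0b01001101,
    }
    query_code = 0b10110010
    for name, dist in rank_by_hamming(query_code, database):
        print(name, dist)
```

Because the comparison is a bitwise operation on integers, items from either modality can be ranked against the same query without touching the original high-dimensional features.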
Accelerating convolutional neural networks has recently received ever-increasing research focus. Among various approaches proposed in the literature, filter pruning has been regarded as a promising solution, which is due to its advantage in significant speedup and memory reduction of both the network model and intermediate feature maps. To this end, most approaches tend to prune filters in a layer-wise fixed manner, which is incapable of dynamically recovering a previously removed filter, as well as jointly optimizing the pruned network across layers. In this paper,...
Channel pruning is among the predominant approaches to compress deep neural networks. To this end, most existing pruning methods focus on selecting channels (filters) by importance/optimization or regularization based on rule-of-thumb designs, which results in sub-optimal pruning. In this paper, we propose a new channel pruning method based on the artificial bee colony algorithm (ABC), dubbed ABCPruner, which aims to efficiently find the optimal pruned structure, i.e., the channel number in each layer, rather than selecting "important" channels as previous works did. To solve...
Recent progress on visual question answering has explored the merits of grid features for vision language tasks. Meanwhile, transformer-based models have shown remarkable performance in various sequence prediction problems. However, the spatial information loss caused by the flattening operation, as well as the defect of the transformer model in distinguishing words and non-words, are still left unexplored. In this paper, we first propose a Grid-Augmented (GA) module, in which the relative geometry between grids is incorporated...
Visible-infrared person re-identification (Re-ID) aims to match the pedestrian images of the same identity from different modalities. Existing works mainly focus on alleviating the modality discrepancy by aligning the distributions of features from different modalities. However, nuanced but discriminative information, such as glasses, shoes, and the length of clothes, has not been fully explored, especially in the infrared modality. Without discovering nuances, it is challenging to match pedestrians across modalities using modality alignment solely, which...
Transformer-based architectures have shown great success in image captioning, where object regions are encoded and then attended into the vectorial representations to guide the caption decoding. However, such vectorial representations only contain region-level information without considering the global information reflecting the entire image, which fails to expand the capability of complex multi-modal reasoning in image captioning. In this paper, we introduce a Global Enhanced Transformer (termed GET) to enable the extraction of a more comprehensive global representation,...
Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks. However, due to the unexplored independence and exclusiveness in the labels, existing endeavors are defeated by involving uncontrolled manipulations to the translation results. In this paper, we propose Hierarchical Style Disentanglement (HiSD) to address this issue. Specifically, we organize the labels into a hierarchical tree structure, in which...
Compressing convolutional neural networks (CNNs) has received ever-increasing research focus. However, most existing CNN compression methods do not interpret their inherent structures to distinguish the implicit redundancy. In this paper, we investigate the problem of CNN compression from a novel interpretable perspective. The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance...
Weakly supervised learning has attracted growing research attention due to the significant saving in annotation cost for tasks that require intra-image annotations, such as object detection and semantic segmentation. To this end, existing weakly supervised object detection and semantic segmentation approaches follow an iterative label mining and model training pipeline. However, such a self-enforcement pipeline makes both tasks easy to be trapped in local minimums. In this paper, we join weakly supervised object detection and segmentation tasks with a multi-task learning scheme for the first time, which uses their respective failure...
Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both the training and working procedures of the current neural models are computationally inefficient. In this paper, we propose a greedy neural word segmenter with balanced character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets.
Deep learning models have shown their vulnerabilities to universal adversarial perturbations (UAP), which are quasi-imperceptible. Compared to the conventional supervised UAPs that suffer from the knowledge of training data, the data-independent unsupervised UAPs are more applicable. Existing unsupervised methods fail to take advantage of the model uncertainty to produce robust perturbations. In this paper, we propose a new perturbation method, termed Prior Driven Uncertainty Approximation (PD-UA), to generate a robust UAP by fully exploiting the model uncertainty at...
In this work, we propose a high fidelity face swapping method, called HifiFace, which can well preserve the face shape of the source face and generate photo-realistic results. Unlike other existing face swapping works that only use a face recognition model to keep the identity similarity, we propose 3D shape-aware identity to control the face shape with the geometric supervision from 3DMM and a 3D face reconstruction method. Meanwhile, we introduce the Semantic Facial Fusion module to optimize the combination of encoder and decoder features and make adaptive blending, which makes the results more photo-realistic....
Channel pruning has long been studied to compress convolutional neural networks (CNNs), which significantly reduces the overall computation. Prior works implement channel pruning in an unexplainable manner, which tends to reduce the final classification errors while failing to consider the internal influence of each channel. In this article, we conduct channel pruning in a white box. Through deep visualization of feature maps activated by different channels, we observe that different channels have a varying contribution to different categories in image classification....
Although person re-identification has achieved an impressive improvement in recent years, the common occlusion case caused by different obstacles is still an unsettled issue in real application scenarios. Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part. Nevertheless, the inevitable domain gap between the assistant model and the ReID datasets has highly increased the difficulty to obtain an effective and efficient model. To escape from the extra pre-trained networks and achieve...
Despite the exciting performance, Transformer is criticized for its excessive parameters and computation cost. However, compressing Transformer remains an open problem due to the internal complexity of its layer designs, i.e., Multi-Head Attention (MHA) and Feed-Forward Network (FFN). To address this issue, we introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer. LW-Transformer applies Group-wise Transformation to reduce both the parameters and computations of Transformer, while also preserving...
While generic person re-identification has made remarkable improvement in recent years, these methods are designed under the assumption that the entire body of the person is available. This brings about a significant performance degradation when suffering from occlusion caused by various obstacles in real-world applications. To address this issue, data-driven strategies have emerged to enhance the model's robustness to occlusion. Following the random erasing paradigm, these strategies typically employ randomly generated noise to supersede...
Most image captioning models focus on one-line (single image) captioning, where the correlations like relevance and diversity among group images (e.g., within the same album or event) are simply neglected, resulting in less accurate and diverse captions. Recent works mainly consider imposing the diversity during the online inference only, which neglect the correlation among visual structures in offline training. In this paper, we propose a novel group-based image captioning scheme (termed GroupCap), which jointly models the structured relevance and diversity among group images towards an optimal...
Recent advances on fine-grained image retrieval prefer learning a convolutional neural network (CNN) with a specific fully-connected layer and a designed loss function for discriminative feature representation. Essentially, such a loss should establish a robust metric to efficiently distinguish high-dimensional features within and outside fine-grained categories. To this end, the existing loss functions are defective in two aspects: (a) The feature relationship is encoded inside the training batch. Such a local scope leads to low accuracy. (b)...
Matrix factorization has been recently utilized for the task of multi-modal hashing for cross-modality visual search, where basis functions are learned to map data from different modalities to the same Hamming embedding. In this paper, we propose a novel cross-modality hashing algorithm termed Supervised Matrix Factorization Hashing (SMFH) which tackles the multi-modal hashing problem with a collective non-matrix factorization across the different modalities. In particular, SMFH employs a well-designed binary code learning algorithm to preserve the similarities among multi-modal original features through a graph...
In recent years, binary code learning, a.k.a. hashing, has received extensive attention in large-scale multimedia retrieval. It aims to encode high-dimensional data points into binary codes, hence the original high-dimensional metric space can be efficiently approximated via Hamming space. However, most existing hashing methods adopted offline batch learning, which is not suitable to handle incremental datasets with streaming data or new instances. In contrast, the robustness of the existing online hashing remains an open problem, while the embedding...
Binary Neural Network (BNN) shows its predominance in reducing the complexity of deep neural networks. However, it suffers severe performance degradation. One of the major impediments is the large quantization error between the full-precision weight vector and its binary vector. Previous works focus on compensating for the norm gap while leaving the angular bias hardly touched. In this paper, for the first time, we explore the influence of angular bias on the quantization error and then introduce a Rotated Binary Neural Network (RBNN), which considers the angle alignment between the full-precision weight vector and its binarized version. At...
Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at the image level rather than precisely at the object level, which are interrupted by background clutters. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel...
Popular network pruning algorithms reduce redundant information by optimizing hand-crafted models, and may cause suboptimal performance and long time in selecting filters. We innovatively introduce adaptive exemplar filters to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we use a message-passing algorithm, Affinity Propagation, on the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters. EPruner breaks the dependence...
Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores other students' information, while the latter increases the computational complexity during deployment. In this article, we propose a novel method for online knowledge distillation, termed feature fusion and self-distillation (FFSD), which comprises two key components: feature fusion and self-distillation, toward solving the above problems in a unified framework....