Qibin Hou

ORCID: 0000-0002-8388-8708
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Visual Attention and Saliency Detection
  • Multimodal Machine Learning Applications
  • Anomaly Detection Techniques and Applications
  • Remote-Sensing Image Classification
  • Advanced Image Processing Techniques
  • COVID-19 diagnosis using AI
  • Image Enhancement Techniques
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Adversarial Robustness in Machine Learning
  • Medical Image Segmentation Techniques
  • Machine Learning and Data Classification
  • Natural Language Processing Techniques
  • Computer Graphics and Visualization Techniques
  • Topic Modeling
  • Face Recognition and Perception
  • Human Pose and Action Recognition
  • Image Processing and 3D Reconstruction
  • Brain Tumor Detection and Classification
  • Video Coding and Compression Technologies
  • 3D Shape Modeling and Analysis
  • Advanced Memory and Neural Computing

Nankai University
2016-2025

Xinjiang University
2023

National University of Singapore
2020-2021

Fujian Institute of Research on the Structure of Matter
2016

Chinese Academy of Sciences
2003-2016

Shenzhen University
2015

Institut de Recherche Interdisciplinaire en Sciences Sociales
2014

Center for Research and Interdisciplinarity
2014

Geomin (Czechia)
2007

Beihang University
2006

Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention, which transforms a feature tensor into a single feature vector via 2D global pooling,...

10.1109/cvpr46437.2021.01350 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
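The directional pooling that distinguishes coordinate attention from 2D global pooling can be sketched in a few lines of NumPy. This is a simplified illustration assuming a (C, H, W) layout; the full module also applies shared 1x1 convolutions and sigmoid gating, which are omitted, and `coordinate_pooling` is an illustrative name, not the paper's code.

```python
import numpy as np

def coordinate_pooling(x):
    """Directional pooling step of coordinate attention (sketch).

    Instead of collapsing a (C, H, W) feature map into a single (C,)
    vector with 2D global average pooling, pool along each spatial axis
    separately, so position along the other axis is preserved.
    """
    pooled_h = x.mean(axis=2)  # pool over width  -> (C, H), one descriptor per row
    pooled_w = x.mean(axis=1)  # pool over height -> (C, W), one descriptor per column
    return pooled_h, pooled_w

x = np.arange(24, dtype=float).reshape(2, 3, 4)  # (C=2, H=3, W=4)
ph, pw = coordinate_pooling(x)
```

Each of the two pooled tensors keeps precise coordinates along one spatial direction, which is what lets the attention maps stay spatially selective.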

Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and saliency detection algorithms developed lately have been mostly based on Fully Convolutional Networks (FCNs). There is still a large room for improvement over generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED is not obvious. In...

10.1109/cvpr.2017.563 article EN 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017-07-01

We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming at providing layers at different feature levels with the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations, the GGM can be...

10.1109/cvpr.2019.00404 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of the achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas, such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228...

10.1007/s41095-019-0149-9 article EN cc-by Computational Visual Media 2019-06-01

Benefiting from the capability of building interdependencies among channels or spatial locations, attention mechanisms have been extensively studied and broadly used in a variety of computer vision tasks recently. In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies by a rotation operation followed by residual...

10.1109/wacv48630.2021.00318 article EN 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) 2021-01-01

Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and saliency detection algorithms developed lately have been mostly based on Fully Convolutional Networks (FCNs). There is still a large room for improvement over generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on salience...

10.1109/tpami.2018.2815688 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-03-14

Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies and 2) presenting...

10.1109/cvpr42600.2020.00406 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
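The long, narrow kernels described above can be illustrated with a minimal single-channel NumPy sketch. This is a hedged simplification: the actual strip pooling module also includes 1D convolutions and a sigmoid gate, which are omitted, and `strip_pool` is an illustrative name.

```python
import numpy as np

def strip_pool(x):
    """Strip pooling (sketch): average with long, narrow kernels.

    A horizontal 1xW strip and a vertical Hx1 strip replace the square
    NxN window, so each position aggregates context along an entire row
    or column. The pooled strips are broadcast back to (H, W) and fused.
    """
    h, w = x.shape
    horizontal = x.mean(axis=1, keepdims=True)  # (H, 1): one value per row
    vertical = x.mean(axis=0, keepdims=True)    # (1, W): one value per column
    # Broadcast-add the two strips, mimicking the module's fusion step
    return np.broadcast_to(horizontal, (h, w)) + np.broadcast_to(vertical, (h, w))

out = strip_pool(np.ones((2, 3)))
```

Because each output value mixes a whole row with a whole column, a single strip pooling step already connects every position to a cross-shaped long-range context, which a small NxN window cannot do.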

Class activation maps are generated from the final convolutional layer of a CNN. They can highlight discriminative object regions for the class of interest. These discovered regions have been widely used in weakly-supervised tasks. However, due to the small spatial resolution of the final convolutional layer, such maps often locate coarse regions of the target objects, limiting the performance of tasks that need pixel-accurate object locations. Thus, we aim to generate more fine-grained localization information to locate the target objects more accurately. In this paper, by rethinking the relationships...

10.1109/tip.2021.3089943 article EN IEEE Transactions on Image Processing 2021-01-01
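For context, the classic class activation map computation that the work above refines can be sketched as follows. This is a hedged NumPy illustration, not the paper's code; `class_activation_map` and the sample arrays are illustrative.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Classic CAM (sketch): weight the final conv feature maps by the
    linear classifier's weights for one class and sum over channels.

    features: (C, H, W) activations from the last convolutional layer.
    weights:  (num_classes, C) weights of the final linear classifier.
    Returns an (H, W) map; its coarse resolution is exactly the
    limitation noted in the abstract above.
    """
    cam = np.tensordot(weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)                                  # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                                 # normalize to [0, 1]
    return cam

features = np.ones((2, 4, 4))                   # toy (C=2, H=4, W=4) activations
weights = np.array([[1.0, 1.0], [-1.0, -1.0]])  # toy 2-class classifier
cam = class_activation_map(features, weights, 0)
```

Since the map inherits the H x W resolution of the last layer (often 1/32 of the input), finer localization requires drawing on shallower, higher-resolution layers as well.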

Recent advances on CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present novel self-calibrated convolutions that explicitly expand the fields-of-view of each convolutional layer through internal communications and hence enrich the output features. In particular, unlike the standard convolutions that fuse spatial and channel-wise...

10.1109/cvpr42600.2020.01011 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models....

10.48550/arxiv.2209.08575 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Vision transformers (ViTs) have been successfully applied in image classification tasks recently. In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the performance of ViTs saturates fast when they are scaled to be deeper. More specifically, we empirically observe that such scaling difficulty is caused by an attention collapse issue: as the transformer goes deeper, the attention maps gradually become similar and even much the same after certain layers....

10.48550/arxiv.2103.11886 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Recent research on remote sensing object detection has largely focused on improving the representation of oriented bounding boxes but overlooked the unique prior knowledge presented in remote sensing scenarios. Such priors can be useful because tiny objects may be mistakenly detected without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes the lightweight Large Selective Kernel Network (LSKNet). LSKNet can dynamically adjust its large spatial receptive...

10.1109/iccv51070.2023.01540 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Object attention maps generated by image classifiers are usually used as priors for weakly-supervised segmentation approaches. However, normal classifiers produce attention only at the most discriminative object parts, which limits the performance of the segmentation task. Therefore, how to effectively identify entire object regions in a weakly-supervised manner has always been a challenging and meaningful problem. We observe that the attention maps produced by a classification network continuously focus on different object parts during training. In order to accumulate the discovered object parts, we propose an...

10.1109/iccv.2019.00216 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With low efficiency in encoding fine-level features, the performance of ViTs is still inferior to that of state-of-the-art CNNs when trained from scratch on a midsize dataset like ImageNet. Through experimental analysis, we find that this is because of two reasons: 1) the simple tokenization of input images fails to model important local structures such as edges and lines, leading to low training sample efficiency; 2) the redundant attention backbone...

10.1109/tpami.2022.3206108 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01

Label smoothing is an effective regularization tool for deep neural networks (DNNs), which generates soft labels by applying a weighted average between the uniform distribution and the hard label. It is often used to reduce the overfitting problem of training DNNs and further improve classification performance. In this paper, we aim to investigate how to generate more reliable soft labels. We present an Online Label Smoothing (OLS) strategy, which generates soft labels based on the statistics of the model prediction for the target category. The proposed OLS constructs...

10.1109/tip.2021.3089942 article EN IEEE Transactions on Image Processing 2021-01-01
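The contrast between conventional label smoothing and the online variant described above can be sketched in NumPy. The names are illustrative, and the sketch assumes the per-class running statistics (`running_mean_probs`) have already been accumulated elsewhere; the actual OLS updates them epoch by epoch.

```python
import numpy as np

def smooth_labels(hard_label, num_classes, eps=0.1):
    """Conventional label smoothing: weighted average of the one-hot
    label and the uniform distribution over all classes."""
    soft = np.full(num_classes, eps / num_classes)
    soft[hard_label] += 1.0 - eps
    return soft

def online_soft_label(running_mean_probs, hard_label, eps=0.1):
    """Online Label Smoothing (sketch): replace the uniform part with
    the running mean of the model's predicted probabilities for samples
    of the target class, so similar classes get more label mass."""
    one_hot = smooth_labels(hard_label, len(running_mean_probs), eps=0.0)
    return (1.0 - eps) * one_hot + eps * running_mean_probs

sl = smooth_labels(2, 4, eps=0.1)
running = np.array([0.1, 0.2, 0.6, 0.1])  # toy accumulated class statistics
ols = online_soft_label(running, 2, eps=0.1)
```

The key difference is that the smoothing mass is no longer spread uniformly: classes the model itself finds similar to the target receive more of it.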

In this paper, we present Vision Permutator, a conceptually simple and data-efficient MLP-like architecture for visual recognition. Realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections. This allows it to capture long-range dependencies and meanwhile avoid the attention building process in transformers....

10.1109/tpami.2022.3145427 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-25
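The separate height and width encoding described above can be sketched with plain einsum projections. This is a simplified single-sample illustration under assumed shapes; the real Vision Permutator splits channels into segments and shares projection weights differently, so treat the function and names as hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def permute_mlp(x, w_h, w_w, w_c):
    """Token mixing in the spirit of Vision Permutator (sketch).

    Instead of mixing H*W flattened tokens at once, apply a linear
    projection separately along the height axis, the width axis, and
    the channel axis, then sum the three branches.

    x: (H, W, C) feature map; w_h: (H, H); w_w: (W, W); w_c: (C, C).
    """
    branch_h = np.einsum('hwc,gh->gwc', x, w_h)  # mix along height
    branch_w = np.einsum('hwc,gw->hgc', x, w_w)  # mix along width
    branch_c = np.einsum('hwc,dc->hwd', x, w_c)  # mix along channels
    return branch_h + branch_w + branch_c

x = rng.standard_normal((4, 5, 3))
# Identity projections leave each branch equal to x, so the sum is 3x
out = permute_mlp(x, np.eye(4), np.eye(5), np.eye(3))
```

Keeping height and width as separate axes is what preserves the 2D positional information that flattening destroys.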

We explore the potential of pooling techniques on the task of salient object detection by expanding their role in convolutional neural networks. In general, two pooling-based modules are proposed. A global guidance module (GGM) is first built based on the bottom-up pathway of the U-shape architecture, which aims to guide the location information of potential salient objects into layers at different feature levels. A feature aggregation module (FAM) is further designed to seamlessly fuse the coarse-level semantic information with the fine-level features in the top-down pathway. It can...

10.1109/tpami.2021.3140168 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-04

Mining precise class-aware attention maps, a.k.a. class activation maps, is essential for weakly supervised semantic segmentation. In this paper, we present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining. We observe that classification models can discover object regions with more details when replacing the input image with its local patches. Taking this into account, we first leverage a local network to extract attentions from multiple patches randomly cropped from the input image. Then,...

10.1109/cvpr52688.2022.01638 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve model performance, but the computation overhead is also considerable. In this paper, we present SRFormer, a simple but novel method that can enjoy the benefit of large window self-attention while introducing even less computational burden. The core of our SRFormer is the permuted self-attention (PSA), which strikes an appropriate balance between the channel and spatial information for self-attention. Our PSA can be...

10.1109/iccv51070.2023.01174 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Knowledge distillation (KD) has witnessed its powerful capability in learning compact models for object detection. Previous KD methods for detection mostly focus on imitating deep features within the imitation regions instead of mimicking the classification logits, due to the inefficiency of distilling localization information and the trivial improvement it brings. In this paper, by reformulating the knowledge distillation process on localization, we present a novel localization distillation (LD) method which can efficiently transfer localization knowledge from the teacher to the student. Moreover,...

10.1109/cvpr52688.2022.00919 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representation manifests that there is still plenty of room to go in making a stronger vision learner. Towards this goal, we propose Contrastive Masked Autoencoders (CMAE), a new self-supervised pre-training method for learning more comprehensive and capable representations. By elaboratively unifying contrastive learning (CL) and the masked image model through novel designs, CMAE leverages their respective...

10.1109/tpami.2023.3336525 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-11-28

Vision Transformers have recently been the most popular network architecture in visual recognition due to their strong ability to encode global information. However, their high computational cost when processing high-resolution images limits their applications in downstream tasks. In this paper, we take a deep look at the internal structure of self-attention and present a simple Transformer-style convolutional neural network (ConvNet) for visual recognition. By comparing the design principles of recent ConvNets and Transformers, we propose...

10.1109/tpami.2024.3401450 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-05-15

10.1109/cvpr52733.2024.01563 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16