Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
FOS: Computer and information sciences
Computer Vision and Pattern Recognition (cs.CV)
DOI:
10.48550/arxiv.2407.19394
Publication Date:
2024-07-28
AUTHORS (4)
ABSTRACT
The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focus on global information while ignoring fine-grained local details. Consequently, ViT lacks inductive bias during training on image and video datasets. In contrast, convolutional neural networks (CNNs), with their reliance on local filters, possess an inductive bias, making them more efficient and quicker to converge than ViT with less data. In this paper, we present a lightweight Depth-Wise Convolution module as a shortcut in ViT models, bypassing entire Transformer blocks to ensure the models capture both local and global information with minimal overhead. Additionally, we introduce two architecture variants: one allows the Depth-Wise Convolution modules to be applied to multiple Transformer blocks for parameter savings, and the other incorporates independent parallel Depth-Wise Convolution modules with different kernels to enhance the acquisition of local information. The proposed approach significantly boosts the performance of image classification, object detection, and instance segmentation by a large margin, especially on small datasets, as evaluated on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet for image classification, and on COCO for object detection and instance segmentation. The source code can be accessed at https://github.com/ZTX-100/Efficient_ViT_with_DW.
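The abstract only describes the mechanism at a high level. As a rough illustration, the sketch below shows one plausible way the idea could look in PyTorch: a depth-wise convolution applied to the patch-token grid and added as a shortcut around a Transformer block, together with the parallel multi-kernel variant mentioned in the abstract. The class names (DepthWiseShortcut, BlockWithDWShortcut, ParallelDWShortcut), kernel sizes, and tensor layout are assumptions for illustration only; the authoritative implementation is in the linked repository.

```python
import torch
import torch.nn as nn


class DepthWiseShortcut(nn.Module):
    """Hypothetical sketch: a lightweight depth-wise convolution applied to the
    patch-token grid, usable as a shortcut around a Transformer block."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # groups=dim makes the convolution depth-wise (one small spatial filter
        # per channel), keeping the added parameter count minimal.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, tokens, grid_hw):
        # tokens: (B, N, C) patch tokens; grid_hw: (H, W) patch grid with N == H * W
        b, n, c = tokens.shape
        h, w = grid_hw
        x = tokens.transpose(1, 2).reshape(b, c, h, w)   # back to a 2-D feature map
        x = self.dwconv(x)                               # local, per-channel filtering
        return x.reshape(b, c, n).transpose(1, 2)        # back to a token sequence


class BlockWithDWShortcut(nn.Module):
    """Wraps an existing Transformer block: output = block(x) + depth-wise shortcut(x)."""

    def __init__(self, block, dim, kernel_size=3):
        super().__init__()
        self.block = block
        self.shortcut = DepthWiseShortcut(dim, kernel_size)

    def forward(self, tokens, grid_hw):
        return self.block(tokens) + self.shortcut(tokens, grid_hw)


class ParallelDWShortcut(nn.Module):
    """Variant sketch: independent parallel depth-wise convolutions with different
    kernel sizes, summed to gather local context at several scales."""

    def __init__(self, dim, kernel_sizes=(3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            DepthWiseShortcut(dim, k) for k in kernel_sizes)

    def forward(self, tokens, grid_hw):
        return sum(branch(tokens, grid_hw) for branch in self.branches)


# Usage example with a generic PyTorch encoder layer standing in for a ViT block
# (the real model, patch size, and dimensions are assumptions):
block = nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True)
wrapped = BlockWithDWShortcut(block, dim=192)
out = wrapped(torch.randn(2, 196, 192), grid_hw=(14, 14))  # 14x14 patch grid
```

In this reading, groups=dim is what makes the branch depth-wise: each channel gets its own small spatial filter, so the extra cost per block is roughly dim * k * k parameters, consistent with the abstract's claim of a lightweight, low-overhead shortcut.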