Rethinking and Improving Relative Position Encoding for Vision Transformer
DOI: 10.48550/arxiv.2107.14222
Publication Date: 2021-01-01
AUTHORS (5)
ABSTRACT
Relative position encoding (RPE) is important for transformers to capture the sequence ordering of input tokens. Its general efficacy has been proven in natural language processing. However, in computer vision, its efficacy is not well studied and even remains controversial, e.g., whether relative position encoding can work equally well as absolute position encoding. To clarify this, we first review existing relative position encoding methods and analyze their pros and cons when applied in vision transformers. We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE). Our methods consider directional relative distance modeling as well as the interactions between queries and relative position embeddings in the self-attention mechanism. The proposed iRPE methods are simple and lightweight. They can be easily plugged into transformer blocks. Experiments demonstrate that, solely due to the proposed encoding methods, DeiT and DETR obtain up to 1.5% (top-1 Acc) and 1.3% (mAP) stable improvements over their original versions on ImageNet and COCO respectively, without tuning any extra hyperparameters such as learning rate and weight decay. Our ablation and analysis also yield interesting findings, some of which run counter to previous understanding. Code and models are open-sourced at https://github.com/microsoft/Cream/tree/main/iRPE.
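To make the idea concrete, below is a minimal PyTorch sketch of a learnable 2D relative position bias added to raw attention logits, in the spirit of the paper's bias mode. It is an illustration, not the authors' implementation (which also includes contextual modes and a piecewise index function; see the linked repository). The class name RelPosBias2D and the direct (2H-1)x(2W-1) offset bucketing are illustrative assumptions.

    # A minimal sketch of a 2D relative position bias for self-attention.
    # Assumptions: class name RelPosBias2D and direct offset bucketing are
    # illustrative, not the iRPE implementation.
    import torch
    import torch.nn as nn

    class RelPosBias2D(nn.Module):
        """Learnable bias b[i, j], indexed by the 2D offset between query
        patch i and key patch j, added to the raw attention logits."""

        def __init__(self, height: int, width: int, num_heads: int):
            super().__init__()
            # One bucket per possible (dy, dx) offset: (2H-1) * (2W-1) in total.
            num_buckets = (2 * height - 1) * (2 * width - 1)
            self.bias_table = nn.Parameter(torch.zeros(num_buckets, num_heads))

            # Precompute the bucket index for every query/key patch pair.
            ys, xs = torch.meshgrid(
                torch.arange(height), torch.arange(width), indexing="ij"
            )
            coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)  # (HW, 2)
            rel = coords[:, None, :] - coords[None, :, :]              # (HW, HW, 2)
            rel[..., 0] += height - 1                                  # shift offsets to >= 0
            rel[..., 1] += width - 1
            index = rel[..., 0] * (2 * width - 1) + rel[..., 1]        # (HW, HW)
            self.register_buffer("index", index)

        def forward(self, attn_logits: torch.Tensor) -> torch.Tensor:
            # attn_logits: (batch, heads, HW, HW) raw q.k / sqrt(d) scores.
            bias = self.bias_table[self.index]           # (HW, HW, heads)
            return attn_logits + bias.permute(2, 0, 1)   # broadcast over batch

    # Usage: drop the bias into a transformer block before the softmax.
    bias = RelPosBias2D(height=14, width=14, num_heads=6)
    logits = torch.randn(2, 6, 14 * 14, 14 * 14)
    out = bias(logits)  # same shape, now position-aware

Because the bias depends only on relative offsets rather than absolute patch indices, it adds few parameters and, as the abstract notes, such modules can be plugged into existing transformer blocks without retuning hyperparameters.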