09-08 The Evolution of Vision Transformers: Going deeper with Image Transformers - CaiT introduces LayerScale and class-attention layers to improve DeiT
09-02 The Evolution of Vision Transformers: Bottleneck Transformers for Visual Recognition - BoT combines the Bottleneck block with a Transformer
08-17 The Evolution of Vision Transformers: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions - applying the pyramid network to the Transformer
08-13 The Evolution of Vision Transformers: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet - T2T-ViT
08-06 The Evolution of Vision Transformers: Incorporating Convolution Designs into Visual Transformers - Convolution-enhanced image Transformer (CeiT), yet another paper combining CNN and Transformer
08-06 The Evolution of Vision Transformers: CvT: Introducing Convolutions to Vision Transformers - taking the best of both CNN and Transformer
07-28 The Evolution of Vision Transformers: Conditional Positional Encodings for Vision Transformers - Positional Encoding for variable sequence lengths