08-13 Vision Transformer 演化史: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet - T2T-ViT
08-06 Vision Transformer 演化史: Incorporating Convolution Designs into Visual Transformers - Convolution-enhanced image Transformer (CeiT) 又一篇 CNN 加 Transformer
08-06 Vision Transformer 演化史: CvT: Introducing Convolutions to Vision Transformers - CNN 與 Transformer 各取所長
07-28 Vision Transformer 演化史: Conditional Positional Encodings for Vision Transformers - 可變序列長短的 Positional Encoding
07-27 Vision Transformer 演化史: Visual Transformers: Transformer in Transformer - 使用雙層 Transformer 來重新思考 Patch Embedding
07-26 Vision Transformer 演化史: Visual Transformers: Token-based Image Representation and Processing for Computer Vision - 使用 visual token 來強化傳統 CNN 的結果
07-24 Vision Transformer 演化史: Training data-efficient image transformers & distillation through attention - DeiT 使用知識蒸餾來改進 ViT 要使用大訓練集的缺點