10-21 The Evolution of Vision Transformers: CoAtNet: Marrying Convolution and Attention for All Data Sizes - Using depthwise convolution to combine CNNs and Transformers
10-07 The Evolution of Vision Transformers: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - A new network that breaks multiple SOTA records
09-08 The Evolution of Vision Transformers: Going deeper with Image Transformers - CaiT introduces LayerScale and class-attention layers to improve DeiT
09-02 The Evolution of Vision Transformers: Bottleneck Transformers for Visual Recognition - BoT adds a Transformer to the Bottleneck block
08-17 The Evolution of Vision Transformers: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions - Applying the pyramid network to Transformers