12-03 Vision Transformer 演化史: CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
10-21 Vision Transformer 演化史: CoAtNet: Marrying Convolution and Attention for All Data Sizes - 使用 Depthwise Conv 來結合 CNN 與 Transformer
10-07 Vision Transformer 演化史: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - 打破各項 SOTA 的新網路