```python
import torch
from torch_scatter import scatter
```

We will use this graph:

```python
num_nodes = 5
num_edges = 6
num_edge_types = 3
edge_index = torch.LongTensor([
```
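The preview cuts off at `edge_index`. As a self-contained sketch of this kind of setup and of what `scatter` does, with a concrete edge list and edge types that are assumptions chosen only to match the sizes above:

```python
import torch
from torch_scatter import scatter

num_nodes = 5
num_edges = 6
num_edge_types = 3

# Assumed example graph: 2 x num_edges tensor of (source, target) pairs
edge_index = torch.LongTensor([
    [0, 1, 1, 2, 3, 4],   # source nodes
    [1, 0, 2, 3, 4, 0],   # target nodes
])
# Assumed example edge types, one per edge, in [0, num_edge_types)
edge_type = torch.LongTensor([0, 0, 1, 2, 1, 2])

# Toy node features; the per-edge "message" is the source node's feature vector
x = torch.randn(num_nodes, 8)
messages = x[edge_index[0]]                      # shape [num_edges, 8]

# scatter sums all messages that share the same target node:
# out[i] = sum of messages[e] over every edge e whose target is i
out = scatter(messages, edge_index[1], dim=0, dim_size=num_nodes, reduce="sum")
print(out.shape)                                 # torch.Size([5, 8])
```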
Dec 20, 2022

Paper: https://arxiv.org/abs/2111.11418

Key idea: abstract the network architecture from high-performing models like Transformers, MLP-Mixers, etc.; it is this overall architecture that gives the good performance. They replace the transformer, MLP-mixer, etc. components with pooling to prove this statement. The main thing to understand is how the pooling works:

```python
class Pooling(nn.Module):
    """
    Implementation of pooling for PoolFormer
    --pool_size: pooling size
    """
```
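The preview is cut off inside the class body. A minimal sketch of how such a pooling mixer can be completed (the constructor default and the exact pooling arguments are assumptions, not copied from the note): stride-1 average pooling, with the input subtracted so that, together with the block's residual connection, the branch only contributes the pooled difference:

```python
import torch.nn as nn

class Pooling(nn.Module):
    """
    Pooling mixer in the spirit of PoolFormer.
    --pool_size: pooling size (default of 3 is an assumption)
    """
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False)

    def forward(self, x):
        # Subtract x: the branch returns pool(x) - x, so the surrounding
        # residual connection makes the whole block compute pool(x).
        return self.pool(x) - x
```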
Dec 9, 2022

Main claim: patches are what lead to the improved performance, at least to a certain extent.

Stem implementation:

```python
self.stem = nn.Sequential(
    nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size),
    activation(),
    nn.BatchNorm2d(dim)
)
```

Blocks implementation:
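The blocks code did not make it into the preview. A hedged reconstruction of the usual ConvMixer-style block (the function name, `Residual` helper, and kernel-size default are assumptions): a depthwise convolution with a residual connection mixes spatial locations, a 1x1 convolution mixes channels, and each is followed by the same activation + BatchNorm pattern as the stem:

```python
import torch.nn as nn

class Residual(nn.Module):
    """Adds a skip connection around an arbitrary module."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer_block(dim, kernel_size=9):
    return nn.Sequential(
        # Depthwise conv: mixes information across spatial locations per channel
        Residual(nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )),
        # Pointwise conv: mixes information across channels per location
        nn.Conv2d(dim, dim, kernel_size=1),
        nn.GELU(),
        nn.BatchNorm2d(dim),
    )
```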
Dec 9, 2022

Mostly the same people as behind the ViT paper. Adequate (84.15 top-1 on ImageNet by Mixer-L/16) but not SOTA; it benefits much more from scaling up.

Common part with ViT: divide an image into NxN patches, unroll each patch, and apply a linear transform.

Some simple Mlp = Linear -> Activation -> Dropout -> Linear -> Dropout style MLP layers, implemented here:

```python
class Mlp(nn.Module):
    """ MLP as used in Vision Transformer, MLP-Mixer and related networks """
    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.):
        super().__init__()
```
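The preview stops right after `super().__init__()`. A sketch of the rest, following the Linear -> Activation -> Dropout -> Linear -> Dropout structure described above (attribute names are assumptions):

```python
import torch.nn as nn

class Mlp(nn.Module):
    """ MLP as used in Vision Transformer, MLP-Mixer and related networks """
    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        # Linear -> Activation -> Dropout -> Linear -> Dropout
        x = self.drop(self.act(self.fc1(x)))
        return self.drop(self.fc2(x))
```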