# Patches are all you need?
- Main claim: the patch embedding itself (rather than attention) accounts for much of the improved performance of ViT-style models, at least to a certain extent
- Stem [implementation](https://github.com/rwightman/pytorch-image-models/blob/7c67d6aca992f039eece0af5f7c29a43d48c00e4/timm/models/convmixer.py#L40-L44)
```python
self.stem = nn.Sequential(
    nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size),
    activation(),
    nn.BatchNorm2d(dim)
)
```
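A quick shape check on the stem (a minimal sketch; `dim=256` and `patch_size=7` are illustrative values here, and `nn.GELU` stands in for `activation`):

```python
import torch
import torch.nn as nn

in_chans, dim, patch_size = 3, 256, 7  # illustrative values, not the paper's defaults

stem = nn.Sequential(
    nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size),
    nn.GELU(),
    nn.BatchNorm2d(dim),
)

x = torch.randn(1, 3, 224, 224)
out = stem(x)
print(out.shape)  # torch.Size([1, 256, 32, 32]) -- 224/7 = 32 patches per side
```

Because `stride == kernel_size`, the convolution reads each `patch_size x patch_size` patch exactly once, which is what makes this equivalent to a patch embedding.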
- Blocks [implementation](https://github.com/rwightman/pytorch-image-models/blob/7c67d6aca992f039eece0af5f7c29a43d48c00e4/timm/models/convmixer.py#L45-L56)
```python
self.blocks = nn.Sequential(
    *[nn.Sequential(
        Residual(nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            activation(),
            nn.BatchNorm2d(dim)
        )),
        nn.Conv2d(dim, dim, kernel_size=1),
        activation(),
        nn.BatchNorm2d(dim)
    ) for i in range(depth)]
)
```
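The `Residual` wrapper used above is a small helper defined in the same file; it simply adds the input back onto the output of the wrapped module:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        # skip connection around the wrapped (depthwise conv) sub-block
        return self.fn(x) + x
```

Note the residual connection wraps only the depthwise (spatial-mixing) convolution, not the pointwise one.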
- Note how `nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same")` mixes spatial locations only (each channel is convolved independently), while `nn.Conv2d(dim, dim, kernel_size=1)` mixes channels only. This is the same idea, in spirit, as MLP-Mixer's separate token-mixing and channel-mixing MLPs
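The spatial-only vs. channel-only separation can be verified empirically (a sketch with arbitrary `dim=8`, `kernel_size=5`): perturbing one input channel changes only that channel after the depthwise conv, but changes every channel after the pointwise conv:

```python
import torch
import torch.nn as nn

dim, kernel_size = 8, 5  # arbitrary illustration values

depthwise = nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same")
pointwise = nn.Conv2d(dim, dim, kernel_size=1)

x = torch.randn(1, dim, 16, 16)
x2 = x.clone()
x2[:, 3] += torch.randn(16, 16)  # perturb channel 3 only

# groups=dim: each output channel depends on a single input channel,
# so only channel 3 of the depthwise output changes
d_diff = (depthwise(x) - depthwise(x2)).abs().amax(dim=(0, 2, 3))
d_changed = d_diff > 1e-6
print(d_changed)  # True only at index 3

# kernel_size=1: every output channel is a linear combination of all
# input channels, so all pointwise outputs change
p_diff = (pointwise(x) - pointwise(x2)).abs().amax(dim=(0, 2, 3))
print((p_diff > 1e-6).all())
```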
###### tags: `vit`