## Geometric data augmentations for computer vision and internals
TechShare, 13 April 2022
by Victor
<!-- Put the link to this slide here so people can follow -->
<!-- slide: https://hackmd.io/@xnXPVlYVRZC457vsNOVEFQ/techshare-geom-dataaugs -->
---
## Content
- Data augmentations and Deep Learning
- Geometric data augmentations:
- Affine, resize transformations
- New Transforms API in TorchVision
- `torch.nn.functional.interpolate`:
- Porting implementation using `TensorIterator` and benchmarks
- Anti-aliasing feature
- Bugs influencing Deep Learning
---
### Data augmentations and Deep Learning
_Here, data augmentation == image augmentation_
Random transformations: geometric, color, GAN-based, ...

:+1:
- Effectively more training data
- Better model generalization

Limitations:
- Data augs are domain-dependent
---
### Data augmentations and Deep Learning
- https://github.com/aleju/imgaug
<img width=500 src="https://i.imgur.com/ueaMdZF.png"/>
---
### Data augmentations and Deep Learning
Example augmentations:
- https://github.com/aleju/imgaug#example-images
- https://github.com/albumentations-team/albumentations#a-few-more-examples-of-augmentations
---
### Data augmentations and Deep Learning
Other libraries for computer vision:
- torchvision
- albumentations (OpenCV as backend)
- NVIDIA DALI (GPU-accelerated)
- MONAI (medical imaging)
- ...
---
### :triangular_ruler: Geometric augmentations
- Cropping / Padding
- Flips / Rotation
- **Resizing**
- **Affine** / Perspective / Elastic
- etc
---
### Image resizing
<img style="background: darkgray;" width=400 src="https://i.imgur.com/SV3NHqk.png"/>
```python
from torchvision.transforms.functional import resize, InterpolationMode
out = resize(inpt, (64, 64), InterpolationMode.NEAREST, **kwargs)
```
Parameters:
- output size or scale
- interpolation method
- anti-aliasing and other options
---
### Image resizing
_torchvision_ uses either `PIL.Image.resize` or `torch.nn.functional.interpolate`:
```python
# NEAREST interpolation
ix = round((ox + 0.5) * scale - 0.5)
...
output[oy, ox] = input[iy, ix]
```
```python
# LINEAR interpolation (one dimension shown; y is analogous)
ix1 = floor(f(ox, scale))
ix2 = ix1 + 1
w2 = f(ox, scale) - ix1
w1 = 1.0 - w2
...
output[ox] = w1 * input[ix1] + w2 * input[ix2]
```
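The linear case above can be sketched end-to-end in NumPy. This is a sketch, not torchvision's code: it assumes `f` is the pixel-center mapping `f(ox, scale) = (ox + 0.5) * scale - 0.5` and that border indices are clamped.

```python
import numpy as np

def resize_linear_1d(inp, out_size):
    # Linear resize of a 1D signal; f(ox, scale) = (ox + 0.5) * scale - 0.5
    # is the assumed pixel-center mapping (align_corners=False convention).
    in_size = len(inp)
    scale = in_size / out_size
    out = np.empty(out_size)
    for ox in range(out_size):
        fx = (ox + 0.5) * scale - 0.5
        ix1 = int(np.floor(fx))
        ix2 = ix1 + 1
        w2 = fx - ix1          # fractional part -> weight of the right neighbor
        w1 = 1.0 - w2
        # clamp at the image border (replicates edge pixels)
        ix1 = min(max(ix1, 0), in_size - 1)
        ix2 = min(max(ix2, 0), in_size - 1)
        out[ox] = w1 * inp[ix1] + w2 * inp[ix2]
    return out
```

For such inputs this should agree with `torch.nn.functional.interpolate(..., mode="linear", align_corners=False)` without anti-aliasing.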
---
### Image/BBox/Mask resizing
```python
from torchvision.prototype import features
from torchvision.prototype.transforms.functional import (
    resize_image_tensor, resize_bounding_box, resize_segmentation_mask
)
out_boxes = resize_bounding_box(in_boxes, out_size, in_boxes.image_size)
out_mask = resize_segmentation_mask(in_mask, out_size)
out_image = resize_image_tensor(in_image, out_size)
```
<img style="background: darkgray;" width=450 src="https://i.imgur.com/yxAafHx.png"/>
---
### Image/BBox/Mask resizing
Prototype Transforms API in _torchvision_ by Philip
```python
from torchvision.prototype import transforms
resize_op = transforms.Resize((48, 52))
transformed_data = resize_op(in_image, in_boxes, in_mask)
out_image, out_boxes, out_mask = transformed_data
```
Stay tuned for the official announcement :slightly_smiling_face:
---
### Affine image transformation
<img style="background: darkgray;" width=450 src="https://i.imgur.com/PIPBQyW.png"/>
Affine matrix: `C * R * S * Sh * IC * Tr`
(center, rotation, scale, shear, inverse center, translation)
```
M = [[a, b, t1],
     [c, d, t2],
     [0, 0, 1]]

[nx, ny, 1] = [x, y, 1] @ M.T
```
---
### Affine image transformation
Parameters used to generate the affine matrix:
- rotation angle
- scale
- translations
- shear X/Y
- transformation center
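One plausible reading of the `C * R * S * Sh * IC * Tr` composition, sketched in NumPy. The factor order, sign conventions, and parameter names are assumptions for illustration, not torchvision's private implementation.

```python
import numpy as np

def make_affine_matrix(angle_deg, scale, tx, ty, shear_x_deg, shear_y_deg, cx, cy):
    # Compose C * R * S * Sh * IC * Tr: rotate/scale/shear about the
    # center (cx, cy), then translate by (tx, ty). A sketch, not the
    # torchvision implementation.
    a = np.deg2rad(angle_deg)
    sx, sy = np.deg2rad(shear_x_deg), np.deg2rad(shear_y_deg)
    C = np.array([[1.0, 0, cx], [0, 1, cy], [0, 0, 1]])    # move center back
    IC = np.array([[1.0, 0, -cx], [0, 1, -cy], [0, 0, 1]]) # move center to origin
    R = np.array([[np.cos(a), -np.sin(a), 0],
                  [np.sin(a),  np.cos(a), 0],
                  [0.0, 0, 1]])
    S = np.diag([scale, scale, 1.0])
    Sh = np.array([[1.0, np.tan(sx), 0],
                   [np.tan(sy), 1, 0],
                   [0.0, 0, 1]])
    Tr = np.array([[1.0, 0, tx], [0, 1, ty], [0, 0, 1]])
    return C @ R @ S @ Sh @ IC @ Tr
```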
---
### Affine image transformation
<img style="background: darkgray;" width=400 src="https://i.imgur.com/PoI6Giy.png"/>
For images (nearest interpolation mode):
```python
inv_affine_matrix = inverse(affine_matrix)
for out_y in range(out_image_h):
    for out_x in range(out_image_w):
        output_pt = [out_x + 0.5, out_y + 0.5, 1.0]
        input_pt = inv_affine_matrix @ output_pt
        in_x, in_y, _ = round(input_pt - 0.5)
        if 0 <= in_x < in_image_w and 0 <= in_y < in_image_h:
            out_image[out_y, out_x] = in_image[in_y, in_x]
```
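The same inverse-warp loop as runnable NumPy. A sketch: `fill` is a hypothetical parameter for out-of-bounds output pixels, not part of the loop above.

```python
import numpy as np

def affine_nearest(in_image, affine_matrix, out_h, out_w, fill=0):
    # Inverse-warp a 2D image with nearest interpolation: for each output
    # pixel center, find the source pixel it came from.
    inv = np.linalg.inv(affine_matrix)
    in_h, in_w = in_image.shape
    out = np.full((out_h, out_w), fill, dtype=in_image.dtype)
    for oy in range(out_h):
        for ox in range(out_w):
            # map the output pixel center back into input coordinates
            x, y, _ = inv @ np.array([ox + 0.5, oy + 0.5, 1.0])
            ix, iy = int(round(x - 0.5)), int(round(y - 0.5))
            if 0 <= ix < in_w and 0 <= iy < in_h:
                out[oy, ox] = in_image[iy, ix]
    return out
```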
---
### Affine image transformation
```
bboxes = [[x1, y1, x2, y2], ...] # xyxy format
```
For bounding boxes:
```python
out_boxes = []
for xmin, ymin, xmax, ymax in in_boxes:
    points = [(xmin, ymin, 1), (xmax, ymin, 1), (xmax, ymax, 1), (xmin, ymax, 1)]
    new_points = points @ affine_matrix.T
    out_boxes.append(
        [
            min(new_points[:, 0]), min(new_points[:, 1]),
            max(new_points[:, 0]), max(new_points[:, 1]),
        ]
    )
```
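A runnable NumPy version of the corner-mapping idea: transform the four box corners, then take the min/max over the x and y columns to get the enclosing axis-aligned box.

```python
import numpy as np

def affine_bboxes(in_boxes, affine_matrix):
    # Transform xyxy boxes: map the 4 corners through the affine matrix,
    # then take the enclosing axis-aligned box of the transformed corners.
    out_boxes = []
    for xmin, ymin, xmax, ymax in in_boxes:
        points = np.array([
            [xmin, ymin, 1.0], [xmax, ymin, 1.0],
            [xmax, ymax, 1.0], [xmin, ymax, 1.0],
        ])
        new_points = points @ np.asarray(affine_matrix).T
        out_boxes.append([
            new_points[:, 0].min(), new_points[:, 1].min(),
            new_points[:, 0].max(), new_points[:, 1].max(),
        ])
    return np.array(out_boxes)
```

Note the enclosing box can be larger than the rotated object, which is inherent to axis-aligned boxes, not a bug in the mapping.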
---
### :rocket: Performance matters
- Dataflow runs in parallel to NN computations
- Dataflow can be a bottleneck: loading, decoding, **augs**, etc
---
### :gear: Image resizing combinatorics
Input:
- Dimensions: 3D, 4D, 5D tensors
- Data type: uint8, float32, uint16, ...
- Memory format: channel last, channel first
- Device: cpu, cuda
Parameters:
- Interpolation modes: nearest, bilinear, bicubic
- Anti-aliasing: true / false
---
#### [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html?highlight=interpolate#torch.nn.functional.interpolate)
A single Python function to resize 3D, 4D, and 5D tensors
- mostly supports floating-point dtypes
- CF/CL memory formats, cuda/cpu devices
- modes: nearest, bilinear, bicubic, area, ...

Performance (CPU):
PIL ~ torch interpolate >> OpenCV
---
#### [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html?highlight=interpolate#torch.nn.functional.interpolate)
#### using `TensorIterator` for CPU
- :+1: Removed the separate 1D, 2D, 3D loops
- :+1: Optimized computations
  - ~2x-3x speed-up in most cases
- :+1: Unified the code across modes and dims
- :-1: Still slower in 2D/3D channels-last cases
---
#### How does it work with `TensorIterator`?
- Precompute source indices and weights
  - for each dimension
For example, for bilinear mode:
- output, source (restrided)
- index_x1, index_x2, wx1, wx2 of size owidth
- index_y1, index_y2, wy1, wy2 of size oheight
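A NumPy sketch of that precomputation for one dimension: the index/weight tables have the length of the output dimension and are reused for every row/column. The pixel-center mapping is an assumption consistent with the earlier slides, not the actual kernel code.

```python
import numpy as np

def linear_indices_weights(in_size, out_size):
    # Precompute, for one dimension, the two source indices and two weights
    # each output pixel needs (linear mode, no anti-aliasing).
    scale = in_size / out_size
    ox = np.arange(out_size)
    fx = (ox + 0.5) * scale - 0.5            # assumed pixel-center mapping
    ix1f = np.floor(fx)
    w2 = fx - ix1f                           # fractional part -> right weight
    w1 = 1.0 - w2
    ix1 = np.clip(ix1f, 0, in_size - 1).astype(np.int64)
    ix2 = np.clip(ix1f + 1, 0, in_size - 1).astype(np.int64)
    return ix1, ix2, w1, w2
```

Resizing a row is then a vectorized gather: `out = w1 * row[ix1] + w2 * row[ix2]`.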
---
#### How does it work with `TensorIterator`?
- Use implicit compiler vectorization
  - static assumptions on strides
```c++
// special-cases to let the compiler apply compile-time input-specific optimizations
if ((strides[0] == sizeof(scalar_t) && (strides[1] == 0) &&
     // NOLINTNEXTLINE(bugprone-branch-clone)
     check_almost_all_zero_stride<out_ndims, 1, scalar_t, int64_t, interp_size>(&strides[2]))) {
  // contiguous channels-first case
  basic_loop<scalar_t, int64_t, out_ndims, interp_size>(data, strides, n);
} else if ((strides[0] == sizeof(scalar_t) && (strides[1] == sizeof(scalar_t)) &&
            check_almost_all_zero_stride<out_ndims, -1, scalar_t, int64_t, interp_size>(&strides[2]))) {
  // contiguous channels-last case
  basic_loop<scalar_t, int64_t, out_ndims, interp_size>(data, strides, n);
} else {
  // fallback
  basic_loop<scalar_t, int64_t, out_ndims, interp_size>(data, strides, n);
}
```
---
### Benchmarks
- https://github.com/pytorch/pytorch/pull/51653
- https://github.com/pytorch/pytorch/pull/54500
It was a fun challenge to beat previous benchmarks
---
### Challenges
- Support for more dtypes: uint8, ...
- Improvements for the channels-last memory format
---
### Adding anti-aliasing (AA) option
<img width=300 src="https://raw.githubusercontent.com/GaParmar/clean-fid/main/docs/images/resize_circle_extended.png"/>
<img width=300 src="https://pbs.twimg.com/media/FDVGYBgVIAEsjsg?format=jpg&name=900x900"/>
- https://github.com/GaParmar/clean-fid
- Image scaling attacks
---
### How does AA work?
For example, for bilinear mode with scale=4:
```
i1, i2, w1, w2 -> 9 indices and weights
```
A larger number of source pixels is used to compute each output pixel
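A sketch of where those extra taps come from (not the PyTorch kernel code): with anti-aliasing the linear kernel is stretched by the scale factor, so with scale=4 up to `2*scale + 1 = 9` source pixels can contribute to one output pixel, instead of 2 without AA.

```python
import numpy as np

def aa_linear_weights(in_size, out_size, ox):
    # Contributing source indices + normalized weights for one output pixel,
    # linear (triangle) kernel with anti-aliasing; downsampling, so scale > 1.
    scale = in_size / out_size
    support = 1.0 * scale            # linear kernel support, stretched by scale
    center = (ox + 0.5) * scale      # output pixel center in input coordinates
    lo = max(int(center - support + 0.5), 0)
    hi = min(int(center + support + 0.5), in_size)
    idx = np.arange(lo, hi)
    # triangle kernel evaluated at distances shrunk by the scale factor
    w = np.clip(1.0 - np.abs((idx + 0.5 - center) / scale), 0.0, None)
    return idx, w / w.sum()          # weights normalized to sum to 1
```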
---
#### Implementation using `TensorIterator` (CPU)
Sub-optimal solution:
- Naively precomputing all indices and weights for all dims (e.g. like bicubic)
Better solution:
- Separable resizing: resizing dim by dim
- Using bounds for indices
---
#### Implementation for GPU
Sub-optimal solution:
- local max-size weights allocation
```c++
// for each thread
scalar_t wx[256];
scalar_t wy[256];
```
Better solution:
- use shared memory and compute shared weights for specific blocks
---
#### Implementation for GPU
```c++
extern __shared__ int smem[];
scalar_t* wx = reinterpret_cast<scalar_t*>(smem) + interp_width * threadIdx.x;
scalar_t* wy = reinterpret_cast<scalar_t*>(smem) + interp_width * blockDim.x + interp_height * threadIdx.y;
if (threadIdx.y == 0)
{
// All threadIdx.y have the same wx weights
upsample_antialias::_compute_weights<scalar_t, accscalar_t>(wx, ...);
}
if (threadIdx.x == 0)
{
// All threadIdx.x have the same wy weights
upsample_antialias::_compute_weights<scalar_t, accscalar_t>(wy, ...);
}
```
- It was fun to write CUDA kernels and optimize them
---
#### Benchmarks: downsampling with AA
PIL vs torch CPU vs torch GPU
```
Num threads: 8
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 2851.2 | 874.1 | 57.1
channels_last non-contiguous torch.float32 | 2856.1 | 1155.8 | 130.6
Times are in microseconds (us).
Num threads: 8
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
| Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
channels_first contiguous torch.float32 | 4522.4 | 1406.7 | 170.3
channels_last non-contiguous torch.float32 | 4530.0 | 1435.4 | 242.2
Times are in microseconds (us).
```
---
#### Bugs in resize op influencing Deep Learning
- Nearest interpolation mode is broken in OpenCV and PyTorch
  - both introduced a `nearest-exact` mode to fix it
- TF1 resize op and Deep Lab image size `321x513`
https://ppwwyyxx.com/blog/2021/Where-are-Pixels/
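The two index formulas behind the bug, as a minimal sketch that mirrors (rather than calls) the library implementations: legacy `nearest` ignores pixel centers and shifts content toward the origin, while `nearest-exact` maps output pixel centers.

```python
import math

def nearest(dst, scale):
    # legacy, asymmetric: src = floor(dst * scale)
    return math.floor(dst * scale)

def nearest_exact(dst, scale):
    # fixed: src = floor((dst + 0.5) * scale), maps output pixel centers
    return math.floor((dst + 0.5) * scale)

scale = 5 / 2  # resizing 5 pixels down to 2
print([nearest(d, scale) for d in range(2)])        # [0, 2]
print([nearest_exact(d, scale) for d in range(2)])  # [1, 3]
```

Same input, same output size, but the two modes pick entirely different source pixels.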
---
#### Bugs in resize op influencing Deep Learning
Compatible implementations?
```
PyTorch vs OpenCV vs Scikit-Image vs Pillow vs TF
```
- PyTorch <--> OpenCV are compatible (bugs included)
---
### Thank you!
Any questions :question: