# uint8 support for images in interpolate()
**Goal**: support uint8 images in interpolate on CPU for the cross-product of:
- *linear* and *cubic* interpolation (*nearest* already supported)
- with or without antialias
- channels_first and channels_last layout
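
For reference, the cross-product above can be enumerated as a small test matrix. This is only a sketch: `Config` and `all_configs` are hypothetical names, and the mode strings assume the `bilinear` / `bicubic` names that `interpolate()` accepts for 2D inputs.

```python
import itertools
from typing import NamedTuple


class Config(NamedTuple):
    """Hypothetical container for one combination to cover."""
    mode: str
    antialias: bool
    memory_format: str


def all_configs():
    """Enumerate every (mode, antialias, layout) combination from the goal list."""
    modes = ("bilinear", "bicubic")
    antialias = (True, False)
    layouts = ("channels_first", "channels_last")
    return [Config(*combo) for combo in itertools.product(modes, antialias, layouts)]
```

Calling `all_configs()` yields 2 x 2 x 2 combinations, which could feed a parametrized test.
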
We're writing 2 main implementations: an optimized AVX version inspired by PIL-SIMD, and a fallback that works in more general cases (and when AVX isn't supported).
Dev branch: https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev
We'll submit PRs there and then migrate everything to master when done. **ETA: Dec 16**
Current status
==============
AVX code
--------
- on 2D images, C == 3 only
- channels last only
- N == 1 only
- bilinear only
- antialias=True only
- Can easily extend to:
  - C < 3 (by unpacking differently)
  - channels first (by unpacking differently)
  - bicubic filter. Nearest[exact] is not critical for now, as it is already supported
- what about antialias=False: can this just be a different way to compute the weights?
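
One way to read the antialias=False question: the resampling loop stays the same, and only the filter support used to build the weights changes. The sketch below is a plain-Python 1D model in the Pillow style, not the actual AVX or fallback code; `triangle`, `compute_weights` and their parameters are illustrative names, and the rule "widen the filter by the scale only when antialias is on and we are downsampling" is stated here as an assumption.

```python
def triangle(x):
    """Linear (triangle) filter with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def compute_weights(in_size, out_size, antialias, filter_fn=triangle, filter_support=1.0):
    """Return, for each output index, (first input index, normalized weights)."""
    scale = in_size / out_size
    if antialias and scale > 1.0:
        # Downsampling with antialias: stretch the filter over `scale` input pixels.
        support = filter_support * scale
        inv_scale = 1.0 / scale
    else:
        # antialias=False (or upsampling): keep the filter at its native width.
        support = filter_support
        inv_scale = 1.0

    result = []
    for i in range(out_size):
        center = (i + 0.5) * scale  # output pixel center in input coordinates
        xmin = max(0, int(center - support + 0.5))
        xmax = min(in_size, int(center + support + 0.5))
        ws = [filter_fn((j + 0.5 - center) * inv_scale) for j in range(xmin, xmax)]
        total = sum(ws)
        result.append((xmin, [w / total for w in ws]))
    return result
```

With this model, `compute_weights(4, 2, antialias=False)` uses two taps per output pixel, while `antialias=True` uses three; the code that consumes the weights would not need to know which case it is in.
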
Fallback
--------
- 3D support: maybe, but we only care about 2D for now
- Supports all C and N values
- Supports channels first or last
- Supports bilinear + bicubic (TODO: test bicubic) -- nearest[exact] already supported
- antialias=False not yet supported
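
Supporting both layouts mostly comes down to where the channels of one pixel live in memory. A minimal plain-Python sketch for a single image (N == 1), with hypothetical helper names; it illustrates the indexing only and is not the fallback code:

```python
def offset_channels_first(c, h, w, C, H, W):
    """Flat offset of element (c, h, w) in a contiguous CHW buffer."""
    return (c * H + h) * W + w


def offset_channels_last(c, h, w, C, H, W):
    """Flat offset of element (c, h, w) in a contiguous HWC buffer."""
    return (h * W + w) * C + c


def load_pixel(buf, h, w, C, H, W, channels_last):
    """Gather the C channel values of pixel (h, w) from either layout."""
    offset = offset_channels_last if channels_last else offset_channels_first
    return [buf[offset(c, h, w, C, H, W)] for c in range(C)]


def value(c, h, w):
    """Synthetic pixel value that encodes its own coordinates."""
    return 100 * c + 10 * h + w


C, H, W = 3, 2, 2
chw = [value(c, h, w) for c in range(C) for h in range(H) for w in range(W)]
hwc = [value(c, h, w) for h in range(H) for w in range(W) for c in range(C)]
```

In channels last, the values of one pixel are adjacent (stride 1), whereas in channels first they are `H * W` elements apart, which is why the two cases unpack differently.
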
TODO
====
(in decreasing priority order)
- 1 Add consistency tests between AVX and fallback, and between uint8 and float
- 2 (Victor) AVX and fallback: support for antialias=False and dtype=uint8 for bilinear and bicubic.
- 3 ~~(Nicolas) AVX: support N > 1, C <= 4 and other filters~~ Done
- 4 ~~(Nicolas) AVX: support for channels first~~
- 5 ~~(Victor) if possible, merge weight computation between AVX and fallback~~
- 6 clean up AVX code
- 7 clean up fallback code
---- mergeable PR threshold ----
- Fallback: optimized version for channels last or first
- Avoid the memory copy in the AVX version, or perhaps copy single rows instead of the entire image
- Support SSE and/or port to Vec.h
- Dispatch to AVX version later instead of early, i.e. use AVX implementation within the inner loops of TensorIterator.
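
For item 1 of the TODO list, a uint8 vs float consistency check could take roughly the following shape. This is a toy 1D model in plain Python, not the real test: the fixed-point scheme, `PRECISION_BITS`, the helper names and the tolerance of 1 are all assumptions chosen for illustration.

```python
PRECISION_BITS = 16  # assumed fixed-point precision for the integer weights


def resample_float(row, weights):
    """Reference path: weighted sums in floating point."""
    return [sum(w * row[xmin + k] for k, w in enumerate(ws)) for xmin, ws in weights]


def resample_uint8(row, weights):
    """Integer path: quantized weights, rounded shift, clamp to [0, 255]."""
    out = []
    for xmin, ws in weights:
        acc = 1 << (PRECISION_BITS - 1)  # adds 0.5 in fixed point so the shift rounds
        for k, w in enumerate(ws):
            acc += int(round(w * (1 << PRECISION_BITS))) * row[xmin + k]
        out.append(min(255, max(0, acc >> PRECISION_BITS)))
    return out


def max_abs_diff(row, weights):
    """Largest gap between the integer result and the clamped float reference."""
    ref = resample_float(row, weights)
    got = resample_uint8(row, weights)
    return max(abs(g - min(255.0, max(0.0, r))) for g, r in zip(got, ref))


row = [0, 255, 10, 200]
weights = [(0, [3 / 7, 3 / 7, 1 / 7]), (1, [1 / 7, 3 / 7, 3 / 7])]
assert max_abs_diff(row, weights) <= 1.0
```

The same pattern (run both paths on identical input, compare with a small absolute tolerance) would apply to the AVX vs fallback comparison, where an exact match may be a reasonable expectation instead.
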
Done:
- (Nicolas) Basic port of the PIL-SIMD implementation
- (Victor) Write fallback