<style>
.reveal {
font-size: 36px;
}
</style>
# `torchvision`
## Revamp of datasets
Philip Meier [@pmeier](https://github.com/pmeier)
[Quansight](https://www.quansight.com/)
---
## Preliminaries
### "old"
```python
from torchvision import datasets
```
### "new"
```python
from torchvision.prototype import datasets
```
---
## Biggest changes
- Datasets follow the iter-style rather than the map-style used before
  - fully compatible with the rework of the dataloader
  - allows streaming from remote sources
- Datasets return everything as `Tensor`s rather than foreign types (see the sketch below)
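
A minimal sketch of what this means in practice, assuming the `load`-based API shown later in this deck; the `"image"` key is illustrative and may differ between datasets:
```python
from torchvision.prototype import datasets

# iter-style: samples are obtained by iterating, not by indexing
dataset = datasets.load("caltech256")
sample = next(iter(dataset))

# the image is already decoded into a Tensor, no PIL objects involved
print(type(sample["image"]))
```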
----
### iter- to map-style dataset
```python
from torch.utils.data import Dataset

class MapDataset(Dataset):
    def __init__(self, samples, *, decoder=None):
        # materialized samples, e.g. the tuple of raw samples of an iter-style dataset
        self.samples = samples
        self.decoder = decoder

    def __getitem__(self, idx):
        sample = self.samples[idx]
        if self.decoder:
            sample = self.decoder(sample)
        return sample

    def __len__(self):
        return len(self.samples)
```
----
```python
map_dataset = MapDataset(
    tuple(datasets.load("caltech256", decoder=None))
)
len(map_dataset)
map_dataset[3141]
```
---
## API
---
## Loading
### old
- The namespace exposes one class per dataset, which needs to be instantiated
```python
dataset = datasets.ImageNet(...)
```
----
### Issues
- The names are not standardized beyond a default camel-case notation
- [#1398](https://github.com/pytorch/vision/issues/1398) `Imagenet` vs. `ImageNet`
----
### new
- All datasets are loaded by name through a single point of entry
```python
dataset = datasets.load("imagnet", ...)
```
```
ValueError: Unknown dataset 'imagnet'. Did you mean 'imagenet'?
```
---
## Data location and download
### old
- First argument is always `root`
- Most datasets support a `download: bool` flag
```python
dataset = datasets.MNIST(root, ..., download=True)
```
----
### Issues
- For the most common use case of having all the data in one place, `root` is superfluous
- Some datasets put the data directly in `root` while others create a directory in it
- The `download` flag should not be needed, because the data always has to be downloaded upfront anyway
----
### new
- All data is automatically downloaded and managed by `torchvision`
- The data is stored by default in
`~/.cache/torch/datasets/vision`
- The path can be changed through `datasets.home()` or an environment variable
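
A minimal sketch of querying the data root; per the bullet above, `datasets.home()` can also be used to change it, but the exact setter interface and environment variable name are not shown here:
```python
from torchvision.prototype import datasets

# query the directory where downloaded data is cached
# (defaults to ~/.cache/torch/datasets/vision)
print(datasets.home())
```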
---
## Attributes
### old
- Some datasets carry additional meta data as attributes
```python
dataset = datasets.ImageNet(...)
dataset.classes
```
----
### Issues
- The extra attributes are not standardized across datasets
- If a dataset carries such information, it is usually poorly documented or not documented at all
----
### new
- All static information can be queried through a single entrypoint
```python
info = datasets.info("imagenet")
info.categories
```
---
## Return type
### old
- Each dataset returns a sample as a tuple
- The return types are usually foreign, e.g. `PIL.Image.Image` or plain Python containers
```python
dataset = datasets.CocoDetection(...)
sample = dataset[0]
print(type(sample[0]))
print(
    [
        (key, type(value).__name__)
        for key, value in sample[1][0].items()
    ]
)
```
```
<class 'PIL.Image.Image'>
[('segmentation', 'list'), ('area', 'float'), ('iscrowd', 'int'),
('image_id', 'int'), ('bbox', 'list'), ('category_id', 'int'),
('id', 'int')]
```
----
### Issues
- A tuple is a sub-par data structure for returning richer information
- Each foreign type needs to be converted to a `Tensor`
----
### new
- Each dataset returns a dictionary containing all the information
- All features are collated and converted and thus ready to use
```python
dataset = datasets.load("coco", annotations="instances")
sample = next(iter(dataset))
print([(key, type(value).__name__) for key, value in sample.items()])
```
```
[('path', 'str'), ('image', 'Tensor'), ('segmentations', 'Tensor'),
('areas', 'Tensor'), ('crowds', 'Tensor'),
('bounding_boxes', 'Tensor'), ('labels', 'Tensor'),
('categories', 'list'), ('super_categories', 'list'),
('ann_ids', 'list')]
```
---
## Transformations
### old
- Each dataset takes a combination of `transform`, `target_transform`, and `transforms` that will be applied before the sample is returned
```python
from torchvision import transforms
transform = transforms.RandomHorizontalFlip()
dataset = datasets.ImageNet(..., transform=transform)
```
----
### Issues
- Usage of the keyword argument [is not consistent](https://gist.github.com/pmeier/14756fe0501287b2974e03ab8d651c10)
- It is hard to reuse the same transformation for multiple datasets, since their return types are not standardized
----
### new
- Transformations are completely decoupled from datasets and are now applied afterwards
```python
from torchvision.prototype import transforms
transform = transforms.HorizontalFlip()
dataset = datasets.load("imagenet").map(transform)
```
---
## Implementation
```python
from typing import Any, Dict, List

from torchdata.datapipes.iter import IterDataPipe
from torchvision.prototype import datasets

class MyDataset(datasets.utils.Dataset):
    def _make_info(self) -> datasets.utils.DatasetInfo: ...

    def resources(
        self, config: datasets.utils.DatasetConfig
    ) -> List[datasets.utils.OnlineResource]: ...

    def _make_datapipe(
        self,
        resource_dps: List[IterDataPipe],
        *,
        config: datasets.utils.DatasetConfig,
        decoder,
    ) -> IterDataPipe[Dict[str, Any]]: ...
```
----
### `def _make_info(self):`
- static information about the dataset
- for example, available categories, homepage, third party dependencies, ...
- can be accessed without loading the datapipe
- can be used to autogenerate documentation for the dataset (TBD)
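
A hypothetical sketch of such a method; the `DatasetInfo` fields and keyword arguments below are illustrative assumptions, not the real signature:
```python
def _make_info(self) -> datasets.utils.DatasetInfo:
    # name, homepage, and categories are placeholder values
    return datasets.utils.DatasetInfo(
        "mydataset",
        homepage="https://example.com/mydataset",
        categories=["cat", "dog"],
    )
```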
----
### `def resources(self, config):`
- defines all resources that need to be locally available to start loading the data
- will be downloaded automatically
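
A hypothetical sketch; `HttpResource` and its `sha256` argument are assumptions about the resource helpers and may not match the real names:
```python
def resources(
    self, config: datasets.utils.DatasetConfig
) -> List[datasets.utils.OnlineResource]:
    # a single archive that is downloaded into the data home if missing
    return [
        datasets.utils.HttpResource(
            "https://example.com/mydataset/images.tar.gz",
            sha256="...",
        )
    ]
```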
----
### `def _make_datapipe(self, resource_dps, *, config, decoder):`
- heart of the dataset (varies wildly between different datasets)
- gets the already loaded datapipes of all resources
- needs to return an `IterDataPipe[Dict[str, Any]]` that yields complete samples (see the sketch below)
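
A minimal sketch of the overall shape for the class from the Implementation slide, assuming a single resource datapipe yielding `(path, file)` tuples and a hypothetical `_collate_sample` helper that builds the sample dictionary:
```python
def _make_datapipe(
    self,
    resource_dps: List[IterDataPipe],
    *,
    config: datasets.utils.DatasetConfig,
    decoder,
) -> IterDataPipe[Dict[str, Any]]:
    # resource_dps[0] corresponds to the first resource returned by resources()
    dp = resource_dps[0]
    # turn each raw (path, file) pair into the final sample dictionary
    return dp.map(self._collate_sample)
```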
---
## Example 1: [`caltech256`](https://github.com/pytorch/vision/blob/65438e9eba26951206cbfaafeac1d5b1ac805193/torchvision/prototype/datasets/_builtin/caltech.py#L143)
---
## Example 2: [`caltech101`](https://github.com/pytorch/vision/blob/65438e9eba26951206cbfaafeac1d5b1ac805193/torchvision/prototype/datasets/_builtin/caltech.py#L27)
---
## Questions?
{"metaMigratedAt":"2023-06-16T16:00:01.276Z","metaMigratedFrom":"Content","title":"`torchvision`","breaks":true,"contributors":"[{\"id\":\"f173b69f-0663-4d73-aeaa-02b211133e30\",\"add\":15340,\"del\":8832}]"}