# Formats to add to CVAT
## mandatory addition
- YOLOv8
  - PyTorch TXT format, including oriented bounding boxes (OBB)
  - https://github.dev/lightly-ai/labelformat/tree/main/src/labelformat
  - What about the other family members (v4, v5, v6, v7)?
    - v4 and v5: different at a very high level, but quite similar at the granular level
    - v6 and v7: exactly the same format as v8
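A minimal sketch of reading one line of a YOLO label `.txt` file, assuming the Ultralytics conventions: a detection line is `class cx cy w h` and an OBB line is `class x1 y1 x2 y2 x3 y3 x4 y4`, all coordinates normalized to [0, 1]. The `Shape` type and `parse_yolo_line` function are hypothetical names for illustration, not CVAT or Datumaro APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical intermediate annotation in pixel coordinates."""
    label_id: int
    # Axis-aligned boxes: top-left, top-right, bottom-right, bottom-left.
    # OBB: corners in the order they appear in the file.
    points: list[tuple[float, float]]


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Shape:
    parts = line.split()
    label_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    if len(values) == 4:
        # Detection: normalized center x, center y, width, height.
        cx, cy, w, h = values
        x1, y1 = (cx - w / 2) * img_w, (cy - h / 2) * img_h
        x2, y2 = (cx + w / 2) * img_w, (cy + h / 2) * img_h
        points = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    elif len(values) == 8:
        # OBB: four normalized corner points; scale x by width and y by height.
        points = [(values[i] * img_w, values[i + 1] * img_h) for i in range(0, 8, 2)]
    else:
        raise ValueError(f"expected 4 or 8 coordinates, got {len(values)}: {line!r}")
    return Shape(label_id, points)
```

Because detection and OBB lines differ only in the number of values, one parser can branch on the count; this is also why the v5 to v8 detection files can likely share a single reader.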
## community/research basis
- OpenLABEL
  - https://github.com/cvat-ai/cvat/issues/3999
  - https://www.asam.net/standards/detail/openlabel/
  - JSON based and flexible (hence more cases to handle)
  - Voxel51 has this implemented, so we can use it as a reference: https://docs.voxel51.com/user_guide/dataset_creation/datasets.html#openlabelimagedataset
- Visual Genome
  - Visual question answering dataset; supporting it needs only minimal changes to traditional image annotation
  - https://paperswithcode.com/dataset/visual-genome
  - Example data: https://labelstud.io/templates/visual_genome
  - Visual Genome has a more balanced distribution over 6 question types: What, Where, When, Who, Why and How
- ADE20K
  - https://paperswithcode.com/dataset/ade20k
  - JSON based, polygons
- BDD
  - https://bair.berkeley.edu/blog/2018/05/30/bdd/
  - JSON based, uses the format from [Scalabel](https://doc.scalabel.ai/format.html)
- Motion datasets
  - Kinetics
    - https://paperswithcode.com/dataset/kinetics
    - simple to implement: each clip is human-annotated with a single action class and lasts around 10 seconds
  - UCF101
    - https://paperswithcode.com/dataset/ucf101
    - more popular than Kinetics, longer clips and slightly more annotation
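For OpenLABEL, a minimal sketch of pulling 2D boxes out of a document, assuming the ASAM layout where object classes live in the top-level `objects` map and per-frame geometry sits under `frames/<n>/objects/<id>/object_data/bbox`, with `val = [x_center, y_center, width, height]`. It ignores every other geometry type and any static (non-frame) object data; the `SAMPLE` document, `openlabel_boxes` function, and the `xtl/ytl/xbr/ybr` output keys are illustrative choices, not an existing CVAT importer.

```python
from __future__ import annotations

import json

# Hypothetical minimal OpenLABEL document: one car in frame 0.
SAMPLE = """
{
  "openlabel": {
    "metadata": {"schema_version": "1.0.0"},
    "objects": {"0": {"name": "car1", "type": "car"}},
    "frames": {
      "0": {"objects": {"0": {"object_data": {
        "bbox": [{"name": "shape2D", "val": [50, 40, 20, 10]}]
      }}}}
    }
  }
}
"""


def openlabel_boxes(doc: dict) -> list[dict]:
    root = doc["openlabel"]
    objects = root.get("objects", {})
    boxes = []
    for frame_id, frame in root.get("frames", {}).items():
        for obj_id, obj in frame.get("objects", {}).items():
            # The object class comes from the document-level object entry.
            label = objects.get(obj_id, {}).get("type", "unknown")
            for bbox in obj.get("object_data", {}).get("bbox", []):
                cx, cy, w, h = bbox["val"]  # assumed: center x, center y, width, height
                boxes.append({
                    "frame": int(frame_id),
                    "label": label,
                    "xtl": cx - w / 2, "ytl": cy - h / 2,
                    "xbr": cx + w / 2, "ybr": cy + h / 2,
                })
    return boxes
```

Calling `openlabel_boxes(json.loads(SAMPLE))` yields one box for label `car` spanning (40, 35) to (60, 45). The flexibility noted above shows up here as the many `.get(..., {})` fallbacks: most keys are optional, so a real importer has to decide which combinations to support.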
## human-related datasets
- CelebA
  - https://paperswithcode.com/dataset/celeba
  - most cited face dataset, 3000+ papers
  - should be relatively simple to implement, CSV based
- FFHQ
  - https://paperswithcode.com/dataset/ffhq
  - high-resolution, modern faces, JSON based
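Since CelebA attributes are per-image binary flags, they could map onto image-level tags. A minimal sketch, assuming a CSV layout with an `image_id` column followed by one column per attribute holding `1` (present) or `-1` (absent); the exact file layout of a given CelebA distribution is an assumption here, and `celeba_attribute_tags` and `SAMPLE` are hypothetical names.

```python
from __future__ import annotations

import csv
import io

# Hypothetical excerpt with three of the CelebA attribute columns.
SAMPLE = """image_id,Bald,Eyeglasses,Smiling
000001.jpg,-1,-1,1
000002.jpg,1,1,-1
"""


def celeba_attribute_tags(csv_text: str) -> dict[str, list[str]]:
    """Return, per image, the attribute names whose value is 1 (present)."""
    tags: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        image_id = row.pop("image_id")
        # Remaining columns are attributes, kept in header order.
        tags[image_id] = [name for name, value in row.items() if value.strip() == "1"]
    return tags
```

For the sample above, `000001.jpg` gets `["Smiling"]` and `000002.jpg` gets `["Bald", "Eyeglasses"]`. Dropping the `-1` entries keeps only positive tags; an exporter would have to write `-1` back for every attribute not present.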
## platform to platform basis
- Labelbox JSON
  - https://docs.labelbox.com/reference/label-export
  - also supported by lightly-ai and [Roboflow](https://roboflow.com/formats/labelbox-json)
  - implementing the export part would be a very large effort, though
- Scale AI
  - one of the biggest data annotation tools
  - would help users looking to convert from closed-source to open-source tools
  - https://roboflow.com/formats/scale-ai-json
- SuperAnnotate
  - another closed-source platform with a good user base
  - https://roboflow.com/formats/superannotate-json
- Google AutoML, Apple Create ML
  - Google AutoML: CSV based; Create ML: JSON based
  - Roboflow supports both, but they are very niche; deciding whether to add them is a major design choice
## 3D
- ScanNet
  - https://paperswithcode.com/dataset/scannet
  - gaining a lot of popularity; a must-include for papers in its area
  - [example](https://github.com/ScanNet/ScanNet); we can start 3D support with this dataset
- ShapeNet
  - https://paperswithcode.com/dataset/shapenet
  - traditionally the most used
## comments on resources
### https://github.com/lightly-ai/labelformat
- less detailed than our Datumaro, but provides basic functionality
- we can take inspiration from it for YOLOv8 and Labelbox
## HLD
- We add 3 formats
  - my choices
    - YOLOv8, OpenLABEL, CelebA
  - stretch goal: Labelbox (import)
- issues
  - error message issue
  - add OpenLABEL [#3999](https://github.com/cvat-ai/cvat/issues/3999)
  - COCO id 0 is no label [#4750](https://github.com/cvat-ai/cvat/issues/4750)
  - improve Cityscapes [#4828](https://github.com/cvat-ai/cvat/issues/4828)
  - stretch issue: check for a leading directory [#3849](https://github.com/cvat-ai/cvat/issues/3849)
- blog post and tutorial