# Content management and disk space clean-up
## Problem statements
1. It is difficult to identify which repo versions contain a given content unit, and therefore difficult to remove it from all of them so that it becomes an orphan.
   * solution: introduce an endpoint that returns the list of repo versions a content unit is present in https://pulp.plan.io/issues/4832
2. A large number of repo versions makes it difficult to orphan content.
   * solution: let users control the number of repo versions kept per repo https://pulp.plan.io/issues/8368
3. Orphan cleanup is suboptimal because it locks on resources and cannot run in parallel.
   * solution: run orphan cleanup in parallel and don't lock on resources https://pulp.plan.io/issues/7659
4. It is not possible to reclaim disk space for content that I switched to the on_demand/streamed policy.
   * solution:
5. If I want to keep track of removed/rejected content, I have no way to blocklist it and make sure it does not appear in the repo again [keep content/history + free disk space/delete artifact].
   * solution: covered by the solution to problem 4
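
The lookup proposed for problem 1 can be illustrated with a small in-memory model. This is only a sketch: the `versions` mapping, the `versions_containing` and `is_orphan` helpers, and the repo and content names are all made up for illustration and are not the Pulp API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory model: (repo name, version number) -> set of content ids.
RepoVersionKey = Tuple[str, int]


def versions_containing(
    content_id: str, versions: Dict[RepoVersionKey, Set[str]]
) -> List[RepoVersionKey]:
    """Return every repo version whose content set includes content_id, sorted."""
    return sorted(key for key, content in versions.items() if content_id in content)


def is_orphan(content_id: str, versions: Dict[RepoVersionKey, Set[str]]) -> bool:
    """A content unit is an orphan when no repo version references it."""
    return not versions_containing(content_id, versions)


versions = {
    ("rpm-repo", 1): {"pkg-a", "pkg-b"},
    ("rpm-repo", 2): {"pkg-b"},
    ("file-repo", 1): {"pkg-a"},
}
print(versions_containing("pkg-a", versions))
print(is_orphan("pkg-c", versions))
```

With such a list in hand, a user would know exactly which repo versions still have to go before the content unit can be orphaned.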
## Meeting Notes March 25
### Agenda
1. Problem 2: Retain N repo versions https://pulp.plan.io/issues/8368
   * Should repo version removal be triggered on the repo update call?
      * Agreement: yes, trigger version cleanup on repo update.
   * Should version 0 be deletable?
      * Ask Katello.
      * Issue created here: https://pulp.plan.io/issues/8454
2. Problem 4:
   - When removing artifacts, should we create a new version?
      - No, the set of content is still the same.
   - When removing artifacts, should we remove them from all versions?
      - It's not possible to remove an artifact from a version; an artifact is related to a content unit.
   - Provide a separate endpoint that accepts a list of repos to reclaim disk space for, and remove only the artifacts exclusive to that list of repos.
   - Do not touch content that was uploaded (i.e. content with no RemoteArtifact, RA).
   * Provide a utility function; do not expose it to users (for now?).
   - Add a keeplist for repo_version.
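
The retention rule from agenda item 1 could look roughly like the sketch below. It assumes versions are identified by integers and that "retain N" means keeping the N newest. The function name `versions_to_delete` and the `protect_zero` flag are hypothetical, and the flag only exists here because the question of deleting version 0 was left open.

```python
from typing import List


def versions_to_delete(
    version_numbers: List[int], retain: int, protect_zero: bool = True
) -> List[int]:
    """Return the version numbers to remove so that only the newest `retain` remain.

    protect_zero is a hypothetical flag: whether version 0 may be deleted
    was left open in the notes. When it is set, version 0 is kept in
    addition to the newest `retain` versions.
    """
    if retain < 1:
        raise ValueError("retain must be at least 1")
    ordered = sorted(version_numbers)
    doomed = ordered[:-retain]  # everything older than the newest `retain`
    if protect_zero:
        doomed = [number for number in doomed if number != 0]
    return doomed


print(versions_to_delete([0, 1, 2, 3, 4], retain=2))
```

Per the agreement above, a computation like this would run as part of the repo update call, after the retain count has been changed.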
## Meeting Notes March 19
### Problem statement 5:
5. If I want to keep track of removed/rejected content, I have no way to blocklist it and make sure it does not appear in the repo again [keep content/history + free disk space/delete artifact].
   * solution: covered by the solution to problem 4
### Use Cases
1. Purge an existing content artifact
2. Block it from being uploaded
### Notes
* "Blocklist is a kind of RBAC"

General observations:
* Problem (5) can be solved by solving problem (4).
* Solving problem (2) will make (4) easier because there will be fewer repo versions to "remove Artifacts for".

**Idea** (but not usable): Have [the pulp_ansible CollectionUploadViewset](https://github.com/pulp/pulp_ansible/blob/71153f4ac0b9b82d1bbad6944b8d8a3e3e5df925/pulp_ansible/app/galaxy/v3/views.py#L313-L377) refuse binary data for content units it already has. Combined with the solution to problem (4), that would prevent new binary data from being added for content units that already exist. This will not work, because it breaks existing use cases where users need to resubmit binary data, either because of corruption or because they need to re-add binary data that was removed during space reclamation.

**Idea** (usable for galaxy_ng): Have the [galaxy_ng subclass of CollectionUploadViewset](https://github.com/ansible/galaxy_ng/blob/289998740a13c0222adeadd365d38a8b3fcdb7aa/galaxy_ng/app/api/v3/viewsets/collection.py#L174-L225) check that the content unit does not already exist in the 'rejected' repo before re-accepting it.
* PulpTemporaryFile -> Artifact: converting a temp_file into an Artifact currently fails if that Artifact already exists: https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/content.py#L318-L332
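
The galaxy_ng idea can be sketched independently of any viewset code. Everything below is illustrative: `accept_upload`, `UploadRejected`, the `(namespace, name, version)` key and the dict used as an artifact store are assumptions, not galaxy_ng or pulpcore APIs. Reusing an existing artifact by digest is one possible way to avoid the failure noted above, not what pulpcore does today.

```python
import hashlib
from typing import Dict, Set, Tuple

# Hypothetical identity of a collection version: (namespace, name, version).
CollectionKey = Tuple[str, str, str]


class UploadRejected(Exception):
    """Raised when the uploaded unit is already in the 'rejected' repo."""


def accept_upload(
    key: CollectionKey,
    data: bytes,
    rejected: Set[CollectionKey],
    artifacts: Dict[str, bytes],
) -> str:
    """Store data for key unless key is rejected; return the sha256 digest."""
    if key in rejected:
        raise UploadRejected(f"{key} was rejected earlier")
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: reuse the stored artifact when the digest is already known
    # instead of failing on the duplicate.
    artifacts.setdefault(digest, data)
    return digest


rejected = {("acme", "tools", "1.0.0")}
artifacts: Dict[str, bytes] = {}
digest = accept_upload(("acme", "tools", "1.0.1"), b"tarball bytes", rejected, artifacts)
```

A check like this still allows binary data to be resubmitted for content that was not rejected, which is the use case that ruled out the first idea.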
### Problem 4:
- When removing artifacts, should we create a new version?
  - No, the set of content is still the same.
- When removing artifacts, should we remove them from all versions?
  - It's not possible to remove an artifact from a version; an artifact is related to a content unit.
- Provide a separate endpoint that accepts a list of repos to reclaim disk space for, and remove only the artifacts exclusive to that list of repos.
- Do not touch content that was uploaded (i.e. content with no RemoteArtifact, RA).
* Provide a utility function; do not expose it to users (for now?).

Katello issue: https://pulp.plan.io/issues/5926
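
The reclaim rules listed under Problem 4 can be sketched as a pure function over a simplified model. This is a minimal sketch, not the pulpcore implementation: `Content`, `artifacts_to_reclaim`, and the idea that each content unit has exactly one artifact are assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for a content unit and its single artifact."""

    artifact: str  # illustrative artifact identifier
    has_remote_artifact: bool  # False for uploaded content (no RA)


def artifacts_to_reclaim(
    selected_repos: Iterable[str],
    repo_content: Dict[str, Set[str]],
    content: Dict[str, Content],
) -> Set[str]:
    """Return artifacts that can be deleted when reclaiming space for selected_repos."""
    selected = set(selected_repos)
    in_selected: Set[str] = set()
    in_others: Set[str] = set()
    for repo, units in repo_content.items():
        if repo in selected:
            in_selected |= units
        else:
            in_others |= units

    # Only content exclusive to the selected repos, and only content that can
    # be downloaded again (it has a remote artifact), is eligible.
    eligible = {
        unit
        for unit in in_selected - in_others
        if content[unit].has_remote_artifact
    }
    candidates = {content[unit].artifact for unit in eligible}
    # Keep any artifact that is also used by a content unit that is not eligible.
    still_needed = {
        info.artifact for unit, info in content.items() if unit not in eligible
    }
    return candidates - still_needed


repo_content = {"a": {"c1", "c2"}, "b": {"c2", "c3"}}
content = {
    "c1": Content("art1", True),
    "c2": Content("art2", True),
    "c3": Content("art3", False),  # uploaded, so never reclaimed
}
print(sorted(artifacts_to_reclaim(["a"], repo_content, content)))
```

No repo version is created or modified here, which matches the note that the set of content stays the same when artifacts are removed.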