# Pulp 3 RPM Copy API Brainstorming
## Use Cases
* As a User, I can copy all content from one repo to another repo
    * "additive clone repo" use case
* As a User, I can copy all content of a particular type from one repo to another repo
* As a User, I can copy all content matching certain "search criteria" from one repo to another repo
    * jsherrill notes - Katello does the filtering themselves, they don't need this
* As a User, I can copy content by HREF from one repo to another
    * jsherrill notes - Katello *definitely* needs this
* As a User, when copying content that has an inherent direct dependency relationship to other content, that other content is *also* copied, *always*
    * e.g. Modules with RPM artifacts
    * Modules declare debug packages as artifacts, but debug packages are usually in a different repo, so we can only copy referenced units if they are present in the repo
* As a User, I have recursive dependency solving during copy
    * Able to be disabled, but *enabled by default*
    * Performance may be a concern, but we should try to *default* to correctness if possible.
    * No "best effort". If we can't find a dependency then the user must specify another repo for use in multi-repo copy?
        * (if we can do this, then we should, but it might not be tenable)
* As a User, during recursive copy, I have a means to specify multiple repositories as sources and destinations for the copy operation
    * "multi-repo-copy" use case
    * Source repositories are matched against destination repositories -- units copied from those source repositories go into their corresponding destination repositories.
## API Design
The problems created by multi-repo-copy and repository versions in particular necessitate an entirely JSON-blob-based copy configuration.
* We must support an arbitrary number of source and destination repositories (multi-repo-copy)
* We must be able to simultaneously filter and copy from all of the source repos (not just the "primary" one like in Pulp 2) for performance reasons
* For correctness reasons [0], we can't provide a single global content list/criteria to search all of the source repos with; each pair of source/destination must have its own content list/criteria to use
* We must support setting "base_version" on each of the created repository versions in the destination repositories
    * Katello can't just use the latest version**s**, sometimes they have to pick different version**s** of the destination repositor**ies** to use as bases.
So for every pair of source and destination repos, of which there can be an arbitrary number, you also have to be able to (optionally) provide a list of content/criteria to use for the copy and (optionally) use a different "base_version". Trying to do all of those things at once with combinations of parameters would not be reasonable, but it's not so difficult with JSON.
[0] If multiple source repos contain the same content unit, Pulp can't decide which one to copy from. With multiple destination repositories, this is important because the source repository determines the destination repository.
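The ambiguity described in [0] can be illustrated with a small sketch. Everything here is made up for illustration: the repo names, the content hrefs, and the `candidate_destinations` helper are hypothetical and not part of Pulp.

```python
# Hypothetical data: source repo names mapped to the content hrefs they contain.
sources = {
    "src-a": {"/content/1/", "/content/2/"},
    "src-b": {"/content/2/", "/content/3/"},
}

# Each source repo is paired with exactly one destination repo.
dest_for = {"src-a": "dest-a", "src-b": "dest-b"}


def candidate_destinations(href):
    """Return every destination a unit could land in if the content list were global."""
    return sorted(dest_for[name] for name, units in sources.items() if href in units)


# A unit present in only one source repo has a single possible destination.
print(candidate_destinations("/content/1/"))

# A unit present in both source repos has two possible destinations, so a
# global content list cannot say which one was intended.
print(candidate_destinations("/content/2/"))
```

Attaching the content list to a specific source/destination pair removes the guesswork, because the pair itself names the destination.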
### MVP Copy API Concept
```
POST /pulp/api/v3/rpm/copy/
config:=[
{"source_repo_version": "$SRC_REPO_VERS_HREF", "dest_repo": "$DEST_REPO_HREF", "content": ["$HREF1", "$HREF2"]},
{"source_repo_version": "$SRC_REPO_VERS_HREF", "dest_repo": "$DEST_REPO_HREF", "dest_base_version": "$DEST_BASE_VERSION", "content": []}
]
dependency_solving=False
```
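As a rough sketch of how a client might assemble the request body above, assuming the field names from this proposal: `CopyPair` and `build_copy_body` are hypothetical helper names, and the href strings are placeholders rather than real Pulp hrefs.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CopyPair:
    """One source/destination pair in the proposed "config" list (hypothetical helper)."""
    source_repo_version: str
    dest_repo: str
    content: List[str] = field(default_factory=list)
    dest_base_version: Optional[int] = None

    def to_dict(self):
        entry = {
            "source_repo_version": self.source_repo_version,
            "dest_repo": self.dest_repo,
            "content": list(self.content),
        }
        # Only send "dest_base_version" when the caller picked a specific base.
        if self.dest_base_version is not None:
            entry["dest_base_version"] = self.dest_base_version
        return entry


def build_copy_body(pairs, dependency_solving=False):
    """Build the JSON-serializable body for the proposed copy endpoint."""
    return {
        "config": [pair.to_dict() for pair in pairs],
        "dependency_solving": dependency_solving,
    }


# Placeholder hrefs, mirroring the variables used in the request sketch.
body = build_copy_body([
    CopyPair("SRC_REPO_VERS_HREF_1", "DEST_REPO_HREF_1", content=["HREF1", "HREF2"]),
    CopyPair("SRC_REPO_VERS_HREF_2", "DEST_REPO_HREF_2", dest_base_version=3),
])
print(json.dumps(body, indent=2))
```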
There are some obvious extensions that could be made for ergonomics and usability but they are not strictly *necessary* for the MVP, and they're pure extensions. Due to time constraints on integration, we should probably stick to the MVP and add the other niceties later.
Minor TBD: *One* of the source/destination pairs needs to provide a set of content to copy, but they don't *all* technically need to. That might be difficult to validate, though. We could just require that all of the sub-configs provide "content" even if it's just an empty list?
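One possible shape for that validation, as a sketch only: it assumes we settle on requiring "content" in every sub-config (possibly empty) and additionally reject requests where every list is empty. The `validate_content` function and its error messages are hypothetical.

```python
def validate_content(config):
    """Return a list of error strings; an empty list means the config passed."""
    errors = []
    for index, entry in enumerate(config):
        if "content" not in entry:
            errors.append(f"config[{index}]: 'content' is required (may be an empty list)")
        elif not isinstance(entry["content"], list):
            errors.append(f"config[{index}]: 'content' must be a list")

    # Only check the cross-entry rule once every entry is individually well-formed.
    if not errors and not any(entry["content"] for entry in config):
        errors.append("at least one entry must list content to copy")
    return errors


# An empty list is fine for a pair as long as another pair lists content.
print(validate_content([{"content": ["HREF1"]}, {"content": []}]))
```

Requiring the key everywhere keeps the per-entry check trivial; the "at least one non-empty" rule then becomes a single pass over the whole list.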
Minor TBD: Whether we default to dependency_solving on or off mostly depends on how much slower it makes common use cases.
#### Endpoint schema
"config" (required, list): Complex configuration object
"dependency_solving" (optional, bool): Enable or disable dependency solving for the copy operation. Default TBD based on performance.
#### JSON config schema
The config is a list of dictionaries where the keys are roughly the parameters you would normally expect for a copy operation:
"source_repo_version" (required, repository version href): The repository version to use as the source content set
"dest_repo" (required, repository href): The repository to create a new version in (copy content into)
"dest_base_version" (optional, int): The base version to use for the destination repository
"content" (TBD, list): The list of content hrefs to be copied
### Future Work
"source_repo_version" should be able to be optionally substituted with "source_repo":
"source_repo" (optional, repository href): If provided, uses the latest version from the repo as the source content set
"content" should be able to be optionally substituted with or supplemented by "criteria":
"criteria" (optional, dict): If provided, used as a list of filtering criteria
Criteria support is already implemented, but we haven't really thought through the potential use cases very well, and Katello doesn't need it. I would recommend we either mark it as a Tech Preview and liable to change, or keep it disabled for now to keep things simple.
#### Simple API?
The API above is probably overkill for a lot of use cases of the RPM plugin, if multi-repo copy isn't involved. We could pretty easily implement a simpler copy API with a subset of the features that would be much more user-friendly.
Whether adding this is a good idea or not might depend on how much our users end up needing multi-repo copy or using specific repo versions as content sources or base_versions.
```
POST /pulp/api/v3/rpm/simple_copy/
source_repo=$SRC_REPO_HREF
dest_repo=$DEST_REPO_HREF
criteria:={"package": [{"name": "firefox", "arch": "x86_64"}, {"name": "git"}]}
content:=["$HREF1", "$HREF2"]
dependency_solving=False
```
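If both endpoints existed, the simple form could plausibly be a thin wrapper that expands into a one-entry config for the full API. The sketch below assumes the "source_repo" and "criteria" keys from Future Work are adopted; `simple_to_config` is a hypothetical name and the hrefs are placeholders.

```python
def simple_to_config(source_repo, dest_repo, content=None, criteria=None):
    """Expand simple-copy parameters into a single-entry config list (hypothetical)."""
    entry = {"source_repo": source_repo, "dest_repo": dest_repo}
    # "content" and "criteria" are both optional and may be combined.
    if content is not None:
        entry["content"] = list(content)
    if criteria is not None:
        entry["criteria"] = dict(criteria)
    return [entry]


config = simple_to_config(
    "SRC_REPO_HREF",
    "DEST_REPO_HREF",
    criteria={"package": [{"name": "firefox", "arch": "x86_64"}, {"name": "git"}]},
)
print(config)
```

A wrapper like this would keep a single copy implementation behind both endpoints, so the simple API adds ergonomics without a second code path.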