Pulp3 and FilesystemExports
This is in response to https://bugzilla.redhat.com/show_bug.cgi?id=2028377#c12
(In reply to Glenn Snead from comment #12)
I took a look at https://github.com/Katello/katello/pull/9925, and I don't
see the difference from the existing Satellite 6.10 content view export
method.
Unless the resulting tarball has the proper file tree i.e.
content/rel8/x86_64/baseos/os/{Packages,repodata} format with a full copy of
the latest repository metadata we cannot use Satellite 6.10 to support
disconnected customers who are running their own Satellite servers. These
Satellite servers expect an available CDN server to supply them with content
regardless of what their Satellite Organization(s) are named, and what is in
their Organization's entitlement manifest.
Pulp3 supports this kind of export; it's called a FilesystemExport. It currently doesn't produce the tarfile/toc/chunks that PulpExport does, but it does write the repository's content and metadata tree to the export path on disk. This is in tech-preview because it hasn't had much use/testing, and therefore needs more eyes/thoughts on whether there are requirements we haven't thought of - this BZ is prob exactly what it needs :)
This code missed the 3.14 deadline; it's in 3.15.
Doc is here:
The export-workflow is "create a FilesystemExporter, with a name and an export path; invoke that exporter with a repository-publication href or a repository-version to create a filesystem-export; tar the results."
Example script, starting from "create and sync a repo": (NB: pulp-cli doesn't have file-export support yet, so I use direct HTTP requests)
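Since the script itself isn't reproduced in this comment, here is a minimal Python sketch of the two API calls the export workflow needs. The endpoint paths and field names are assumptions modeled on Pulp 3 conventions and should be checked against the 3.15 API docs before use:

```python
# Sketch of the FilesystemExport workflow described above, as plain request
# payloads. Endpoint paths and field names are assumptions modeled on Pulp 3
# conventions, not verified against a live 3.15 server.
API = "/pulp/api/v3"

def exporter_payload(name, path):
    # Body for: POST {API}/exporters/core/filesystem/  -- creates the exporter
    return {"name": name, "path": path}

def export_payload(publication_href):
    # Body for: POST <exporter_href>exports/  -- kicks off one filesystem export
    return {"publication": publication_href}

# Hypothetical publication href; a real one comes from publishing the repo.
pub = f"{API}/publications/rpm/rpm/<uuid>/"
print(exporter_payload("fsexport", "/tmp/fsexports"))
print(export_payload(pub))
# After the export task finishes: tar the contents of /tmp/fsexports and ship it.
```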
The import-workflow, in this context, is "copy test.tar to downstream and unpack, create a 'file:' remote pointing to the unpack location, and sync into a downstream repository." Starting from the above:
$ pulp rpm remote create --name file --url file:/tmp/fsexports --policy immediate
$ pulp rpm repository create --name file --remote file
$ pulp rpm repository sync --name file
$ http :/pulp/api/v3/tasks/696723d6-5c67-444f-921f-194e34b3acf3/
{
  "child_tasks": [],
  "created_resources": [
    "/pulp/api/v3/repositories/rpm/rpm/3ba87621-c246-47c0-baa1-a4c7af2f4eba/versions/1/"
  ],
  "error": null,
  "finished_at": "2022-03-03T16:55:16.720442Z",
  "logging_cid": "4120190b8c3a496b9787bd76f8eadc3d",
  "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
  "parent_task": null,
  "progress_reports": [
    {
      "code": "sync.downloading.metadata",
      "done": 6,
      "message": "Downloading Metadata Files",
      "state": "completed",
      "suffix": null,
      "total": null
    },
    {
      "code": "sync.downloading.artifacts",
      "done": 0,
      "message": "Downloading Artifacts",
      "state": "completed",
      "suffix": null,
      "total": null
    },
    {
      "code": "associating.content",
      "done": 43,
      "message": "Associating Content",
      "state": "completed",
      "suffix": null,
      "total": null
    },
    {
      "code": "sync.parsing.packages",
      "done": 35,
      "message": "Parsed Packages",
      "state": "completed",
      "suffix": null,
      "total": null
    },
    {
      "code": "sync.parsing.comps",
      "done": 3,
      "message": "Parsed Comps",
      "state": "completed",
      "suffix": null,
      "total": 3
    },
    {
      "code": "sync.parsing.advisories",
      "done": 4,
      "message": "Parsed Advisories",
      "state": "completed",
      "suffix": null,
      "total": 4
    }
  ],
  "pulp_created": "2022-03-03T16:55:15.731439Z",
  "pulp_href": "/pulp/api/v3/tasks/696723d6-5c67-444f-921f-194e34b3acf3/",
  "reserved_resources_record": [
    "/pulp/api/v3/repositories/rpm/rpm/3ba87621-c246-47c0-baa1-a4c7af2f4eba/",
    "shared:/pulp/api/v3/remotes/rpm/rpm/56b5c79e-b2ed-4f43-88b3-27d030ef1234/"
  ],
  "started_at": "2022-03-03T16:55:15.781989Z",
  "state": "completed",
  "task_group": null,
  "worker": "/pulp/api/v3/workers/8734ffe2-504a-4cb1-af83-fb6872918d4a/"
}
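A small helper can sanity-check a task response like this one: confirm it completed and summarize its progress_reports. The JSON in this sketch is a trimmed copy of the output shown; field names match the task payload above:

```python
# Sanity-check a Pulp task response: confirm it completed, then summarize
# its progress_reports. The JSON here is trimmed from the task shown above.
import json

task = json.loads("""
{
  "state": "completed",
  "error": null,
  "progress_reports": [
    {"message": "Parsed Packages", "done": 35, "total": null},
    {"message": "Parsed Advisories", "done": 4, "total": 4}
  ]
}
""")

def summarize(task):
    # Raise if the task failed; otherwise return "message: done/total" lines.
    if task["state"] != "completed" or task["error"] is not None:
        raise RuntimeError(f"task state={task['state']} error={task['error']}")
    return [
        f"{r['message']}: {r['done']}/{r['total'] if r['total'] is not None else '?'}"
        for r in task["progress_reports"]
    ]

for line in summarize(task):
    print(line)
```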
The net is, I think we have much of the machinery in place to answer this need on the Pulp side. We need to think about how we expose it in a way that makes sense to the Satellite user, and there's work to be done on packaging the result (e.g., teach FilesystemExport about tar and toc and chunks etc).
Is there a plan for this use case? We have a lot of customers, 83 currently
with hundreds more to come, who are depending on it.
Here's what we need:
Full repository exports containing a full copy of each repository's metadata
that is Satellite Organization, Content View, and software entitlement
manifest neutral.
Incremental repository exports containing a full copy of each repository's
metadata that is Satellite Organization, Content View, and software
entitlement manifest neutral.
fsexport doesn't support incrementals; that would be a new feature.
Single full or incremental repository exports containing a full copy of each
repository's metadata that is Satellite Organization, Content View, and
software entitlement manifest neutral. This is to address critical CVEs that
have mission-critical impact. Think Heartbleed and Log4j.
Notes from 9-MAR
attendees: bbuckingham, paji, ggainey, gsnead
- requirements:
- disconnected CDN
- is the "upstream" for pre-vetted "downstreams"
- currently 5.5 TB in AWS
- one in AWS that can talk to us
- tarfiles moved to 'disconnected' satellite
- disconnected satellite syncs
- downstreams are vetted, sync from this upstream
- questions
- how does current Sat6 repo-export not meet our needs?
Proposed solution from katello point of view:
- Katello will use fsexport to export the contents of a repository along with its latest metadata.
- The user is responsible for creating, enabling and syncing this repository in their destination satellite
- Something like
- All rpms will be hard-linked to the correct pulp artifacts to save space.
- The user would have to manually tar/gzip them and send the result to the downstream katello server.
- No listing files in the destination (these have to be generated for import). We could look into hammer generating them automatically.
- PS: No support for incremental exports (until fsexports supports it.)
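To illustrate the hard-link point in the proposal: a hard-linked export entry shares an inode with the original artifact, so it consumes no additional disk space. A quick local demonstration (the paths are stand-ins, not real pulp locations):

```python
# Demonstrate that a hard link shares its inode with the original file and
# consumes no extra space. Paths are stand-ins for pulp's artifact store
# and the fsexport tree, not real locations.
import os
import tempfile

tmp = tempfile.mkdtemp()
artifact = os.path.join(tmp, "artifact")   # stand-in for /var/lib/pulp/media/...
exported = os.path.join(tmp, "pkg.rpm")    # stand-in for a file in the export tree

with open(artifact, "wb") as f:
    f.write(b"\0" * 4096)

os.link(artifact, exported)  # hard link instead of a copy

same_inode = os.stat(artifact).st_ino == os.stat(exported).st_ino
print("link count:", os.stat(exported).st_nlink, "same inode:", same_inode)
```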
On the downstream katello server
- Extract the archive and copy/rsync it to the local cdn.
- User would have to manually create listing files for releasever and arch if they're not already there in the local cdn.
- The user would have to configure the katello server to pull from local cdn.
- Enable the interested repo using manifest via rh repos page.
- Sync the enabled repository.
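The manual listing-file step can be scripted. The sketch below assumes the CDN convention of a file named "listing" in each directory level that the releasever/arch URL variables expand through, containing the child directory names; the tree mirrors the content/rel8/x86_64/baseos/os layout quoted earlier, and the exact semantics should be verified against a real CDN mirror:

```python
# Sketch: generate missing "listing" files for a local CDN tree. Assumes
# each directory with subdirectories gets a file named "listing" holding
# the child directory names, one per line (CDN convention; verify locally).
import os
import tempfile

def write_listings(root):
    # Put a 'listing' file in every directory that has subdirectories.
    for dirpath, dirnames, _files in os.walk(root):
        if dirnames:
            with open(os.path.join(dirpath, "listing"), "w") as f:
                f.write("\n".join(sorted(dirnames)) + "\n")

cdn = tempfile.mkdtemp()
os.makedirs(os.path.join(cdn, "content/rel8/x86_64/baseos/os"))
write_listings(cdn)
print(open(os.path.join(cdn, "content/listing")).read())
```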
Notes/Questions on the proposed solution
- I assume I would use tar's "--hard-dereference" option for hard links. Can someone confirm?
- Would the exported repository metadata be full copies of each repository's metadata?
- I have a python script Rich Jerido wrote years ago to create any missing listings files.
- Incremental exports are a must-have, as RHEL 7 repositories are quite large; the base repository alone is over 57 GB.
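On the "--hard-dereference" question: a quick local check (GNU tar assumed) shows that with the flag, hard-linked members are stored as independent regular files, so the unpacked tree doesn't depend on the artifact files being in the archive:

```python
# Check GNU tar's --hard-dereference: with the flag, each hard link is
# archived as an independent regular file (full size, not a link entry).
import os
import subprocess
import tarfile
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "export")
os.makedirs(src)
with open(os.path.join(src, "pkg.rpm"), "wb") as f:
    f.write(b"x" * 1024)
os.link(os.path.join(src, "pkg.rpm"), os.path.join(src, "pkg-link.rpm"))

out = os.path.join(tmp, "test.tar")
subprocess.run(["tar", "--hard-dereference", "-cf", out, "-C", tmp, "export"],
               check=True)

with tarfile.open(out) as t:
    files = [m for m in t.getmembers() if m.name.endswith(".rpm")]
    print([(m.name, m.islnk(), m.size) for m in files])
```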