# Pulp3 and FilesystemExports
This is in response to https://bugzilla.redhat.com/show_bug.cgi?id=2028377#c12
(In reply to Glenn Snead from comment #12)
> I took a look at https://github.com/Katello/katello/pull/9925, and I don't
> see the difference from the existing Satellite 6.10 content view export
> method.
>
> Unless the resulting tarball has the proper file tree i.e.
> content/rel8/x86_64/baseos/os/{Packages,repodata} format with a full copy of
> the latest repository metadata we cannot use Satellite 6.10 to support
> disconnected customers who are running their own Satellite servers. These
> Satellite servers expect an available CDN server to supply them with content
> regardless of what their Satellite Organization(s) are named, and what is in
> their Organization's entitlement manifest.
Pulp3 supports this kind of export; it's called a FilesystemExport. It currently doesn't produce the tarfile/toc/chunks that PulpExport does, but it does export a specified repository publication to the filesystem. This is in tech preview because it hasn't had much use or testing, and therefore needs more eyes and thoughts on whether there are requirements we haven't thought of. This BZ is probably exactly what it needs :)
This code missed the 3.14 deadline; it's in 3.15.
Doc is here:
* https://docs.pulpproject.org/pulpcore/restapi.html#tag/Exporters:-Filesystem
* https://docs.pulpproject.org/pulpcore/restapi.html#tag/Exporters:-Filesystem-Exports
The export-workflow is "create a FilesystemExporter, with a name and an export path; invoke that exporter with a repository-publication href or a repository-version to create a filesystem-export; tar the results."
Example script, starting from "create and sync a repo": (NB: pulp-cli doesn't have file-export support yet, so I use direct HTTP requests)
```
$ pulp rpm remote create --name test --url https://fixtures.pulpproject.org/rpm-signed/ --policy immediate
$ pulp rpm repository create --name test --remote test
$ pulp rpm repository sync --name test
$ pulp rpm publication create --repository test # use resulting publication-HREF below
$ http POST :/pulp/api/v3/exporters/core/filesystem/ \
name=test \
path=/tmp/fsexports/ \
method=write # use resulting Exporter-HREF below
$ http POST :/pulp/api/v3/exporters/core/filesystem/cad6493d-9412-4bf7-95ad-d2fb8b74fdd1/exports/ \
publication=/pulp/api/v3/publications/rpm/rpm/ad5c8424-1adb-4966-b619-01b9d19ecc74/
$ ls /tmp/fsexports/
bear-4.1-1.noarch.rpm dolphin-3.10.232-1.noarch.rpm horse-0.22-2.noarch.rpm squirrel-0.1-1.noarch.rpm
camel-0.1-1.noarch.rpm duck-0.6-1.noarch.rpm kangaroo-0.2-1.noarch.rpm stork-0.12-2.noarch.rpm
cat-1.0-1.noarch.rpm duck-0.7-1.noarch.rpm kangaroo-0.3-1.noarch.rpm tiger-1.0-4.noarch.rpm
cheetah-1.25.3-5.noarch.rpm duck-0.8-1.noarch.rpm lion-0.4-1.noarch.rpm trout-0.12-1.noarch.rpm
chimpanzee-0.21-1.noarch.rpm elephant-8.3-1.noarch.rpm mouse-0.1.12-1.noarch.rpm walrus-0.71-1.noarch.rpm
cockateel-3.1-1.noarch.rpm fox-1.1-2.noarch.rpm penguin-0.9.1-1.noarch.rpm walrus-5.21-1.noarch.rpm
cow-2.2-3.noarch.rpm frog-0.1-1.noarch.rpm pike-2.2-1.noarch.rpm whale-0.2-1.noarch.rpm
crow-0.8-1.noarch.rpm giraffe-0.67-2.noarch.rpm repodata wolf-9.4-2.noarch.rpm
dog-4.23-1.noarch.rpm gorilla-0.62-1.noarch.rpm shark-0.1-1.noarch.rpm zebra-0.1-2.noarch.rpm
$ cd /tmp
$ tar cvf test.tar fsexports/
fsexports/
fsexports/bear-4.1-1.noarch.rpm
fsexports/camel-0.1-1.noarch.rpm
fsexports/cat-1.0-1.noarch.rpm
fsexports/cheetah-1.25.3-5.noarch.rpm
fsexports/chimpanzee-0.21-1.noarch.rpm
fsexports/cockateel-3.1-1.noarch.rpm
fsexports/cow-2.2-3.noarch.rpm
fsexports/crow-0.8-1.noarch.rpm
fsexports/dog-4.23-1.noarch.rpm
fsexports/dolphin-3.10.232-1.noarch.rpm
fsexports/duck-0.6-1.noarch.rpm
fsexports/duck-0.7-1.noarch.rpm
fsexports/duck-0.8-1.noarch.rpm
fsexports/elephant-8.3-1.noarch.rpm
fsexports/fox-1.1-2.noarch.rpm
fsexports/frog-0.1-1.noarch.rpm
fsexports/giraffe-0.67-2.noarch.rpm
fsexports/gorilla-0.62-1.noarch.rpm
fsexports/horse-0.22-2.noarch.rpm
fsexports/kangaroo-0.2-1.noarch.rpm
fsexports/kangaroo-0.3-1.noarch.rpm
fsexports/lion-0.4-1.noarch.rpm
fsexports/mouse-0.1.12-1.noarch.rpm
fsexports/penguin-0.9.1-1.noarch.rpm
fsexports/pike-2.2-1.noarch.rpm
fsexports/shark-0.1-1.noarch.rpm
fsexports/squirrel-0.1-1.noarch.rpm
fsexports/stork-0.12-2.noarch.rpm
fsexports/tiger-1.0-4.noarch.rpm
fsexports/trout-0.12-1.noarch.rpm
fsexports/walrus-0.71-1.noarch.rpm
fsexports/walrus-5.21-1.noarch.rpm
fsexports/whale-0.2-1.noarch.rpm
fsexports/wolf-9.4-2.noarch.rpm
fsexports/zebra-0.1-2.noarch.rpm
fsexports/repodata/
fsexports/repodata/68f65a6687ddf7616b17a283382da9303e1132913e4e4756e5255346ec5de3ca-primary.xml.gz
fsexports/repodata/f437734156437b264d07c9f17896d39953b6659d918a9ae605c5ec26301673ce-filelists.xml.gz
fsexports/repodata/06df8526da1cbf5a73951c97fa066a076ee0c95d916c90992e3ec5b078d6a651-other.xml.gz
fsexports/repodata/b65a7e04ba544b7339e848e23e0cd7021843e1c29eed06cb04707658c0aeb699-updateinfo.xml.gz
fsexports/repodata/a71ea6fe231802411ac8df72e00b74b5211c3dec3ffefefa96f8473691b03d7b-comps.xml
fsexports/repodata/repomd.xml
$
```
The import-workflow, in this context, is "copy test.tar to downstream and unpack, create a 'file:' remote pointing to the unpack location, and sync into a downstream repository." Starting from the above:
```
$ pulp rpm remote create --name file --url file:/tmp/fsexports --policy immediate
$ pulp rpm repository create --name file --remote file
$ pulp rpm repository sync --name file
$ http :/pulp/api/v3/tasks/696723d6-5c67-444f-921f-194e34b3acf3/
{
"child_tasks": [],
"created_resources": [
"/pulp/api/v3/repositories/rpm/rpm/3ba87621-c246-47c0-baa1-a4c7af2f4eba/versions/1/"
],
"error": null,
"finished_at": "2022-03-03T16:55:16.720442Z",
"logging_cid": "4120190b8c3a496b9787bd76f8eadc3d",
"name": "pulp_rpm.app.tasks.synchronizing.synchronize",
"parent_task": null,
"progress_reports": [
{
"code": "sync.downloading.metadata",
"done": 6,
"message": "Downloading Metadata Files",
"state": "completed",
"suffix": null,
"total": null
},
{
"code": "sync.downloading.artifacts",
"done": 0,
"message": "Downloading Artifacts",
"state": "completed",
"suffix": null,
"total": null
},
{
"code": "associating.content",
"done": 43,
"message": "Associating Content",
"state": "completed",
"suffix": null,
"total": null
},
{
"code": "sync.parsing.packages",
"done": 35,
"message": "Parsed Packages",
"state": "completed",
"suffix": null,
"total": null
},
{
"code": "sync.parsing.comps",
"done": 3,
"message": "Parsed Comps",
"state": "completed",
"suffix": null,
"total": 3
},
{
"code": "sync.parsing.advisories",
"done": 4,
"message": "Parsed Advisories",
"state": "completed",
"suffix": null,
"total": 4
}
],
"pulp_created": "2022-03-03T16:55:15.731439Z",
"pulp_href": "/pulp/api/v3/tasks/696723d6-5c67-444f-921f-194e34b3acf3/",
"reserved_resources_record": [
"/pulp/api/v3/repositories/rpm/rpm/3ba87621-c246-47c0-baa1-a4c7af2f4eba/",
"shared:/pulp/api/v3/remotes/rpm/rpm/56b5c79e-b2ed-4f43-88b3-27d030ef1234/"
],
"started_at": "2022-03-03T16:55:15.781989Z",
"state": "completed",
"task_group": null,
"worker": "/pulp/api/v3/workers/8734ffe2-504a-4cb1-af83-fb6872918d4a/"
}
```
The net is: I think we have much of the machinery in place to answer this need on the Pulp side. We need to think about how we expose it in a way that makes sense to the Satellite user, and there's work to be done on packaging the result (e.g., teaching FilesystemExport about tar, toc, chunks, etc.).
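Until FilesystemExport learns to package its own output, the tar/toc/chunk step can be approximated by hand. The following is only a sketch under stated assumptions: it uses GNU `tar`, `split`, and `sha256sum`, the 100k chunk size and the `test.toc` name are made up for illustration, the export tree is a stand-in built in a temp directory, and the checksum-list "toc" is NOT the format PulpExport produces.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for an fsexport output directory (hypothetical contents).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/fsexports/repodata"
head -c 300000 /dev/urandom > "$work/fsexports/bear-4.1-1.noarch.rpm"
echo "<repomd/>" > "$work/fsexports/repodata/repomd.xml"

cd "$work"

# Upstream: tar the export, cut it into fixed-size chunks, record checksums.
tar -cf test.tar fsexports/
split -b 100k -d test.tar test.tar.        # test.tar.00, test.tar.01, ...
sha256sum test.tar.[0-9]* > test.toc       # assumed toc: one checksum per chunk

# Downstream: verify every chunk against the toc, then reassemble.
sha256sum -c --quiet test.toc
cat test.tar.[0-9]* > rejoined.tar
cmp test.tar rejoined.tar && echo "chunks verified and rejoined"
```

Chunking keeps each transferred file small enough for constrained media, and the per-chunk checksums let the downstream side detect a bad chunk before attempting to unpack.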
>
> Is there a plan for this use case? We have a lot of customers, 83 at the
> current state with hundreds more to come, who are depending on it.
>
> Here's what we need:
>
> Full repository exports containing a full copy of each repository's metadata
> that is both Satellite Organization, Content View, and software entitlement
> manifest neutral.
> Incremental repository exports containing a full copy of each repository's
> metadata that is both Satellite Organization, Content View, and software
> entitlement manifest neutral.
fsexport doesn't support incrementals; that would be a new feature.
> Single full or incremental repository exports containing a full copy of each
> repository's metadata that is both Satellite Organization, Content View, and
> software entitlement manifest neutral. This is to address critical CVEs that
> have mission critical impact. Think heart bleed and log4j.
## Notes from 9-MAR
### attendees: bbuckingham, paji, ggainey, gsnead
* requirements:
    * disconnected CDN
    * is the "upstream" for pre-vetted "downstreams"
    * currently 5.5 TB in AWS
    * one in AWS that can talk to us
    * produces tarfile exports
    * tarfiles moved to 'disconnected' satellite
    * disconnected satellite syncs
    * downstreams are vetted, sync from this upstream
* questions
    * how does current Sat6 repo-export not meet our needs?
### Proposed solution from katello point of view:
* Katello will use fsexport to export the contents of a repository along with its latest metadata.
* The user is responsible for creating, enabling and syncing this repository in their destination satellite
* Something like
```bash
$ hammer content-export complete repository --id=22 --cdn-format=true
[.....................................................................................................................................................................................] [100%]
Exported to '/var/lib/pulp/exports/2022-03-16T20-08-35-00-00/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/ansible/2.4/os'.
$ ls -lh '/var/lib/pulp/exports/2022-03-16T20-08-35-00-00/Default_Organization/Library/content/dist/rhel/server/7/7Server/x86_64/ansible/2.4/os'
....
-rw-r--r--. 1 pulp pulp 489K Mar 16 20:08 python-passlib-1.6.5-1.1.el7.noarch.rpm
drwxr-xr-x. 2 pulp pulp 4.0K Mar 16 20:08 repodata
-rw-r--r--. 1 pulp pulp 22K Mar 16 20:08 sshpass-1.06-1.el7.x86_64.rpm
.....
```
* All rpms will be hard linked to the correct pulp artifacts to save on space.
* The user would have to manually tar and gzip them and send the archive to the downstream katello server.
* No listing files in the destination (these have to be generated for import). We could look into hammer generating them automatically.
* PS: No support for incremental exports (until fsexport supports it).
### On the downstream katello server
* Extract the archive and copy/rsync it to the local cdn.
* The user would have to manually create listing files for releasever and arch if they are not already there in the local cdn.
* The user would have to configure the katello server to pull from the local cdn.
* Enable the repo of interest from the manifest via the rh repos page.
* Sync the enabled repository.
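
The listing-file step above could be scripted roughly as follows. This is a sketch, not the script mentioned in these notes: it assumes a `listing` file is simply a plain-text list of the immediate subdirectory names, one per line, and it relies on GNU `find -printf`. The helper name `write_listing` and the temp-dir CDN tree are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a "listing" file in directory $1 that names each immediate
# subdirectory, one per line (assumed listing format).
write_listing() {
    local dir=$1
    find "$dir" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort > "$dir/listing"
}

# Stand-in for a local cdn tree, mirroring the export path shown earlier.
cdn=$(mktemp -d)
trap 'rm -rf "$cdn"' EXIT
mkdir -p "$cdn/content/dist/rhel/server/7/7Server/x86_64/ansible/2.4/os/repodata"

# One listing at the releasever level, one at the arch level.
write_listing "$cdn/content/dist/rhel/server/7"
write_listing "$cdn/content/dist/rhel/server/7/7Server"

cat "$cdn/content/dist/rhel/server/7/listing"
cat "$cdn/content/dist/rhel/server/7/7Server/listing"
```

Regenerating the listing from the directory contents, rather than appending to it, keeps the file correct when new release versions or architectures are rsynced in later.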
### Notes/Questions on the proposed solution
* I assume I would use tar's "--hard-dereference" option for hard links. Can someone confirm?
* Would the exported repository metadata be full copies of each repository's metadata?
* I have a python script Rich Jerido wrote years ago to create any missing listings files.
* Incremental exports are a must-have, as RHEL 7 repositories are quite large. The base repository is over 57 GB in size.
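
On the `--hard-dereference` question, a small experiment may help when checking the behavior locally. This sketch assumes GNU tar and uses a made-up layout: one fake artifact outside the export tree, hard linked into two repository directories inside it. It does not touch a real Pulp export.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/artifacts" "$work/export/repoA" "$work/export/repoB"

# Stand-in for a Pulp artifact, hard linked into two exported repos.
echo "fake rpm payload" > "$work/artifacts/abc123"
ln "$work/artifacts/abc123" "$work/export/repoA/bear-4.1-1.noarch.rpm"
ln "$work/artifacts/abc123" "$work/export/repoB/bear-4.1-1.noarch.rpm"

# Default: the second path inside the archive becomes a hard-link entry.
tar -C "$work" -cf "$work/default.tar" export
default_links=$(tar -tvf "$work/default.tar" | grep -c ' link to ' || true)

# --hard-dereference: every path is stored as a regular file with contents.
tar -C "$work" --hard-dereference -cf "$work/deref.tar" export
deref_links=$(tar -tvf "$work/deref.tar" | grep -c ' link to ' || true)

echo "hard-link entries, default: $default_links"
echo "hard-link entries, --hard-dereference: $deref_links"
```

In this experiment the link to the artifact outside the archived tree never produces a link entry, since tar only records links between paths it actually archives. The option matters when the same artifact appears more than once inside the exported tree.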
###### tags: `import/export`