Pulp Replication

Pulp CLI configuration supports defining multiple Pulp instances [0].
An Ansible role can easily talk to different servers [1].

[0] https://docs.pulpproject.org/pulp_cli/configuration/#config-profiles
[1] https://github.com/mdellweg/squeezer/tree/replicate_pulp

Problem Statement

Users have trouble serving the exact same content in multiple data centers (DC) or geographies (Geo).

The Current Solution

Users set up a Pulp instance in each DC or Geo and configure them to sync from each other. This takes a lot of work.

Opportunity

Make configuring one Pulp to "be just like another" easier.

Terminology

Primary Pulp - The Pulp where content originates from
Replica Pulp - The Pulp that receives its content from the Primary Pulp

Replica Repo - A Repo on a Replica Pulp that is configured to sync from a Repo on a Primary Pulp
Replica Distribution - A Distribution on a Replica Pulp that has the same base_path and Repository pairing as a Distribution on the Primary Pulp
Replica Content Guard - A Content Guard on a Replica Pulp that is configured to guard a Replica Distribution the same way as a Content Guard on the Primary Pulp's Distribution

Background Sync - An on_demand sync followed by an immediate sync
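
The Background Sync definition can be sketched as two sync calls in a fixed order. This is only an illustration: background_sync and the sync_fn callback are hypothetical names, and sync_fn stands in for whatever call actually triggers a sync with a given policy on the Replica Pulp.

```python
from typing import Callable, List


def background_sync(sync_fn: Callable[[str, str], None], repo_name: str) -> List[str]:
    """Run an on_demand sync, then an immediate sync, for one repository.

    sync_fn(repo_name, policy) is a hypothetical callback; it is not part of
    any real Pulp client.
    """
    policies = ["on_demand", "immediate"]
    for policy in policies:
        # The on_demand pass makes content available quickly; the immediate
        # pass then downloads the artifacts.
        sync_fn(repo_name, policy)
    return policies


# Record the calls instead of talking to a server.
calls = []
background_sync(lambda repo, policy: calls.append((repo, policy)), "foo")
```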

Use Cases

As a user I can

  • declare a Replica Repo on a Replica Pulp that also has a remote which will sync from a Primary Pulp

  • declare a distribution on a Replica Pulp that matches the repo and base_path of a distribution on a Primary Pulp and any associated content guards

  • trigger a background sync on a Replica Repo on a Replica Pulp

  • configure a periodic task that creates Replica Repos and Replica Distributions for all Repositories and Distributions from a Primary Pulp

  • configure a periodic task that triggers background syncs every N minutes

Proposal for CLI

Do the simplest, highest value thing first and deliver that as a fully working thing.

  • Create a Replica Repo
    • Assumptions
      • Each repository has only one distribution associated with it
    • CLI command
      • pulp file repository replicate replica-profile <profile name> name <repository name>
      • This command will do:
        • Find the repository and associated distribution in the default profile pulp
        • Create a remote on the replica pulp pointing to the base_url of the Distribution on the default pulp
        • Create a repository on the replica pulp that has the same attributes as the one on the default pulp
        • Create a distribution on the replica pulp that matches the distribution on the default pulp
        • Sync the repository on the replica pulp
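
The five steps of the proposed command could look roughly like the following sketch. It runs against in-memory stand-ins rather than real servers: FakePulp, replicate_repository, and the dictionary layouts are all hypothetical, and the "/pulp/content/" prefix in base_url is an assumption about how the content URL is built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakePulp:
    """In-memory stand-in for one Pulp instance (hypothetical, not a real client)."""
    content_origin: str
    remotes: Dict[str, dict] = field(default_factory=dict)
    repositories: Dict[str, dict] = field(default_factory=dict)
    distributions: Dict[str, dict] = field(default_factory=dict)
    synced: List[str] = field(default_factory=list)

    def base_url(self, base_path: str) -> str:
        # Assumption: content is served under <origin>/pulp/content/<base_path>/
        return f"{self.content_origin}/pulp/content/{base_path}/"


def replicate_repository(primary: FakePulp, replica: FakePulp, name: str) -> None:
    # 1. Find the repository and its associated distribution on the primary.
    repo = primary.repositories[name]
    dists = [d for d in primary.distributions.values() if d["repository"] == name]
    if len(dists) != 1:
        # The proposal assumes exactly one distribution per repository.
        raise ValueError(f"expected one distribution for {name!r}, found {len(dists)}")
    dist = dists[0]

    # 2. Create a remote on the replica pointing at the primary's base_url.
    replica.remotes[name] = {"name": name, "url": primary.base_url(dist["base_path"])}

    # 3. Create a repository with the same attributes, attached to that remote.
    replica.repositories[name] = {**repo, "remote": name}

    # 4. Create a distribution matching the one on the primary.
    replica.distributions[dist["name"]] = dict(dist)

    # 5. Sync the repository (here only recorded, not performed).
    replica.synced.append(name)


primary = FakePulp("https://primary.example")
primary.repositories["foo"] = {"name": "foo", "autopublish": True}
primary.distributions["foo-dist"] = {
    "name": "foo-dist", "base_path": "files/foo", "repository": "foo",
}
replica = FakePulp("https://replica.example")
replicate_repository(primary, replica, "foo")
```

Raising an error when a repository has zero or several distributions keeps the single-distribution assumption explicit instead of silently picking one.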

Proposal for Pulp API

  • PulpServer API with full CRUD

    • base_url
    • username
    • password
    • api_root
    • cert
    • key
    • verify_ssl
    • label_to_replicate
  • 'Replicate' action on the PulpServer API will dispatch a Task Group that will do the following:

    • If label_to_replicate is specified:
      • Search for all distributions with the specified label
      • Create a remote, repository, and distribution for each discovered distribution with autopublish enabled
      • Sync each repository
    • If no label_to_replicate is specified, replicate all distributions.
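
As a rough sketch of the proposal above, the PulpServer fields and the selection logic of the 'Replicate' action might be modelled as follows. The defaults, the helper names select_distributions and plan_replication, the pulp_labels key, and the choice to match on the label key alone are all assumptions made for illustration; the sketch only builds a plan and does not dispatch tasks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PulpServer:
    """Fields from the proposal; the default values are assumptions."""
    base_url: str
    username: Optional[str] = None
    password: Optional[str] = None
    api_root: str = "/pulp/"
    cert: Optional[str] = None
    key: Optional[str] = None
    verify_ssl: bool = True
    label_to_replicate: Optional[str] = None


def select_distributions(server: PulpServer, distributions: List[dict]) -> List[dict]:
    """Return the upstream distributions that should be replicated."""
    if server.label_to_replicate is None:
        # No label configured: replicate every distribution.
        return list(distributions)
    # Assumption: a distribution matches when it carries the label key.
    return [
        d for d in distributions
        if server.label_to_replicate in d.get("pulp_labels", {})
    ]


def plan_replication(server: PulpServer, distributions: List[dict]) -> List[Dict[str, dict]]:
    """Describe the remote, repository, and distribution to create per match."""
    plan = []
    for dist in select_distributions(server, distributions):
        name = dist["name"]
        plan.append({
            "remote": {"name": name, "url": dist["base_url"]},
            # Assumption: autopublish is set on the replica repository.
            "repository": {"name": name, "remote": name, "autopublish": True},
            "distribution": {
                "name": name, "base_path": dist["base_path"], "repository": name,
            },
        })
    return plan


upstream = [
    {"name": "a", "base_path": "a",
     "base_url": "https://primary.example/pulp/content/a/",
     "pulp_labels": {"replicate": "yes"}},
    {"name": "b", "base_path": "b",
     "base_url": "https://primary.example/pulp/content/b/",
     "pulp_labels": {}},
]
labelled = plan_replication(
    PulpServer(base_url="https://primary.example", label_to_replicate="replicate"),
    upstream,
)
everything = plan_replication(PulpServer(base_url="https://primary.example"), upstream)
```

Each plan entry would then be followed by a sync of the created repository.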

Questions

  • What should happen when an upstream distribution does not point to any repository or publication? Syncing would produce a 404.

  • For RPM repositories, which sync_policy should be used? 'mirror_content_only' or 'mirror_complete'?

Differences between Remotes

RpmRemote

  • sles_auth_token

UlnRemote

  • uln_server_base_url

AptRemote

  • distributions
  • components
  • architectures
  • sync_sources
  • sync_udebs
  • sync_installer
  • gpgkey
  • ignore_missing_package_indices

ContainerRemote

  • upstream_name
  • include_foreign_layers
  • include_tags
  • exclude_tags
  • sigstore

RoleRemote (ansible)

CollectionRemote (ansible)

  • requirements_file
  • auth_url
  • token
  • sync_dependencies
  • signed_only

GitRemote (ansible)

  • metadata_only
  • git_ref

Differences between Repositories

RpmRepository

  • metadata_signing_service
  • original_checksum_types
  • last_sync_details
  • retain_package_versions
  • autopublish
  • metadata_checksum_type
  • package_checksum_type
  • gpgcheck
  • repo_gpgcheck
  • sqlite_metadata

FileRepository

  • manifest
  • autopublish

AptRepository

ContainerRepository

  • manifest_signing_service

AnsibleRepository

  • last_synced_metadata_time

Differences in Distributions

ContainerDistribution

  • namespace
  • private
  • description