# Pulp3 Deployment Considerations

# Current Usage

Pulp2 is used for a large deployment that serves content to multiple PoPs:

* Pulls down content from content sources, e.g. RH or other channels
* Snapshots content on a bi-weekly or weekly cadence
* Desiring to roll out RHEL 9
* Custom tooling organizes the repos and handles promotion using the Pulp APIs
  * Performs some quality checks, e.g. linting, signature checks, etc.
  * Copies content between repos
* Uses the rsync distributor to publish to a webserver

# Goal

* Desiring the ability to have the content live natively in the cloud instead of having cloud …
* Having PoPs serve content when they are disconnected from the other PoPs

# Use Cases

## Snapshot Use Cases

* As a user I can …
  * Define snapshots of Red Hat CDN content via console.redhat.com
  * Easily connect systems to any snapshot with my existing RH credentials

## Point of Presence Use Cases

* As a user I can …
  * Launch a point of presence (PoP) which will auto-register with console.redhat.com
  * Configure the PoP to sync one or more c.rh.c snapshot repositories
    * On Demand - metadata only, binary data delivered via a pull-through cache
    * Full Sync - metadata and binary data synchronized

# Operator notes

* Images: the operator currently makes use of the following images:
  * the operator itself: https://quay.io/repository/pulp/pulp-operator
  * the pulp application: https://quay.io/repository/pulp/pulp
  * the reverse proxy: https://quay.io/repository/pulp/pulp-web

  You can use other application/webserver images by declaring `image` and `image_web` in the Pulp CR. Those images can be built from https://github.com/pulp/pulp-oci-images/tree/latest/images/pulp/stable, and the operator image can be built from the operator repo: https://github.com/pulp/pulp-operator/blob/main/Dockerfile
* Operator RBAC:
  * https://github.com/pulp/pulp-operator/blob/main/bundle/manifests/pulp-operator.clusterserviceversion.yaml#L1122
* Docs: https://docs.pulpproject.org/pulp_operator/

# Architecture

# Questions (Pulp)

* "quality check the package"
  * Won't clash with an existing package name
  * Has a changelog
  * Signed with the correct key
* How much third-party content, custom content?
  * Mostly Red Hat
* What are the primary compose workflows?
  * Repos are managed as bundles, treated in a sense as immutable snapshots
  * Only the newest versions are used; no "incremental update" with errata
* Is rollback an aspect of the Pulp3 feature set that is useful?
  * Yes, but note the Pulp2 distributor already allowed them to publish a point in time. It did take a long time, though.
* If Pulp3 had an rsync exporter (like the Pulp2 rsync distributor), would you use that instead of launching a container-based Pulp on the PoP?
  * One is a push model, the other is a pull model
  * Filesystem export + rsync, or native rsync
* What is the high-availability need?

# AI

* paul: Issue discovered: checksums of pulp_rpm repos aren't available for on_demand repos. Needs a reproducer reported.
  * How to reproduce (see the sketch after this list):
    * Create a repo
    * Create a remote
    * Sync the repo with policy `on_demand`
    * ...
* bmbouter: discuss with pulpcore if we can prioritize 1817 or 3155
  * biggest issue is security - how do we make sure sensitive data is always censored appropriately
* bmbouter: to organize a cost estimate calculator
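
A minimal reproducer sketch for the steps above, using the pulp_rpm REST endpoints with the `requests` library. The host, credentials, and feed URL are placeholder assumptions, and the final verification step is still elided ("...") in the notes.

```python
import requests

BASE = "https://pulp.example.com"  # assumption: your Pulp API host
AUTH = ("admin", "password")       # assumption: admin credentials

def call(method, path, **kwargs):
    """Issue a request against the Pulp v3 API and return the parsed JSON body."""
    resp = requests.request(method, f"{BASE}{path}", auth=AUTH, **kwargs)
    resp.raise_for_status()
    return resp.json()

# 1. Create a repo
repo = call("post", "/pulp/api/v3/repositories/rpm/rpm/", json={"name": "repro"})

# 2. Create a remote (the on_demand policy is configured on the remote)
remote = call("post", "/pulp/api/v3/remotes/rpm/rpm/", json={
    "name": "repro",
    "url": "https://fixtures.pulpproject.org/rpm-signed/",  # assumption: any RPM feed
    "policy": "on_demand",
})

# 3. Sync the repo with policy `on_demand`; this returns a task to wait on
task = call("post", f"{repo['pulp_href']}sync/", json={"remote": remote["pulp_href"]})
# 4. "..." -- the checksum verification step is left elided in the notes above.
```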
# Next

# March 20, 2024

Have a PR open to fix the replication bug: https://github.com/pulp/pulpcore/pull/5140

# March 6, 2024

Need to prioritize https://github.com/pulp/pulpcore/issues/4637

# Nov 1

* pulpcon is next week, Nov 6 - 9, [agenda here](https://hackmd.io/x9ojVHY4RzCr9_pOhMVlBw?view)
* slides for the multi-geo pulpcon talk are [here](https://hackmd.io/PV6aeDqjT06yn9ed63wwmQ)
* pulpcore 3.40 released
  * now contains pulp_file
  * will require an update to any plugin for compatibility reasons, e.g. pulp_rpm
  * upgrading should still be done with a planned outage; in the future it can be done online
  * note the pulp-oci-images now run the migrations as a separate container
    * https://github.com/pulp/pulp-oci-images/blob/latest/images/compose/compose.yml#L34-L43
* what to do with a replica server that has had changes made to it?
  * please file a bug on this and we'll look at it

# Oct 18

* Production is going well
* How to store some arbitrary data on a Repository, e.g. notes about a specific package being present in a repository?
  * recommendation: use the label API on the Repository, with the NEVRA as the key and whatever needs to be stored as the value (see the sketch after this section)
* Telemetry update
  * Pulp tests are being merged: https://github.com/pulp/pulpcore/pull/4414
  * the OTEL upstream PR for aiohttp is near merging: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1800
  * after merging ^, OTEL needs to release it, then Pulp needs to "enable" it in pulp-content, document it, and release it
* Pulpcon coming up Nov 6-9th
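
A minimal sketch of the label recommendation above, assuming the `pulp_labels` field on repositories and the `pulp_label_select` filter from the pulpcore REST API. The host, credentials, repository href, and NEVRA key are placeholders.

```python
import requests

BASE = "https://pulp.example.com"                        # assumption
AUTH = ("admin", "password")                             # assumption
repo_href = "/pulp/api/v3/repositories/rpm/rpm/<uuid>/"  # assumption

# PATCH replaces the whole pulp_labels dict, so read-modify-write if existing
# labels must be preserved; the update may return a task to wait on. Label
# keys have a restricted character set, so the NEVRA may need normalizing
# (e.g. dropping the epoch colon) -- an assumption.
requests.patch(
    f"{BASE}{repo_href}",
    auth=AUTH,
    json={"pulp_labels": {"bash-5.1.8-4.el9.x86_64": "present since 2023-09 snapshot"}},
).raise_for_status()

# Labels are filterable, e.g. list the repositories carrying that key:
repos = requests.get(
    f"{BASE}/pulp/api/v3/repositories/rpm/rpm/",
    auth=AUTH,
    params={"pulp_label_select": "bash-5.1.8-4.el9.x86_64"},
).json()
```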
# Oct 4

* Went into production
* The replica workflow gets failed tasks, and it's difficult to see from the task listing which objects failed to update
  * next step: @chida to file the bug as a usability bug, and @dkliban will fix
* Want to have the '/' resource on the pulp service be a dynamic HTML page
  * suggestion: use the reverse proxy to deliver that page
* How to configure authZ that is Active Directory or LDAP based?
  * https://hackmd.io/ED9UpscNSRW86Le3xNzVeg
  * https://docs.pulpproject.org/pulpcore/authentication/webserver.html

# August 23

* Does Pulp have an LTS?
  * No, but we do backport to some recent branches, which are [listed here](https://github.com/pulp/pulpcore/blob/main/template_config.yml#L21)
* How do we configure OpenTelemetry support?
  * There are some docs here: https://docs.pulpproject.org/pulpcore/components.html#telemetry-support
  * Also you can mimic what the otel profile in the dev environment does: https://github.com/pulp/oci_env/tree/main/profiles/opentelemetry_dev
* Replication is working nicely
* How do we enable RBAC permissions that mimic the upstream ones?
  * We currently don't support this; you can only do this as admin currently
  * Maybe there is an upstream issue already filed for this?
    * https://github.com/pulp/pulpcore/issues/3994
* Can I create an admin user via the API?
  * Not currently
  * @Chida will file an issue for this request
* Likely switching our Python runtime to 3.9

# August 9

* Clusters are spun up, with testing and replication being used
* Having an issue with password rotation of the database
  * the password changes and Pulp needs to be restarted
* replication updates
  * commands have been added to the CLI
  * bugfixes for replication have been released; please let us know if anything else isn't working
* Need to revisit the OTEL work soon

# May 3

* Performance testing is showing that S3 object storage with PULP_REDIRECT_TO_OBJECT_STORAGE=False causes high memory and CPU use relative to a clustered backend solution
* pulp_rpm == 3.20 to be released today, containing the replication bits. It'll be ready for testing

# April 19

* Experimenting with solutions to timeouts - is it S3 or not?
* Replica support for pulp_rpm should be released by early next week
* Metrics work ongoing

# March 16

* Pulp 3.23 released with replica support
* Metrics work ongoing
  * What do we know we'll get?
    * for pulp-content and pulp-api we'll get response status, URL, and latency raw data
  * What else would we want?
    * for tasking we'll get a 1-second summary of:
      * busy/free proportion
      * top/sar-style resource metrics like CPU usage, RAM, and network usage within the 1-second summary
    * for tasking we'll also get event-based metrics:
      * task uuid, task start time, stop time, task name

# March 1

* Issue filed: https://github.com/pulp/pulpcore/issues/3621
* pulp-replica, when will it be released?
  * goal: to be included in 3.23
* gave an overview of domains
  * they are interested in using domains; what happens today is lots of content comes in and sometimes it clashes, e.g. on NEVRA. Domains would solve this problem
* updates on the image tag changes that have been made
* use case: get secrets from KMS via a sidecar container that fetches the secrets and loads a config map
* open telemetry update
  * metrics and tracing are working well for pulp-api
  * next step: add support for pulp-worker and pulp-content
  * I'll record a youtube video showing off the tracing and metrics for pulp
  * next time: let's discuss feedback / ideas on the metrics for Pulp to find what would be useful

# Feb 15

* updated replication demo with labels
* performance and scale testing blog post
  * https://pulpproject.org/2023/02/14/rpm-redirect-serving-perf-scale-testing/
* starting on adding metrics
  * https://github.com/pulp/pulpcore/issues/3445

# Feb 1

* the issue from last week about 0 bytes returned from the pulp-content app was legitimately a 0-byte package! So no issue there
* the yum/dnf timeout was increased from 2 seconds to 10 seconds; at 2 seconds the occasional package was just a little too slow and was failing
* pulp-replica demo
  * https://youtu.be/ehrd2kawmN0
* talked through the pulp_concurrent setting some (see the sketch after this section)
  * it's the number of concurrent TCP connections from 1 task
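
A hedged sketch of the concurrency knob discussed above. Mapping the notes' "pulp_concurrent" onto the remote's `download_concurrency` field in the upstream REST API is my assumption; the host, credentials, and remote href are placeholders.

```python
import requests

BASE = "https://pulp.example.com"                     # assumption
AUTH = ("admin", "password")                          # assumption
remote_href = "/pulp/api/v3/remotes/rpm/rpm/<uuid>/"  # assumption

# One sync task will open at most this many concurrent TCP connections
# to the remote's URL while downloading metadata and packages.
resp = requests.patch(
    f"{BASE}{remote_href}",
    auth=AUTH,
    json={"download_concurrency": 4},
)
resp.raise_for_status()
```

Lowering this can help when a source throttles connections or times out under load; raising it can speed up syncs of many small packages.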
# Jan 18

* Need to have Pulp proxy the data from S3 because the clients can't reach S3, but they can reach Pulp
  * suggestion: use the REDIRECT_TO_OBJECT_STORAGE feature (a settings sketch follows this section)
  * https://docs.pulpproject.org/pulpcore/configuration/settings.html#redirect-to-object-storage
* Looking to do the first prod deployment towards the end of the month
* FYI we provide pytest
* Open Telemetry
  * PoC will be started soon
* Demo of CLI-based replication
  * https://youtu.be/aIIgrNILNIk
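
A minimal settings sketch for the suggestion above. The /etc/pulp/settings.py path and the PULP_ environment-variable prefix are assumptions drawn from the linked settings docs. With the redirect disabled, pulp-content streams the bytes from S3 itself instead of redirecting clients to a pre-signed object-storage URL they cannot reach.

```python
# /etc/pulp/settings.py
# Serve content through pulp-content rather than redirecting clients to S3.
# Note the May 3 entry above: proxying costs extra CPU and memory.
REDIRECT_TO_OBJECT_STORAGE = False
# Equivalent environment variable: PULP_REDIRECT_TO_OBJECT_STORAGE=False
```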
# Dec 14th

* Not an issue to continue using non-clustered Redis
* Status of deployment
  * Integrated with additional orchestration/mush APIs
  * Testing against the RHEL 9 release streams
* Open Telemetry
  * kick-off meeting minutes
    * https://discourse.pulpproject.org/t/monitoring-telemetry-working-group/700/9
  * Desired questions to answer:
    * How much traffic are we getting on this endpoint?
    * Is there a significant wait for clients to receive downloads?
    * Am I denying requests from clients due to load?
* Cost Analysis
  * desire: to have a cost estimator for running a Pulp installation on AWS in terms of infra and network storage + network delivery costs

# Nov 30th

* Issue discovered: checksums of pulp_rpm repos aren't available for on_demand repos. Needs a reproducer reported
* Redis issue figured out: it was a clustered Redis install, but Pulp doesn't support clustered Redis
  * <discourse link needed>
* Hard to tell when a sync task is on_demand versus immediate
  * known issue: https://github.com/pulp/pulpcore/issues/1817
* Identified that FS Exports may be an option for their geo distribution
  * problem is: it doesn't deduplicate RPMs, and that's a lot...
* Difficult to know, when querying a task, what the task is doing. About the only thing to go on is the created resources, which don't get created until the end
  * https://github.com/pulp/pulpcore/issues/3155
  * idea: allow for querying through tasking back to repository attributes

# Nov 16th

* metrics are needed
  * health metrics: requests / second, latency
  * capacity metrics:
    * interested in https://github.com/pulp/pulpcore/issues/3389
    * interested in https://github.com/pulp/pulp-operator/issues/761
  * AI: bmbouter to share the open telemetry working group
* k8s health checks
  * This would be helpful: https://github.com/pulp/pulpcore/issues/2844
  * here's the operator's use of the existing health checks: https://github.com/pulp/pulp-oci-images/blob/latest/images/assets/readyz.py
* issues to report:
  * AWS ElastiCache is not yet working, needs an issue filed
  * Temp directories during RPM sync are getting to ~50 GB. This is unexpected and problematic; needs an issue filed so the pulp dev team can try to reproduce
    * might be this issue: https://github.com/pulp/pulpcore/issues/1936
* multi-geo:
  * not yet underway, still focused on getting the main pulp server productionized
  * requested feedback on this brainstorm doc
    * https://hackmd.io/isQ6Rf73Q56ucscoIbSNyw
* update: zero-downtime working group underway

# Nov 2 Updates

* Updates on the evaluation of the operator? Still in progress
* Operator questions:
  * Auto-scaling of pods for pulp-operator - not available as of today, but it is possible to scale up and down manually. It would be great to auto-scale pulp-worker pods based on the queue of waiting tasks
  * go-based docs: https://docs.pulpproject.org/pulp_operator/
  * everything is namespaced in pulp-operator: https://docs.pulpproject.org/pulp_operator/en/ansible/roles/pulp-api/#role-variables
  * https://hackmd.io/SRZmd5L3SMWWyvvjHNE3rQ?view#Tuesday-November-8-User-day-2 talks about pulp-operator k8s deployments
* Chida's installation discussion and questions
  * workers' heartbeat
  * excessive memory usage during sync
  * redis caching

# Oct 12th Updates

* What permissions does the operator need?
  * https://github.com/pulp/pulp-operator/blob/main/bundle/manifests/pulp-operator.clusterserviceversion.yaml#L1122
* zero-downtime is a concern
  * https://discourse.pulpproject.org/t/support-zero-downtime-updates/645
* https://app.element.io/?updated=1.11.5#/room/#pulp:matrix.org
* https://pulpproject.org/help/#chat-to-us

# Sept 28th Updates

* general updates
* Propose we shorten to a 30-minute call every 2 weeks
* [question from the pulp operator team] Could you share some more details about the permissions that k8s operators typically require that are not acceptable for your environment?
  * it downloads a lot of untrusted assets, but that could be gotten around by pointing to your own registry
  * the permissions would need a more specific review
  * they mostly use helm charts today
  * uncomfortable with the API access to k8s itself, because the "deployer" here is not the admin; they are general users
  * we should be offering a helm chart
* Documented the Dockerfile, see updates here: https://docs.pulpproject.org/pulp_oci_images/
* Two upcoming goals (likely):
  * combine the single container and the operator images to have one set of technology
  * produce an operations manual
* pulpcon coming up Nov 7-12; the CFP is open until Monday, and we'd welcome any talk about how Pulp3 is being used
  * https://discourse.pulpproject.org/t/pulpcon-2022-call-for-proposals/590
  * Some not-yet-posted talks:
    * using the operator
    * running pulp in containers without an operator
    * operations guide for pulp

# Sept 14th Updates

* Using the single container to pull RHEL content
  * Deployed on AWS, using AWS RDS as the db backend for it
  * Having some issues with running on k8s
* Enjoying being able to associate a repo version and distribution
  * an improvement over pulp2
* next step: try to use the pulp_installer roles to build a container
  * need identified: desire a dockerfile that we would share

### HCaaS open questions

* privacy - some content is deeply sensitive; questions about which systems are allowed to touch it and which systems are allowed to store it
* third-party (potentially licensed) content, e.g. content from VMWare, Nvidia
* reliability - cybersecurity is critical, updates are critical, the infra must be available, SLAs are important
* quality checking (as described earlier) - where would custom, organization-defined quality checks fit into a hosted service model? (a toy sketch follows the action items below)

# AI

* bmbouter to share the dockerfile
* Investigate container privileges - running without root
* tiho to set up a follow-up time to explore use cases and operational needs for a SaaS model
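
To make the quality-checking question above concrete, here is a toy sketch of the checks named in the Questions section (name clash, changelog present, signature), shelling out to the stock `rpm` CLI. The repo inventory and the "correct key" logic are simplified placeholders, not a definitive implementation.

```python
import subprocess

def rpm_query(path, fmt):
    """Query a tag from an RPM file on disk, e.g. fmt='%{NAME}'."""
    return subprocess.run(
        ["rpm", "-qp", "--qf", fmt, path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def quality_check(path, existing_names):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []

    # 1. Won't clash with an existing package name
    name = rpm_query(path, "%{NAME}")
    if name in existing_names:
        problems.append(f"name clash: {name} already exists in the repo")

    # 2. Has a changelog
    changelog = subprocess.run(
        ["rpm", "-qp", "--changelog", path], capture_output=True, text=True
    ).stdout.strip()
    if not changelog:
        problems.append("no changelog entries")

    # 3. Signed with the correct key -- approximated here by `rpm -K`, which
    #    verifies digests/signatures against the host's imported keys
    #    (assumption: the checking host trusts only the correct key).
    sig = subprocess.run(["rpm", "-K", path], capture_output=True, text=True)
    if sig.returncode != 0:
        problems.append(f"signature check failed: {sig.stdout.strip()}")

    return problems

# In a real integration the existing names would come from Pulp's package
# listing API; a placeholder set is used here.
print(quality_check("./example.rpm", existing_names={"bash", "glibc"}))
```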