# Pulp3 Deployment Considerations
# Current Usage
Pulp2 is used for a large deployment that serves content:
* Pulls down content from content sources, e.g. RH or other channels
* Snapshots content on a bi-weekly or weekly cadence
* Multiple PoPs
* Desiring to roll out RHEL 9
* Custom tooling to organize the repos and promotion using Pulp APIs
* Performs some quality checks, e.g. linting, signature checks, etc.
* Copies content between repos
* Uses rsync distributor to a webserver
# Goal
* Desiring the ability to have the content live natively in the cloud
* Having PoPs serve content when they are disconnected from the other PoPs
# Use Cases
## Snapshot Use Cases
* As a user I can …
* Define snapshots of Red Hat CDN content via console.redhat.com
* Easily connect systems to any snapshot with my existing RH credentials
## Point of Presence Use Cases
* As a user I can …
* Launch a point of presence (PoP) which will auto-register with console.redhat.com
* Configure the PoP to sync one or more c.rh.c snapshot repositories
* On Demand - Metadata only, binary data delivered as pull-through cache
* Full Sync - Metadata and binary data synchronized
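The two PoP sync modes above map to the remote's `policy` field. A pulp-cli sketch, assuming a configured CLI and a running Pulp server; the remote names and URL are placeholders, and exact flag spellings may vary by CLI version:

```shell
# metadata only -- binaries are fetched on first client request (placeholder names/URL)
pulp rpm remote create --name pop-ondemand \
  --url https://cdn.example.com/rhel9/baseos/ --policy on_demand

# full sync -- metadata and all binary data downloaded up front
pulp rpm remote create --name pop-full \
  --url https://cdn.example.com/rhel9/baseos/ --policy immediate
```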
# Operator notes
* Images: currently the operator makes use of the following images:
    * operator itself: https://quay.io/repository/pulp/pulp-operator
    * pulp application: https://quay.io/repository/pulp/pulp
    * reverse proxy: https://quay.io/repository/pulp/pulp-web
* You can use other application/webserver images by declaring `image` and `image_web` in the Pulp CR; those images can be built from: https://github.com/pulp/pulp-oci-images/tree/latest/images/pulp/stable
* The operator image itself can be built from the operator repo: https://github.com/pulp/pulp-operator/blob/main/Dockerfile
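A minimal sketch of overriding the images in the Pulp CR. The `apiVersion`, image references, and version fields here are assumptions/placeholders; check the operator docs linked below for the exact schema of your operator release:

```yaml
# hypothetical CR fragment -- verify apiVersion/fields against the operator docs
apiVersion: repo-manager.pulpproject.org/v1beta2
kind: Pulp
metadata:
  name: example-pulp
spec:
  # custom application image built from pulp-oci-images (placeholder)
  image: quay.io/myorg/pulp
  # custom reverse-proxy image (placeholder)
  image_web: quay.io/myorg/pulp-web
```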
* Operator RBAC:
* https://github.com/pulp/pulp-operator/blob/main/bundle/manifests/pulp-operator.clusterserviceversion.yaml#L1122
* Docs: https://docs.pulpproject.org/pulp_operator/
# Architecture
# Questions (Pulp)
* "quality check the package"
* Won't clash with existing package name
* has changelog
* signed with the correct key
* How much third-party content, custom content
* Mostly Red Hat
* What are the primary compose workflows
* Repos are managed as bundles, treated in a sense as immutable snapshots
* Only use newest versions, no "incremental update" with errata
* Is rollback an aspect of the Pulp3 feature set that is useful?
* Yes, but currently the Pulp2 distributor allowed them to publish a point in time. It did take a long time though
* If Pulp3 had an Rsync Exporter (like the Pulp2 rsync distributor) would you use that instead of launching a container based Pulp on the Pop?
* One is a push model, the other is a pull model
* Filesystem export + Rsync, or native Rsync
* What is the high-availability need?
# AI
* paul: Issue discovered: checksums of pulp_rpm repos aren't available for on_demand repos. Need a reproducer reported
* How to reproduce:
* Create a repo
* Create a remote
* Sync the repo with policy `on_demand`
* ...
* bmbouter: discuss with pulpcore if we can prioritize 1817 or 3155
* biggest issue is security - how do we make sure sensitive data is always censored appropriately
* bmbouter: to organize a cost estimate calculator
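The on_demand checksum reproducer steps above can be sketched with pulp-cli; names are placeholders, the fixture URL is a commonly used public Pulp test repo, and a configured CLI plus a running Pulp server are assumed:

```shell
# placeholder names; requires a configured pulp-cli and a running server
pulp rpm repository create --name ondemand-demo
pulp rpm remote create --name ondemand-remote \
  --url https://fixtures.pulpproject.org/rpm-unsigned/ \
  --policy on_demand
pulp rpm repository sync --repository ondemand-demo --remote ondemand-remote
# then inspect package checksums -- for on_demand content they were
# reported as unavailable, which is the bug being reproduced
```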
# Next
# March 20, 2024
Have a PR open to fix replication bug: https://github.com/pulp/pulpcore/pull/5140
# March 6, 2024
Need to prioritize https://github.com/pulp/pulpcore/issues/4637
# Nov 1
* pulpcon is next week Nov 6 - 9, [agenda here](https://hackmd.io/x9ojVHY4RzCr9_pOhMVlBw?view)
* slides for multi-geo pulpcon talk [here](https://hackmd.io/PV6aeDqjT06yn9ed63wwmQ)
* pulpcore 3.40 released
* now contains pulp_file (the plugin was merged into pulpcore)
* will require an update to any plugin for compatibility reasons, e.g. pulp_rpm
* upgrading should be done with a planned outage still, in the future it can be done online
* note the pulp-oci-images now runs the migrations as a separate container
* https://github.com/pulp/pulp-oci-images/blob/latest/images/compose/compose.yml#L34-L43
* what to do with a replica server that has had changes made to it?
* please file a bug on this and we'll look at it
# Oct 18
* Production is going well
* How to store some arbitrary data on a Repository, e.g. notes about a specific package being present in a repository
* recommendation: use the label API on a Repository and use NEVRA as the key and whatever needs to be stored as the value
* Telemetry Update
* Pulp tests are being merged: https://github.com/pulp/pulpcore/pull/4414
* OTEL upstream PR for aiohttp is near merging: https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1800
* after merging ^ OTEL needs to release it, then Pulp needs to "enable" it in pulp-content, document it, and release it
* Pulpcon coming up Nov 6-9th
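The label recommendation above (NEVRA as key, note as value) can be sketched as follows. The endpoint path, credentials, and the label-key character restriction are assumptions, not confirmed API details; the actual update is a PATCH against the repository href:

```python
# Sketch of "store per-package notes on a Repository via pulp_labels".
# Label-key charset restrictions are an assumption -- check your Pulp
# version's validation rules before relying on this.
import json
import re

def nevra_label(nevra: str, note: str) -> dict:
    """Build a pulp_labels entry keyed by a (sanitized) NEVRA."""
    key = re.sub(r"[^A-Za-z0-9_-]", "_", nevra)  # assumed restricted charset
    return {key: note}

labels = {}
labels.update(nevra_label("bash-0:5.1.8-6.el9.x86_64", "pinned pending review"))
payload = {"pulp_labels": labels}
print(json.dumps(payload))
# The update itself would be a PATCH to the repository href, e.g.:
#   requests.patch(f"{BASE}/pulp/api/v3/repositories/rpm/rpm/{UUID}/",
#                  json=payload, auth=(USER, PASS))
```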
# Oct 4
* Went into production
* The replica workflow gets failed tasks and it's difficult to see from the task listing which objects failed to update
* next step: @chida to file the bug as a usability bug, and @dkliban will fix
* Want to have the '/' resource on the pulp service be a dynamic html page
* suggestion: use the reverse proxy to deliver that page
* How to configure authZ that is active directory or LDAP based?
* https://hackmd.io/ED9UpscNSRW86Le3xNzVeg
* https://docs.pulpproject.org/pulpcore/authentication/webserver.html
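One way to implement the reverse-proxy suggestion above: an nginx sketch that serves a static landing page at `/` while passing everything else to Pulp. Server name, paths, and upstream hostnames are placeholders; the ports are the Pulp service defaults:

```nginx
server {
    listen 443 ssl;
    server_name pulp.example.com;  # placeholder

    # serve a hand-maintained landing page at '/'
    location = / {
        root /var/www/pulp-landing;  # placeholder path
        try_files /index.html =404;
    }

    # everything else goes to the Pulp services (default ports)
    location /pulp/api/v3/ { proxy_pass http://pulp-api:24817; }
    location /pulp/content/ { proxy_pass http://pulp-content:24816; }
}
```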
# August 23
* Does Pulp have an LTS?
* No, but we do backport to some recent branches, which are [listed here](https://github.com/pulp/pulpcore/blob/main/template_config.yml#L21)
* How do we configure open telemetry support
* There are some docs here: https://docs.pulpproject.org/pulpcore/components.html#telemetry-support
* Also you can mimic what the otel profile in the dev environment does: https://github.com/pulp/oci_env/tree/main/profiles/opentelemetry_dev
* Replication is working nicely
* How do we enable the RBAC permissions to mimic the upstream ones?
* We currently don't support this, you can only do this with admin currently
* Maybe there is an upstream issue already filed for this?
* https://github.com/pulp/pulpcore/issues/3994
* Can I create an admin user via the API?
* Not currently
* @Chida will file an issue for this request
* Likely switching our Python runtime to 3.9
# August 9
* Clusters are spun up with testing and replication being used
* Having an issue with password rotation of the database
* password changes and Pulp needs to be restarted
* replication updates
* commands have been added to the CLI
* bugfixes for replication have been released, please let us know if anything else isn't working
* Need to revisit the OTEL work soon
# May 3
* Performance testing is showing that S3 object storage with PULP_REDIRECT_TO_OBJECT_STORAGE=False causes high memory and CPU usage relative to a clustered backend solution
* pulp_rpm == 3.20 to release today containing the replication bits. It'll be ready for testing
# April 19
* Experimenting with solutions to timeouts - is it S3 or not?
* Replica support for pulp_rpm should be released by early next week
* Metrics work ongoing
# March 16
* Pulp 3.23 released with replica support
* Metrics Work Ongoing
* What do we know we'll get?
* for pulp-content and pulp-api we'll get response status, url, and latency raw data
* What else would we want?
* for tasking we'll get a 1 second summary of:
* busy/free proportion
* top/sar style resource metrics like cpu usage, ram, network usage within the 1 second summary
* for tasking we'll also get event based metrics:
* task uuid, task start time, stop time, task name
# March 1
* Issue filed: https://github.com/pulp/pulpcore/issues/3621
* pulp-replica, when will it be released?
* goal: to be included in 3.23
* gave overview of domains
* they are interested in using domains; what happens today is lots of content comes in and sometimes it clashes, e.g. with NEVRA. Domains would solve this problem
* updates on the image tag changes that have been made
* use case: get secrets from KMS via a sidecar container that gets the secrets and loads a config map
* open telemetry update
* metrics and tracing are working well for pulp-api
* next step: add support for pulp-worker and pulp-content
* I'll record a youtube video showing off the tracing and metrics for pulp
* next time: let's discuss feedback / ideas on the metrics for Pulp to find what would be useful
# Feb 15
* updated replication demo with labels
* performance and scale testing blog post
* https://pulpproject.org/2023/02/14/rpm-redirect-serving-perf-scale-testing/
* starting on adding metrics
* https://github.com/pulp/pulpcore/issues/3445
# Feb 1
* issue from last week about 0 bytes returned from pulp-content app was legitimately a 0 byte package! So no issue there
* the yum/dnf timeout was increased from 2 seconds to 10 seconds. It was failing at 2 seconds, which for the occasional package was just a little too slow
* pulp-replica demo
* https://youtu.be/ehrd2kawmN0
* talked through the pulp_concurrent setting some
* It's concurrent TCP connections from 1 task
# Jan 18
* Need to have Pulp proxy the data from S3 because the clients can't reach S3, but they can reach Pulp
* suggestion: use the REDIRECT_TO_OBJECT_STORAGE feature
* https://docs.pulpproject.org/pulpcore/configuration/settings.html#redirect-to-object-storage
* Looking to do first prod deployment towards the end of the month
* FYI we provide pytest
* Open Telemetry
* PoC will be started soon
* Demo of CLI-based replication
* https://youtu.be/aIIgrNILNIk
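The REDIRECT_TO_OBJECT_STORAGE suggestion above corresponds to a settings toggle. A sketch of the relevant fragment of the Pulp settings file; the bucket name is a placeholder and the storage-backend lines are the usual S3 configuration, shown only for context:

```python
# /etc/pulp/settings.py (fragment)
# With redirect disabled, pulp-content streams bytes from S3 itself
# instead of issuing redirect URLs that clients cannot reach.
REDIRECT_TO_OBJECT_STORAGE = False
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "example-pulp-bucket"  # placeholder
```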
# Dec 14th
* Not an issue to continue using non-clustered Redis
* Status of deployment
* Integrated with additional orchestration/mush APIs
* Testing against the RHEL 9 release streams
* Open Telemetry
* kick off meeting minutes
* https://discourse.pulpproject.org/t/monitoring-telemetry-working-group/700/9
* Desired questions to answer:
* How much traffic are we getting on this endpoint?
* Is there a significant wait for clients to receive downloads?
* Am I denying requests from clients due to load?
* Cost Analysis
* desire: to have a cost estimator for running a Pulp installation on AWS in terms of infra and network storage + network delivery costs
# Nov 30th
* Issue discovered: checksums of pulp_rpm repos aren't available for on_demand repos. Need a reproducer reported
* Redis issue figured out: It was a clustered Redis install, but Pulp doesn't support clustered Redis
* <discourse link needed>
* Hard to tell when a sync task is on_demand versus immediate.
* known issue https://github.com/pulp/pulpcore/issues/1817
* Identified that FS Exports may be an option for their geo distribution
* problem is: it doesn't deduplicate RPMs and that's a lot...
* Difficult to know when querying a task about what the task is doing. Kind of the only thing to go on is the created resources, which don't get created until the end
* https://github.com/pulp/pulpcore/issues/3155
* idea: allow for querying through tasking back to repository attributes
# Nov 16th
* metrics are needed
* health metrics: requests / second, latency
* capacity metrics:
* interested in https://github.com/pulp/pulpcore/issues/3389
* interested in https://github.com/pulp/pulp-operator/issues/761
* AI: bmbouter to share the open telemetry working group
* k8s health checks
* This would be helpful: https://github.com/pulp/pulpcore/issues/2844
* here's the operator's use of existing health checks: https://github.com/pulp/pulp-oci-images/blob/latest/images/assets/readyz.py
* issues to report:
* AWS ElastiCache is not yet working, needs an issue filed
* Temp directories during RPM sync are getting to ~50 GB. This is unexpected and problematic, needs an issue filed so pulp dev team can try to reproduce
* might be this issue https://github.com/pulp/pulpcore/issues/1936
* multi-geo:
* not yet underway, still focused on getting the main pulp server productionized
* requested feedback on this brainstorm doc
* https://hackmd.io/isQ6Rf73Q56ucscoIbSNyw
* update: zero-downtime working group underway
# Nov 2 Updates
* Updates on the evaluation of the operator? still in progress
* Operator questions:
* Auto-scaling of pods for pulp-operator is not available as of today, but it is possible to scale up and down manually. It would be great to auto-scale pulp-worker pods based on the queue of waiting tasks
* go based docs: https://docs.pulpproject.org/pulp_operator/
* everything is namespaced in pulp-operator https://docs.pulpproject.org/pulp_operator/en/ansible/roles/pulp-api/#role-variables
* https://hackmd.io/SRZmd5L3SMWWyvvjHNE3rQ?view#Tuesday-November-8-User-day-2 talks about pulp-operator k8s deployments
* Chida's Installation Discussion and questions
* worker's heartbeat
* excessive mem usage during sync
* redis caching
# Oct 12th Updates
* What permissions does the operator need?
* https://github.com/pulp/pulp-operator/blob/main/bundle/manifests/pulp-operator.clusterserviceversion.yaml#L1122
* zero-downtime is a concern
* https://discourse.pulpproject.org/t/support-zero-downtime-updates/645
* https://app.element.io/?updated=1.11.5#/room/#pulp:matrix.org
* https://pulpproject.org/help/#chat-to-us
# Sept 28th Updates
* general updates
* Propose we shorten to a 30-minute call every 2 weeks
* [question from pulp operator team] Could you share some more details about the permissions that k8s operators typically require that are not acceptable for your environment?
* it downloads a lot of untrusted assets, but that could be gotten around by pointing to your own registry
* the permissions would need a more specific review
* they mostly use helm charts today
* uncomfortable with the API access to k8s itself because the "deployer" here is not the admin, they are general users
* we should be offering a helm chart
* Documented the dockerfile, see updates here https://docs.pulpproject.org/pulp_oci_images/
* Two upcoming goals (likely):
* combine the single container and the operator images to have one set of technology
* produce an operations manual
* pulpcon coming up Nov 7-12: CFP is open until Monday, we'd welcome any talk about how Pulp3 is being used
* https://discourse.pulpproject.org/t/pulpcon-2022-call-for-proposals/590
* Some not yet posted talks:
* using the operator
* running pulp in containers without an operator
* operations guide for pulp
# Sept 14th Updates
* Using single container to pull RHEL content
* Deployed on AWS and using AWS RDS as the db backend for it
* Having some issues with running on k8s
* Enjoying the ability to associate a repo version with a distribution
* improvement from pulp2
* next step: try to use the pulp_installer roles to build a container
* need identified: desire a dockerfile that we would share
### HCaaS open questions
* privacy - some content is deeply sensitive, questions about which systems are allowed to touch it, which systems are allowed to store it
* third party (potentially licensed) content, e.g. content from VMWare, Nvidia
* reliability - cybersecurity is critical, updates are critical, the infra must be available, SLAs are important
* quality checking (as described earlier) - where would custom, organization-defined quality checks fit into a hosted service model
# AI
* bmbouter to share dockerfile
* Investigate container privileges - running without root
* tiho to setup followup time to explore use cases and operational needs for a SaaS model