## Other Topics
As PulpKhan progresses we identify Things that need more time/discussion than we get in 50 minutes. List them here so we have something to pick from when filling our open slot.
* Lockless design discussion
* starting from https://hackmd.io/gQaEwur4QiauQrUu7NiVVw
* Elimination of Resource Manager
* Experience from Scale Testing: Sending everything thru the resource-manager is **not** scalable
* It's a deployment annoyance
* It's a single-point-of-failure
* Why do we need it?
* It's a/the lock-router
* Example: on sync, you need to lock Remote-A and Repository-A
* Routing-to-workers needs to be efficient
* RM currently handles that algorithm
* Proposal:
* tasks are created in the DB
* routing/locking moved into workers
* i.e., each worker picks up its own work
* Will this let us remove RQ/Redis from Pulp3 entirely?
* would greatly improve our HA narrative
* Algorithm discussion
* if a worker 'has' a lock (or locks) and is ready for more work, it should look for tasks that 'need' those locks
* deadly-embrace is Fun
* **Action**: [bmbouter] raise discussion on pulp-dev@
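The proposal above (tasks created in the DB, workers doing their own routing/locking) can be sketched roughly as follows. This is a minimal in-memory illustration, not the design that was agreed on: `Task`, `TaskTable`, `claim_next` and `finish` are hypothetical names, the lists and dicts stand in for database tables, and a real implementation would have to run the claim step atomically (for example inside a database transaction). Taking all of a task's locks in one step, or none of them, is one assumed way to sidestep the deadly-embrace problem.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    resources: Set[str]            # e.g. {"Remote-A", "Repository-A"}
    state: str = "waiting"         # waiting -> running -> completed
    worker: Optional[str] = None


@dataclass
class TaskTable:
    """Hypothetical in-memory stand-in for a tasks table plus a locks table."""
    tasks: List[Task] = field(default_factory=list)       # oldest first
    locks: Dict[str, str] = field(default_factory=dict)   # resource -> worker

    def claim_next(self, worker: str) -> Optional[Task]:
        """Claim the oldest waiting task whose resources are all free.

        A task is skipped if any of its resources is locked, or is needed
        by an older task that is still waiting (this preserves ordering
        per resource). Locks are taken all at once or not at all.
        """
        blocked: Set[str] = set()
        for task in self.tasks:
            if task.state != "waiting":
                continue
            free = task.resources.isdisjoint(self.locks)
            if free and task.resources.isdisjoint(blocked):
                for resource in task.resources:
                    self.locks[resource] = worker
                task.state, task.worker = "running", worker
                return task
            # Later tasks must not overtake this one on shared resources.
            blocked |= task.resources
        return None

    def finish(self, task: Task) -> None:
        """Release the task's locks and mark it completed."""
        for resource in task.resources:
            if self.locks.get(resource) == task.worker:
                del self.locks[resource]
        task.state = "completed"


table = TaskTable(tasks=[
    Task("sync-A", {"Remote-A", "Repository-A"}),
    Task("publish-A", {"Repository-A"}),
    Task("sync-B", {"Remote-B", "Repository-B"}),
])
first = table.claim_next("worker-1")    # gets sync-A
second = table.claim_next("worker-2")   # publish-A is blocked, so gets sync-B
third = table.claim_next("worker-3")    # nothing runnable yet
table.finish(first)
fourth = table.claim_next("worker-3")   # publish-A is now free
print(first.name, second.name, third, fourth.name)
```

Note that this sketch releases locks when a task finishes; it does not model the "worker keeps its locks and looks for matching tasks" variant raised in the algorithm discussion.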
* Elimination of Orphan Cleanup
* 'most serious' current problem
* singleton, waits till all other tasks have halted before running, locks All The Things and then runs Forever
* we can (and have) improved performance, but it still takes Way Too Long
* reports of 15-hour-runs (!!!)
* That was before the improvement FWIW. However it is likely still not great.
* How can we mitigate?
* periodic, parallel task, OR
* ref-counting (remove when last-pinning-repo unpins)
* impact/downside:
* today, plugins are guaranteed that content is never removed from underneath you
* code would have to elegantly-handle Unexpected 404s
* maybe "let it fail and just rerun" is an OK Thing to document
* what about sync/stages-code?
* will need to be written to handle the above 404s (ie, user shouldn't have to re-run sync repeatedly)
* teach orphan-cleanup to not delete 'recently orphaned' content? Focus on Older Orphans? (can we do that? does it imply ref-counting?)
* we only know "recently created", not "recently orphaned"
* Demotion of content to "on_demand": if an orphaned content unit has a RemoteArtifact, we could delete the Artifact without deleting the content unit
* Could be an interesting idea for the periodic amortized orphan cleanup strategy
* **Don't forget about Upload!!**
* **Action**: start this discussion on pulp-dev@ [bmbouter]
* get katello's feedback (because they have 'interesting' workflows)
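To make the ref-counting and "recently orphaned" ideas above more concrete, here is a small sketch. It assumes we record the moment a content unit loses its last pinning repository (something the notes say is not tracked today), and that brand-new content, such as an upload, starts its grace period at creation. `ContentIndex` and all of its methods are hypothetical names for illustration, not existing Pulp APIs.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


class ContentIndex:
    """Hypothetical tracker of which repositories pin each content unit."""

    def __init__(self) -> None:
        self.pins: Dict[str, Set[str]] = {}          # content -> pinning repos
        self.orphaned_at: Dict[str, datetime] = {}   # content -> time it became orphaned

    def register(self, content: str, now: datetime) -> None:
        """New content (e.g. an upload) is unpinned, so its grace period starts now."""
        self.pins.setdefault(content, set())
        self.orphaned_at.setdefault(content, now)

    def pin(self, content: str, repo: str) -> None:
        """A repository starts using the content; it is no longer an orphan."""
        self.pins.setdefault(content, set()).add(repo)
        self.orphaned_at.pop(content, None)

    def unpin(self, content: str, repo: str, now: datetime) -> None:
        """A repository stops using the content; note when the last pin goes away."""
        repos = self.pins.get(content)
        if repos is None:
            return
        repos.discard(repo)
        if not repos:
            self.orphaned_at.setdefault(content, now)

    def cleanup(self, now: datetime, grace: timedelta) -> Set[str]:
        """Remove only content that has been orphaned for at least `grace`."""
        expired = {c for c, t in self.orphaned_at.items() if now - t >= grace}
        for content in expired:
            del self.orphaned_at[content]
            del self.pins[content]
        return expired


t0 = datetime(2020, 1, 1)
index = ContentIndex()
index.register("pkg-1", t0)
index.pin("pkg-1", "repo-a")
index.unpin("pkg-1", "repo-a", t0 + timedelta(hours=1))

too_early = index.cleanup(t0 + timedelta(hours=2), grace=timedelta(days=1))
later = index.cleanup(t0 + timedelta(days=2), grace=timedelta(days=1))
print(too_early, later)
```

Because `cleanup` only touches old orphans, it could in principle run periodically alongside other tasks, though code that races with it would still need to handle the unexpected 404s discussed above.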
* merge pulplift into pulp_installer
* come up with an actionable plan
* We seem to be in agreement for "move pulplift features into pulp_installer"
* timeline: #soon
* needs a redmine ticket
* docs will prob be the largest Thing
* downside: CI - Travis queue
* possible solutions:
* separate org for installers?
* do not run pulplift CI for PRs?
* Docker based vagrant box
* vagrant can use libvirt or VirtualBox as the backend
* another option is Docker images
* motivation: more opportunities for ppl to contribute
* provisions faster
* less/lower resource-usage on hosting dev-machine
* Q: is this really Docker-only?
* https://www.vagrantup.com/docs/providers/docker
* https://www.vagrantup.com/docs/provisioning/podman
* Need to approach forklift-maintainers RE providing docker images
* Would need to be 'in addition to' current VM-based envs
* wouldn't need to support every-env this way (just fedora/debian)
* hardware resources are Very Expensive in many parts of the world
* priority-ordering : maybe after pulplift-into-pulp_installer
* time/resources are always the problem
* Does this change the developer's workflow?
* prob no change?
* I/O performance may be...Suboptimal
* systemd-in-container-on-systemd-host can **also** be Suboptimal
* wibbit's env uses LXC containers, LVM thin-volumes
* "exceedingly convenient, speeds up our development substantially"
* Ubuntu support in the installer?
* Appears to have been silently dropped in the past.
* Ubuntu's web hosting market share is 37.8%. Some orgs are Ubuntu shops.
* We support Debian 10, so Ubuntu 20.04 (released 9 months later) should be quick to implement.
* pulp_installer CI performance impact expected to be small (GHA determined to have free RAM)
* As it stands, RPM support _can't_ work on Debian/Ubuntu, since all the necessary stuff is missing there. There is someone working on packaging the DNF stack and `createrepo_c` for Debian, but no appreciable progress has occurred so far.
* Well, that's not entirely true: the binary Python package for createrepo_c could work in theory. The last time I tried it there was a weird bug, which I've spent zero time investigating because it's not a priority.
* But maybe libmodulemd is still a problem?
* Caching (related to downtime discussion, RHUI needs)
* We don't have much testing of Pulp under high content consumption load, because this is difficult to simulate. However, this is likely the single most important aspect of Pulp performance.
* Imagine 15,000 clients simultaneously making requests for 50 packages each - or 250 machines performing a kickstart install each requesting >1000 packages.
* Each request for a file involves *at minimum* one database query. That is a conservative guess; I think the actual number is more like 3-6 (guard checks, basedistribution + publishedartifact lookup or contentartifact lookup). This expands to potentially *millions* of small database queries.
* Same thing happens with metadata downloads - clients refreshing their metadata will load Pulp as well even w/ no packages downloaded.
* I have low confidence that this will scale to many thousands of clients.
* RH
* **ACTION**: dalley/ggainey to try to find tech from The Before Times for simulating large numbers of clients
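As a back-of-the-envelope check on the "millions of small database queries" estimate, the two scenarios above can be multiplied out. The client and package counts come from the notes (with ">1000 packages" treated as exactly 1,000, so that scenario is a lower bound); the 1, 3 and 6 queries-per-request figures are the guesses stated above, not measurements. `total_queries` is just a helper name for this sketch.

```python
def total_queries(clients: int, packages_per_client: int, queries_per_request: int) -> int:
    """Database queries needed if every package request costs a fixed number of queries."""
    return clients * packages_per_client * queries_per_request


# (clients, packages per client) for the two scenarios in the notes.
scenarios = {
    "15,000 clients x 50 packages": (15_000, 50),
    "250 kickstarts x 1,000 packages": (250, 1_000),
}

for label, (clients, packages) in scenarios.items():
    estimates = [total_queries(clients, packages, q) for q in (1, 3, 6)]
    print(label, ["{:,}".format(n) for n in estimates])
```

Under these assumptions the first scenario lands between 750,000 and 4.5 million queries, and the second between 250,000 and 1.5 million, before counting any metadata refreshes.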
* Releasing y-releases with or without a delay to allow for manual testing
###### tags: `PulpCon 2020`