## Other Topics

As PulpKhan progresses, we identify Things that need more time/discussion than we can give them in 50 minutes. List them here, so we have something to pick from to fill our open slot.

* Lockless design discussion
  * starting from https://hackmd.io/gQaEwur4QiauQrUu7NiVVw
* Elimination of Resource Manager
  * Experience from Scale Testing: sending everything through the resource-manager is **not** scalable
  * It's a deployment annoyance
  * It's a single point of failure
  * Why do we need it?
    * It's a/the lock-router
      * Example: on sync, you need to lock Remote-A and Repository-A
    * Routing-to-workers needs to be efficient
    * RM currently handles that algorithm
  * Proposal (see the sketch following this topic):
    * tasks are created in the DB
    * routing/locking moves into the workers
      * i.e., each worker picks up its own work
  * Will this let us remove RQ/Redis from Pulp3 entirely?
    * would greatly improve our HA narrative
  * Algorithm discussion
    * if a worker 'has' a lock/locks and is ready for more work, it should look for tasks that 'need' those locks
    * deadly-embrace is Fun
  * **Action**: [bmbouter] raise discussion on pulp-dev@
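To make the proposal above concrete: a minimal sketch, assuming a hypothetical `task` table (this is **not** Pulp's actual schema or code), of a worker claiming its own work directly from PostgreSQL. Tasks are scanned in creation order, and a worker only takes a task whose resources aren't pinned by an earlier unfinished task; because everyone defers to strictly-earlier tasks, the deadly-embrace scenario can't arise.

```python
# A minimal sketch of the "workers claim their own work" idea.
# NOT Pulp's real schema or code; the table below is hypothetical:
#
#   CREATE TABLE task (
#       id         BIGSERIAL PRIMARY KEY,             -- doubles as creation order
#       state      TEXT   NOT NULL DEFAULT 'waiting',  -- waiting | running | done
#       worker     TEXT,                               -- who claimed it
#       resources  TEXT[] NOT NULL                     -- e.g. {'remote:A','repo:A'}
#   );
#
# Assume 'done' tasks are deleted, so only waiting/running rows matter.
import psycopg2


def claim_next_task(conn, worker_id):
    """Atomically claim the oldest runnable task; return its id, or None."""
    with conn, conn.cursor() as cur:  # one transaction per claim attempt
        cur.execute(
            "SELECT id, state, resources FROM task"
            " WHERE state IN ('waiting', 'running') ORDER BY id"
        )
        blocked = set()  # resources pinned by earlier unfinished tasks
        for task_id, state, resources in cur.fetchall():
            if state == "waiting" and blocked.isdisjoint(resources):
                # The conditional UPDATE settles races between workers:
                # it only matches while the row is still 'waiting'.
                cur.execute(
                    "UPDATE task SET state = 'running', worker = %s"
                    " WHERE id = %s AND state = 'waiting'",
                    (worker_id, task_id),
                )
                if cur.rowcount == 1:
                    return task_id
            # Either way, this earlier task now pins its resources.
            blocked.update(resources)
    return None
```

A worker would call `claim_next_task(conn, "worker-1")` in a loop. A real design would also need worker heartbeats, recovery of tasks whose worker died, and a wakeup mechanism (e.g., Postgres LISTEN/NOTIFY) rather than polling; notably, none of that requires RQ/Redis, which bears on the "remove RQ/Redis entirely" question above.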
* Elimination of Orphan Cleanup
  * 'most serious' current problem
    * singleton: waits until all other tasks have halted before running, locks All The Things, and then runs Forever
    * we can (and have) improved performance, but it still takes Way Too Long
      * reports of 15-hour runs (!!!)
        * That was before the improvements, FWIW; however, it is likely still not great.
  * How can we mitigate?
    * periodic, parallel task, OR
    * ref-counting (remove content when the last-pinning repo unpins it)
  * impact/downside:
    * today, plugins are guaranteed that content is never removed out from under them
    * code would have to elegantly handle Unexpected 404s
      * maybe "let it fail and just rerun" is an OK Thing to document
    * what about sync/stages code?
      * will need to be written to handle the above 404s (i.e., the user shouldn't have to re-run sync repeatedly)
  * teach orphan-cleanup to not delete 'recently orphaned' content? Focus on Older Orphans? (can we do that? does it imply ref-counting?)
    * we only know "recently created", not "recently orphaned"
  * Demotion of content to "on_demand": if an orphaned content unit has a RemoteArtifact, we could delete the Artifact without deleting the content unit
    * Could be an interesting idea for the periodic, amortized orphan-cleanup strategy
  * **Don't forget about Upload!!**
  * **Action**: [bmbouter] start this discussion on pulp-dev@
    * get Katello's feedback (because they have 'interesting' workflows)
* merge pulplift into pulp_installer
  * come up with an actionable plan
  * We seem to be in agreement on "move pulplift features into pulp_installer"
  * timeline: #soon
  * needs a Redmine ticket
  * docs will probably be the largest Thing
  * downside: CI - the Travis queue
    * possible solutions:
      * separate org for installers?
      * do not run pulplift CI for PRs?
* Docker-based vagrant box
  * vagrant can use libvirt or VirtualBox as the backend
    * another option is Docker images
  * motivation: more opportunities for people to contribute
    * provisions faster
    * less/lower resource usage on the hosting dev machine
  * Q: is this really Docker-only?
    * https://www.vagrantup.com/docs/providers/docker
    * https://www.vagrantup.com/docs/provisioning/podman
  * Need to approach the forklift maintainers about providing Docker images
    * Would need to be 'in addition to' the current VM-based envs
    * wouldn't need to support every env this way (just Fedora/Debian)
  * hardware resources are Very Expensive in many parts of the world
  * priority-ordering: maybe after pulplift-into-pulp_installer
    * time/resources are always the problem
  * Does this change the developer's workflow?
    * prob no change?
  * I/O performance may be... Suboptimal
    * systemd-in-container-on-systemd-host can **also** be Suboptimal
  * wibbit's env uses LXC containers and LVM thin-volumes
    * "exceedingly convenient, speeds up our development substantially"
* Ubuntu support in the installer?
  * Appears to have been silently dropped in the past.
  * Ubuntu's web-hosting market share is 37.8%, and some orgs are Ubuntu shops.
  * We support Debian 10, so Ubuntu 20.04 (released 9 months later) should be quick to implement.
  * pulp_installer CI performance impact is expected to be small (GHA was determined to have free RAM)
  * As it stands, RPM support _can't_ work on Debian/Ubuntu, since all the necessary stuff is missing there. Someone is working on packaging the DNF stack and `createrepo_c` for Debian, but no appreciable progress has occurred so far.
    * Well, that's not entirely true; the binary Python package for `createrepo_c` could work in theory. It's just that the last time I tried it there was a weird bug that I've spent zero time investigating, because it's not a priority.
    * But maybe libmodulemd is still a problem?
* Caching (related to the downtime discussion and RHUI needs)
  * We don't have much testing of Pulp under high content-consumption load, because it is difficult to simulate. However, this is likely the single most important aspect of Pulp performance.
  * Imagine 15,000 clients simultaneously making requests for 50 packages each, or 250 machines performing a kickstart install, each requesting >1000 packages.
  * Each request for a file involves *at minimum* one database query. That is a conservative guess; I think the actual number is more like 3-6 (guard checks, basedistribution + publishedartifact lookup or contentartifact lookup). This expands to potentially *millions* of small database queries.
  * The same thing happens with metadata downloads: clients refreshing their metadata load Pulp as well, even with no packages downloaded.
  * I have low confidence that this will scale to many thousands of clients.
  * RH
  * **ACTION**: dalley/ggainey to try to find tech from The Before Times for simulating large numbers of clients (a starting-point sketch appears at the end of this page)
* Releasing y-releases with or without a delay to allow for manual testing

###### tags: `PulpCon 2020`
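Re: the **ACTION** in the Caching section above, a minimal sketch of a client-load simulator using `asyncio` and `aiohttp`. Every name in it (base URL, package paths, client count) is a placeholder rather than a real Pulp deployment; the point is only that simulating thousands of concurrent clients is cheap to prototype.

```python
# A minimal load-simulation sketch: n_clients concurrent "clients" each
# fetch a list of content URLs; we report error count and mean latency.
# The base URL and paths are placeholders, not a real Pulp deployment.
import asyncio
import time

import aiohttp

BASE_URL = "http://pulp.example.com/pulp/content/dist"            # placeholder
PACKAGE_PATHS = [f"/Packages/p/pkg-{n}.rpm" for n in range(50)]   # placeholder


async def one_client(session, results):
    """One simulated client: fetch every package path once."""
    for path in PACKAGE_PATHS:
        start = time.monotonic()
        async with session.get(BASE_URL + path) as resp:
            await resp.read()  # drain the body, like a real client would
            results.append((resp.status, time.monotonic() - start))


async def main(n_clients=1000):
    results = []
    # Cap open connections so the simulator itself isn't the bottleneck.
    connector = aiohttp.TCPConnector(limit=500)
    async with aiohttp.ClientSession(connector=connector) as session:
        await asyncio.gather(*(one_client(session, results) for _ in range(n_clients)))
    errors = sum(1 for status, _ in results if status != 200)
    mean = sum(dt for _, dt in results) / len(results)
    print(f"{len(results)} requests, {errors} non-200s, mean latency {mean:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With `n_clients=15000` and 50 paths, this roughly matches the thought-experiment numbers from the discussion above; run it only against a throwaway instance.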