Operating Pulp in Production

## Operating Pulp in Production

Brian Bouterse

---

Q: Why am I the worst person to give this talk?

---

Q: Why am I the worst person to give this talk?

A: I don't run Pulp in production!

---

## Agenda

* Lessons learned
* Community of Practice for Pulp in production
* Zero-Downtime working group
* Performance and Scale Testing
* Containers for Production
* Topics totally unaddressed

---

## Lessons Learned

Information gathered from users I've talked to.

---

## Deploying Pulp

Buildable assets are kept in a repository, and committed changes trigger rebuilds in a Jenkins pipeline. The pipeline pushes the images to Quay, and other tooling then deploys the manifests to OpenShift.

This is done for three images:

* the application itself
* an nginx reverse proxy
* a process-isolated image for linting checks, because it runs untrusted code

---

## Right-Sizing Pulp

---

### Install #1

* 250,000 requests/day
* OpenShift running on AWS
* Uses an RDS database
* No Redis caching

---

### Install #1

~250,000 requests/day

nginx:

- 3 pods
- CPU request/limit: 100m/200m
- Memory request/limit: 128Mi/128Mi

---

### Install #1

~250,000 requests/day

api:

- 3 pods
- CPU request/limit: 1/1
- Memory request/limit: 2048Mi/2048Mi

---

### Install #1

~250,000 requests/day

content-app:

- 3 pods
- CPU request/limit: 200m/1
- Memory request/limit: 1536Mi/1536Mi

---

### Install #1

~250,000 requests/day

worker:

- 6 pods
- CPU request/limit: 200m/500m
- Memory request/limit: 256Mi/512Mi

---

### Install #1

~250,000 requests/day

database:

- 8 vCPUs
- 32 GB memory
- 100 GB disk

---

### Install #2

* 6,500,000 requests/day
* In an OpenShift environment (not on AWS, I think)
* Uses an RDS database
* No Redis caching
* Older install (still has a resource-manager)

---

### Install #2

~6,500,000 requests/day

nginx:

- 2 pods
- CPU request/limit: 100m/200m
- Memory request/limit: 64Mi/128Mi

---

### Install #2

~6,500,000 requests/day

api:

- 32 pods
- CPU request/limit: 250m/500m
- Memory request/limit: 1Gi/1536Mi

---

### Install #2

~6,500,000 requests/day

resource-manager:

- 1 pod
- CPU request/limit: 250m/500m
- Memory request/limit: 256Mi/512Mi

---

### Install #2

~6,500,000 requests/day

worker:

- 2 pods
- CPU request/limit: 250m/1
- Memory request/limit: 256Mi/512Mi

---

### Install #2

~6,500,000 requests/day

database:

- 16 vCPUs
- 64 GB memory
- 100 GB disk

---

## Story

* The user had set `retain_repo_versions` to 1.
* Bug! A new repository version with no content was accidentally created.
* Effect: all content was removed from important repositories!

---

## Lesson Learned

Keeping more version history is a good safeguard. Recovery was possible only because other copies of the repositories existed.

---

## Community of Practice for Pulp in production

---

A community of practice (CoP) is a group of people who "share a concern or a passion for something they do and learn how to do it better as they interact regularly".

- [Etienne and Beverly Wenger-Trayner](https://www.wenger-trayner.com/introduction-to-communities-of-practice/)

---

Designed for asynchronous, low-effort participation.

https://discourse.pulpproject.org/t/community-of-practice-running-pulp-in-production/683

---

## Containers for Production

Claim: Pulp is hard to deploy.

Goal: Focus on container-based installs only.

---

## Containers for Production

* Feedback on the Operator in production
* Feedback on container usage outside k8s
* Upgrading from the Ansible installer to containers

---

## Topics Totally Unaddressed

* Monitoring
* Configuring external logging
* Documenting how to:
  * configure with AWS RDS databases
  * deploy Pulp on EKS
  * configure with Amazon MemoryDB for Redis
* Scaling up/down
* Backup / restore
* Multi-geography installations
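---

As a back-of-the-envelope check on the right-sizing slides for Install #1 and Install #2, the sketch below multiplies each component's pod count by its per-pod CPU and memory requests/limits, sums them per install, and turns requests/day into an average requests/second. It is a minimal sketch under stated assumptions: `parse_cpu`, `parse_mem`, `totals`, and `avg_rps` are hypothetical helpers (not part of Pulp or any Kubernetes client), they only understand the quantity suffixes used on these slides (`m`, `Mi`, `Gi`), and the managed databases are left out because they do not run as pods.

```python
"""Back-of-the-envelope totals for the two Pulp installs from the slides.

Assumption: these are hypothetical helpers that only handle the Kubernetes
quantity suffixes that appear on the slides (m, Mi, Gi).
"""

SECONDS_PER_DAY = 24 * 60 * 60


def parse_cpu(q: str) -> int:
    """Convert a CPU quantity such as '250m' or '1' to millicores."""
    return int(q[:-1]) if q.endswith("m") else int(q) * 1000


def parse_mem(q: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1Gi' to MiB."""
    if q.endswith("Gi"):
        return int(q[:-2]) * 1024
    if q.endswith("Mi"):
        return int(q[:-2])
    raise ValueError(f"unsupported memory quantity: {q}")


# component: (pods, cpu_request, cpu_limit, mem_request, mem_limit), copied from the slides
INSTALL_1 = {
    "nginx": (3, "100m", "200m", "128Mi", "128Mi"),
    "api": (3, "1", "1", "2048Mi", "2048Mi"),
    "content-app": (3, "200m", "1", "1536Mi", "1536Mi"),
    "worker": (6, "200m", "500m", "256Mi", "512Mi"),
}
INSTALL_2 = {
    "nginx": (2, "100m", "200m", "64Mi", "128Mi"),
    "api": (32, "250m", "500m", "1Gi", "1536Mi"),
    "resource-manager": (1, "250m", "500m", "256Mi", "512Mi"),
    "worker": (2, "250m", "1", "256Mi", "512Mi"),
}


def totals(install: dict) -> dict:
    """Sum pods x per-pod requests/limits over all components (databases excluded)."""
    out = {"cpu_request_m": 0, "cpu_limit_m": 0, "mem_request_mi": 0, "mem_limit_mi": 0}
    for pods, cpu_req, cpu_lim, mem_req, mem_lim in install.values():
        out["cpu_request_m"] += pods * parse_cpu(cpu_req)
        out["cpu_limit_m"] += pods * parse_cpu(cpu_lim)
        out["mem_request_mi"] += pods * parse_mem(mem_req)
        out["mem_limit_mi"] += pods * parse_mem(mem_lim)
    return out


def avg_rps(requests_per_day: int) -> float:
    """Average requests/second over a full day; real peaks will be higher."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, install, rpd in [
        ("Install #1", INSTALL_1, 250_000),
        ("Install #2", INSTALL_2, 6_500_000),
    ]:
        print(name, totals(install), f"~{avg_rps(rpd):.1f} req/s")
```

For Install #1 this works out to 5,100m of CPU requests and 12,672Mi of memory requests (about 2.9 requests/second on average); for Install #2 it is 8,950m and 33,664Mi (about 75.2 requests/second). CPU is tracked in integer millicores so the sums stay exact rather than accumulating floating-point error.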