
## Multi-Geography Pulp Architectures

* Brian Bouterse - Pulp Product Owner for Services
* Ina Panova - Principal Software Engineer
* slides: https://hackmd.io/@pulp/SyZewjLzp

---

## What Users Want

Deploy content in multiple geographies

content = RPMs, OSTree, Maven, Gems, ISOs, etc.

---

## Why Do They Want It?

* Want content close to where it's needed
* Faster -- lower-latency access to data
* Cheaper -- lower cloud egress and WAN link costs
* High availability

---

## Architectures That Don't Work Well

* Have pulpcore-content span the WAN connections
* Have N separate Pulps

---

## pulpcore-content spanning WAN connections

```graphviz
digraph G {
    // Default node properties
    node [shape=box, style=filled, fillcolor=white];

    // Three network segments, each running its own pulpcore-content
    subgraph cluster1 {
        label="Network 1";
        labelloc=t;
        Pulp1 [label="pulpcore-content", shape=box, width=1, height=1];
        Client1_1 [label="Client 1", shape=ellipse];
        Client1_2 [label="Client 2", shape=ellipse];
        Pulp1 -> Client1_1;
        Pulp1 -> Client1_2;
    }

    subgraph cluster2 {
        label="Network 2";
        labelloc=t;
        Pulp2 [label="pulpcore-content", shape=box, width=1, height=1];
        Client2_1 [label="Client 1", shape=ellipse];
        Client2_2 [label="Client 2", shape=ellipse];
        Pulp2 -> Client2_1;
        Pulp2 -> Client2_2;
    }

    subgraph cluster3 {
        label="Network 3";
        labelloc=t;
        Pulp3 [label="pulpcore-content", shape=box, width=1, height=1];
        Client3_1 [label="Client 1", shape=ellipse];
        Client3_2 [label="Client 2", shape=ellipse];
        Pulp3 -> Client3_1;
        Pulp3 -> Client3_2;
    }

    // One shared Database and FileSystem
    Database [label="Database", shape=cylinder, style=filled, fillcolor=lightgray];
    FileSystem [label="FileSystem", shape=box3d, style=filled, fillcolor=lightgray];

    // Every pulpcore-content reaches the shared Database and FileSystem across the WAN
    Database -> Pulp1;
    FileSystem -> Pulp1;
    Database -> Pulp2;
    FileSystem -> Pulp2;
    Database -> Pulp3;
    FileSystem -> Pulp3;
}
```

### Problems

* DB connections can't span the WAN
* Probably won't even boot

---

## Have N Separate Pulps

```graphviz
digraph G {
    // Default node properties
    node [shape=box, style=filled, fillcolor=white];

    // Three geos, each with its own independent Pulp
    subgraph cluster1 {
        label="Geo 1";
        labelloc=t;
        Pulp1 [label="Pulp 1", shape=box, width=1, height=1];
        Client1_1 [label="Client 1", shape=ellipse];
        Client1_2 [label="Client 2", shape=ellipse];
        Pulp1 -> Client1_1;
        Pulp1 -> Client1_2;
    }

    subgraph cluster2 {
        label="Geo 2";
        labelloc=t;
        Pulp2 [label="Pulp 2", shape=box, width=1, height=1];
        Client2_1 [label="Client 1", shape=ellipse];
        Client2_2 [label="Client 2", shape=ellipse];
        Pulp2 -> Client2_1;
        Pulp2 -> Client2_2;
    }

    subgraph cluster3 {
        label="Geo 3";
        labelloc=t;
        Pulp3 [label="Pulp 3", shape=box, width=1, height=1];
        Client3_1 [label="Client 1", shape=ellipse];
        Client3_2 [label="Client 2", shape=ellipse];
        Pulp3 -> Client3_1;
        Pulp3 -> Client3_2;
    }

    // Upstream RPM repositories
    "RPM repo 1" [label="RPM repo 1", shape=cylinder, style=filled, fillcolor=lightgray];
    "RPM repo 2" [label="RPM repo 2", shape=cylinder, style=filled, fillcolor=lightgray];

    // Every Pulp syncs every RPM repo directly
    "RPM repo 1" -> Pulp1;
    "RPM repo 2" -> Pulp1;
    "RPM repo 1" -> Pulp2;
    "RPM repo 2" -> Pulp2;
    "RPM repo 1" -> Pulp3;
    "RPM repo 2" -> Pulp3;
}
```

### Problems

* Content can differ between the Pulps
* Egress costs and rate limiting from the original CDN
* Must fully manage N Pulps

---

## What to do instead

* Organize content on one Pulp
* Sync it to a Pulp running in each geo

---

## Architecture

```graphviz
digraph G {
    // Default node properties
    node [shape=box, style=filled, fillcolor=white];

    // Three geos, each with a Pulp serving its local clients
    subgraph cluster1 {
        label="Geo 1";
        labelloc=t;
        Pulp1 [label="Pulp 1", shape=box, width=1, height=1];
        Client1_1 [label="Client 1", shape=ellipse];
        Client1_2 [label="Client 2", shape=ellipse];
        Pulp1 -> Client1_1;
        Pulp1 -> Client1_2;
    }

    subgraph cluster2 {
        label="Geo 2";
        labelloc=t;
        Pulp2 [label="Pulp 2", shape=box, width=1, height=1];
        Client2_1 [label="Client 1", shape=ellipse];
        Client2_2 [label="Client 2", shape=ellipse];
        Pulp2 -> Client2_1;
        Pulp2 -> Client2_2;
    }

    subgraph cluster3 {
        label="Geo 3";
        labelloc=t;
        Pulp3 [label="Pulp 3", shape=box, width=1, height=1];
        Client3_1 [label="Client 1", shape=ellipse];
        Client3_2 [label="Client 2", shape=ellipse];
        Pulp3 -> Client3_1;
        Pulp3 -> Client3_2;
    }

    // A central Pulp sits between the RPM repos and the per-geo Pulps
    Pulp [label="Pulp", shape=box, width=1, height=1];
    "RPM repo 1" [label="RPM repo 1", shape=cylinder, style=filled, fillcolor=lightgray];
    "RPM repo 2" [label="RPM repo 2", shape=cylinder, style=filled, fillcolor=lightgray];

    // The central Pulp syncs from the RPM repos
    "RPM repo 1" -> Pulp;
    "RPM repo 2" -> Pulp;

    // Each geo's Pulp syncs from the central Pulp
    Pulp -> Pulp1;
    Pulp -> Pulp2;
    Pulp -> Pulp3;
}
```

* Syncing can be on-demand or immediate per repo

---

## Is this easy?

Yes!

---

## Replicate Repos and Distributions (URLs)

* pulpcore - 3.23
* pulp-rpm - 3.20
* pulp-file - 1.14

---

## How to Use

* It's a pull model
* "upstream Pulp" - the Pulp being synced from
* "downstream Pulp" - the Pulp performing the sync
* Configure the downstream Pulp to sync content from an "upstream Pulp"

---

## How to Use

* Create an [upstream Pulp](https://docs.pulpproject.org/pulpcore/restapi.html#tag/Upstream-Pulps/operation/upstream_pulps_create)
    * `POST /pulp/api/v3/upstream-pulps/`
    * upstream URL
    * cert or basic auth
* Trigger the replication task

---

## What it does

* Creates Repos, Remotes, and Distributions
* All names match the upstream Distribution name
* Upstream and downstream Distribution URLs are the same
* Replicates everything by default

---

## What it doesn't do

* Replicate content protection on the downstream distros
* Only supports immediate sync, but we can add `on_demand`

---

## Use Labels for More Control

* Upstream Pulp takes a `pulp_label_select` setting
* If set, only Distributions on the upstream that have that label are replicated
* Allows a subset of repos/distros to be replicated
* Allows different Pulps to receive different repos/distros

---

## Demo

See the demo [here](https://youtu.be/_VMnA_3ZbLU).

---

## Can I use rsync instead?

Idea: rsync into the geo, then serve the files with a webserver

Yes, but you need to know some things

---

## Rsync tricky thing 1

* Not atomic -- clients can end up broken during a sync
* The metadata syncs first, but the packages it references aren't available yet

---

## Rsync tricky thing 2

* No opportunity for on-demand repos

---

## Rsync tricky thing 3

* Not all content types can be served by just a webserver
* Docker and Ansible, for example, both need queryable APIs

---

# Thanks!
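
---

## Backup: Replication via the REST API

The "How to Use" steps come down to two calls against the downstream Pulp: create an upstream Pulp, then trigger replication. Below is a minimal Python sketch of those calls using only the standard library. Assumptions, not taken from the slides: the hosts and credentials are hypothetical, and the body fields (`name`, `base_url`, `api_root`, `username`, `password`, `pulp_label_select`) and the `replicate/` action on the created upstream Pulp follow our reading of the pulpcore REST API reference, so check them against the version you run.

```python
import base64
import json
import urllib.request
from typing import Optional

# Hypothetical downstream Pulp (the one in the geo that pulls content) and its admin login.
DOWNSTREAM = "https://pulp-geo1.example.com"
DOWNSTREAM_AUTH = ("admin", "password")


def upstream_pulp_body(
    name: str,
    base_url: str,
    api_root: str = "/pulp/",
    username: Optional[str] = None,
    password: Optional[str] = None,
    label_select: Optional[str] = None,
) -> dict:
    """JSON body for POST /pulp/api/v3/upstream-pulps/ (field names are assumptions)."""
    body = {"name": name, "base_url": base_url, "api_root": api_root}
    if username is not None:
        # Basic auth the downstream uses when talking to the upstream.
        # Cert auth would send certificate fields instead (not shown here).
        body["username"] = username
        body["password"] = password
    if label_select is not None:
        # Only upstream Distributions carrying this label get replicated.
        body["pulp_label_select"] = label_select
    return body


def json_request(method: str, url: str, body: dict, auth: tuple) -> urllib.request.Request:
    """Build, but do not send, a JSON request with HTTP basic auth."""
    token = base64.b64encode(f"{auth[0]}:{auth[1]}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method=method,
    )


def replicate_url(upstream_pulp_href: str, host: str = DOWNSTREAM) -> str:
    """Assumed replicate action: POST <upstream Pulp href>replicate/ on the downstream."""
    return f"{host}{upstream_pulp_href}replicate/"


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def replicate_from(upstream_url: str, label_select: Optional[str] = None) -> dict:
    """Create the upstream Pulp on the downstream, then start replication (needs network)."""
    # "central", "replicator" and "secret" are hypothetical placeholders.
    body = upstream_pulp_body(
        "central", upstream_url, username="replicator", password="secret", label_select=label_select
    )
    created = send(json_request("POST", f"{DOWNSTREAM}/pulp/api/v3/upstream-pulps/", body, DOWNSTREAM_AUTH))
    # The replicate call is expected to return a task group to poll (assumed response shape).
    return send(json_request("POST", replicate_url(created["pulp_href"]), {}, DOWNSTREAM_AUTH))
```

For example, calling `replicate_from("https://pulp-central.example.com", label_select="geo=geo1")` on the geo-1 Pulp would pull only the distributions labeled `geo=geo1` (the `key=value` selector form is an assumption). Giving each geo's Pulp a different label value is how different Pulps end up with different repos/distros.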