---
title: Multi Geography Pulp Architectures
tags: Talk, Architecture, Pulp, multi geo, pulpcon 2023
description: View the slide with "Slide Mode".
---
## Multi Geography Pulp Architectures
* Brian Bouterse - Pulp Product Owner for Services
* Ina Panova - Principal Software Engineer
* slides: https://hackmd.io/@pulp/SyZewjLzp
---
## What Users Want
Deploy content in multiple geographies

Content = RPMs, OSTree, Maven, Gems, ISOs, etc.
---
## Why Do They Want It?
* Want content close to where it's needed
* Faster -- lower latency access to data
* Cheaper -- lower cloud egress and WAN link costs
* High Availability
---
## Architectures That Don't Work Well
* Have pulpcore-content span the WAN connections
* Have N Separate Pulps
---
## pulpcore-content spanning WAN connections
```graphviz
digraph G {
// Set default node properties
node [shape=box, style=filled, fillcolor=white];
// Define three network segments
subgraph cluster1 {
label="Network 1";
labelloc=t;
Pulp1 [label="pulpcore-content", shape=box, width=1, height=1];
Client1_1 [label="Client 1", shape=ellipse];
Client1_2 [label="Client 2", shape=ellipse];
Pulp1 -> Client1_1;
Pulp1 -> Client1_2;
}
subgraph cluster2 {
label="Network 2";
labelloc=t;
Pulp2 [label="pulpcore-content", shape=box, width=1, height=1];
Client2_1 [label="Client 1", shape=ellipse];
Client2_2 [label="Client 2", shape=ellipse];
Pulp2 -> Client2_1;
Pulp2 -> Client2_2;
}
subgraph cluster3 {
label="Network 3";
labelloc=t;
Pulp3 [label="pulpcore-content", shape=box, width=1, height=1];
Client3_1 [label="Client 1", shape=ellipse];
Client3_2 [label="Client 2", shape=ellipse];
Pulp3 -> Client3_1;
Pulp3 -> Client3_2;
}
// Database and FileSystem nodes with improved shapes
Database [label="Database", shape=cylinder, style=filled, fillcolor=lightgray];
FileSystem [label="FileSystem", shape=box3d, style=filled, fillcolor=lightgray];
// Every pulpcore-content instance shares one Database and one FileSystem
Database -> Pulp1;
FileSystem -> Pulp1;
Database -> Pulp2;
FileSystem -> Pulp2;
Database -> Pulp3;
FileSystem -> Pulp3;
}
```
### Problems
* DB connections can't span the WAN
* pulpcore-content probably won't even boot
---
## Have N Separate Pulps
```graphviz
digraph G {
// Set default node properties
node [shape=box, style=filled, fillcolor=white];
// Define three network segments
subgraph cluster1 {
label="Geo 1";
labelloc=t;
Pulp1 [label="Pulp 1", shape=box, width=1, height=1];
Client1_1 [label="Client 1", shape=ellipse];
Client1_2 [label="Client 2", shape=ellipse];
Pulp1 -> Client1_1;
Pulp1 -> Client1_2;
}
subgraph cluster2 {
label="Geo 2";
labelloc=t;
Pulp2 [label="Pulp 2", shape=box, width=1, height=1];
Client2_1 [label="Client 1", shape=ellipse];
Client2_2 [label="Client 2", shape=ellipse];
Pulp2 -> Client2_1;
Pulp2 -> Client2_2;
}
subgraph cluster3 {
label="Geo 3";
labelloc=t;
Pulp3 [label="Pulp 3", shape=box, width=1, height=1];
Client3_1 [label="Client 1", shape=ellipse];
Client3_2 [label="Client 2", shape=ellipse];
Pulp3 -> Client3_1;
Pulp3 -> Client3_2;
}
// Upstream RPM repo nodes
"RPM repo 1" [label="RPM repo 1", shape=cylinder, style=filled, fillcolor=lightgray];
"RPM repo 2" [label="RPM repo 2", shape=cylinder, style=filled, fillcolor=lightgray];
// Every Pulp syncs both RPM repos directly
"RPM repo 1" -> Pulp1;
"RPM repo 2" -> Pulp1;
"RPM repo 1" -> Pulp2;
"RPM repo 2" -> Pulp2;
"RPM repo 1" -> Pulp3;
"RPM repo 2" -> Pulp3;
}
```
### Problems
* Content can differ between Pulps
* Egress costs and rate limiting from the original CDN
* N Pulps to fully manage
---
## What to do instead
* Organize content on one Pulp
* Sync to a Pulp that is running in each Geo
---
## Architecture
```graphviz
digraph G {
// Set default node properties
node [shape=box, style=filled, fillcolor=white];
// Define three network segments
subgraph cluster1 {
label="Geo 1";
labelloc=t;
Pulp1 [label="Pulp 1", shape=box, width=1, height=1];
Client1_1 [label="Client 1", shape=ellipse];
Client1_2 [label="Client 2", shape=ellipse];
Pulp1 -> Client1_1;
Pulp1 -> Client1_2;
}
subgraph cluster2 {
label="Geo 2";
labelloc=t;
Pulp2 [label="Pulp 2", shape=box, width=1, height=1];
Client2_1 [label="Client 1", shape=ellipse];
Client2_2 [label="Client 2", shape=ellipse];
Pulp2 -> Client2_1;
Pulp2 -> Client2_2;
}
subgraph cluster3 {
label="Geo 3";
labelloc=t;
Pulp3 [label="Pulp 3", shape=box, width=1, height=1];
Client3_1 [label="Client 1", shape=ellipse];
Client3_2 [label="Client 2", shape=ellipse];
Pulp3 -> Client3_1;
Pulp3 -> Client3_2;
}
// Add a Pulp node in between RPM repos and Pulp 1, Pulp 2, and Pulp 3
Pulp [label="Pulp", shape=box, width=1, height=1];
"RPM repo 1" [label="RPM repo 1", shape=cylinder, style=filled, fillcolor=lightgray];
"RPM repo 2" [label="RPM repo 2", shape=cylinder, style=filled, fillcolor=lightgray];
// Connect the RPM repos to the Pulp node
"RPM repo 1" -> Pulp;
"RPM repo 2" -> Pulp;
// Connect the Pulp node to Pulp 1, Pulp 2, and Pulp 3
Pulp -> Pulp1;
Pulp -> Pulp2;
Pulp -> Pulp3;
}
```
* Syncing can be on-demand or immediate per repo
---
## Is this easy?
Yes!
---
## Replicate Repos and Distributions (urls)
* pulpcore - 3.23
* pulp-rpm - 3.20
* pulp-file - 1.14
---
## How to Use
* It's a pull model
* "upstream Pulp" - The Pulp being synced from
* "downstream Pulp" - The Pulp performing syncing
* Configure downstream Pulp to sync content from an "upstream Pulp"
---
## How to Use
* Create an [upstream Pulp](https://docs.pulpproject.org/pulpcore/restapi.html#tag/Upstream-Pulps/operation/upstream_pulps_create)
    * `POST /pulp/api/v3/upstream-pulps/`
    * upstream URL
    * cert or basic auth
* Trigger the replication task
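
The two steps above could be scripted roughly as follows. This is a minimal sketch, not an official client: the hostnames and credentials are placeholders, and the payload field names (`name`, `base_url`, `api_root`, `username`, `password`) and the `replicate/` sub-endpoint are assumptions to verify against the REST API docs for your pulpcore version. It only builds the requests; nothing is sent until they are passed to `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request

DOWNSTREAM = "https://pulp-geo1.example.com"  # placeholder: the downstream Pulp
API = "/pulp/api/v3/upstream-pulps/"


def _request(url, payload, user, password):
    """Build (but do not send) an authenticated JSON POST request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def create_upstream_request(admin_user, admin_password):
    """Step 1: register the upstream Pulp on the downstream Pulp."""
    payload = {
        # Field names are assumptions; verify them in the API docs.
        "name": "central-pulp",
        "base_url": "https://pulp-central.example.com",  # upstream URL
        "api_root": "/pulp/",
        "username": "replicator",  # basic auth used against the upstream
        "password": "change-me",
    }
    return _request(DOWNSTREAM + API, payload, admin_user, admin_password)


def replicate_request(upstream_pulp_href, admin_user, admin_password):
    """Step 2: trigger replication (assumed `replicate/` sub-endpoint)."""
    url = DOWNSTREAM + upstream_pulp_href + "replicate/"
    return _request(url, {}, admin_user, admin_password)
```

Here `upstream_pulp_href` would be the `pulp_href` returned by the create call. Because it is a pull model, both requests go to the downstream Pulp.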
---
## What it does
* Creates Repos, Remotes, and Distributions
* All names use the upstream Distribution name
* Upstream and downstream Distribution URLs are the same
* Replicates everything by default
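
To make the naming rule concrete, here is a small illustrative model of what a downstream Pulp might end up holding for each upstream Distribution. It is a sketch only, not pulpcore's replication code: the `UpstreamDistribution` class, the `plan_downstream_objects` helper, the example names, and the content URL layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamDistribution:
    name: str
    base_path: str  # URL path the distribution is served under


def plan_downstream_objects(upstream_content_url, distributions):
    """Return the objects a downstream Pulp would hold per distribution.

    Illustrative model only; this is not pulpcore's replication code.
    """
    plan = []
    for dist in distributions:
        plan.append({
            # Repo, Remote, and Distribution all reuse the upstream name.
            "repository": dist.name,
            "remote": dist.name,
            "distribution": dist.name,
            # The remote pulls from where the upstream serves the content
            # (assumed URL layout).
            "remote_url": f"{upstream_content_url.rstrip('/')}/{dist.base_path}/",
            # The downstream Distribution keeps the upstream base_path.
            "base_path": dist.base_path,
        })
    return plan


plan = plan_downstream_objects(
    "https://pulp-central.example.com/pulp/content/",
    [UpstreamDistribution(name="rhel9-baseos", base_path="rhel9/baseos")],
)
```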
---
## What it doesn't do
* Replicate content protection on the downstream distros
* On-demand sync: only immediate sync is supported for now, but we can add on_demand
---
## Use Labels for More Control
* The upstream Pulp takes a `pulp_label_select`
* If set, only Distributions on the upstream that have that label are replicated
* Allows a subset of repos/distros to be replicated
* Allows different Pulps to receive different repos/distros
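
The effect of a label selector can be sketched with a simplified filter. This is not Pulp's implementation: it handles only `key=value` and bare `key` terms, and the `geo` label and distribution names are made up for illustration.

```python
def matches(labels, selector):
    """Return True if `labels` satisfies every comma-separated term.

    Simplified: supports only `key=value` and bare `key` (label exists).
    """
    for term in filter(None, (t.strip() for t in selector.split(","))):
        key, sep, value = term.partition("=")
        if key not in labels:
            return False
        if sep and labels[key] != value:
            return False
    return True


def select_distributions(distributions, selector):
    """Keep only matching distributions; an empty selector keeps all."""
    return [
        d["name"] for d in distributions if matches(d["pulp_labels"], selector)
    ]


upstream = [
    {"name": "rhel9-baseos", "pulp_labels": {"geo": "eu"}},
    {"name": "internal-tools", "pulp_labels": {"geo": "us"}},
    {"name": "scratch", "pulp_labels": {}},
]

eu_only = select_distributions(upstream, "geo=eu")
everything = select_distributions(upstream, "")
```

Giving each downstream Pulp a different selector is what lets different geos receive different repos/distros.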
---
## Demo
See demo [here](https://youtu.be/_VMnA_3ZbLU).
---
## Can I use rsync instead?
Idea: rsync into the geo, then use a webserver to serve it

Yes, but you need to know some things
---
## Rsync tricky thing 1
* Not atomic -- clients can end up broken during sync
* The metadata can sync first, while the packages it references aren't yet available
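
A toy simulation of that window, with hypothetical file names, shows what a client sees mid-sync. It models the mirror as a set of file names and does not run rsync.

```python
def missing_packages(mirror_files, metadata_packages):
    """Packages the metadata advertises that the mirror cannot serve yet."""
    return sorted(set(metadata_packages) - set(mirror_files))


# Hypothetical state: the upstream gained foo-2.0.rpm since the last sync.
upstream_metadata = ["foo-1.0.rpm", "foo-2.0.rpm"]
mirror_files = {"foo-1.0.rpm"}  # what the mirror held before this rsync run

# Mid-sync: the new metadata has landed, the new package has not.
broken_window = missing_packages(mirror_files, upstream_metadata)

# After rsync finishes, the mirror is consistent again.
mirror_files.add("foo-2.0.rpm")
after_sync = missing_packages(mirror_files, upstream_metadata)
```

Any client that fetches metadata while `broken_window` is non-empty can request a package the webserver does not have yet.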
---
## Rsync tricky thing 2
* No opportunity for on-demand repos
---
## Rsync tricky thing 3
* Not all content types support being served with just a webserver
* Docker and Ansible, for example, both need queryable APIs
---
# Thanks!