# Lockless Pulp
# General Problem Statements
- Resource manager is a problem.
- It's a bottleneck. Every task goes through the resource manager.
- When tasks die, the task state in RQ and in Pulp can become inconsistent
- Pulp is inefficient. It's always FIFO when that may not be optimal.
- Work has to wait if there's a reservation ahead of it waiting for another resource
- While waiting, the reserved resources of a task may be changed or deleted by other tasks earlier in the queue
- Orphan cleanup blocks all work
- Orphan cleanup might be handled separately by ensuring that any resource that is to be cleaned up (currently content units and artifacts) is owned by at least one thing (RepositoryVersion, Task or User) at all times until it is no longer needed
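
The ownership idea in the last point can be sketched as follows. This is only an illustration under stated assumptions: `OwnershipRegistry`, its method names, and the owner strings are hypothetical and are not part of Pulp. A resource becomes eligible for cleanup only once its last owner has released it, so cleanup never has to block running work.

```python
class OwnershipRegistry:
    """Hypothetical sketch: tracks which owners hold each resource."""

    def __init__(self):
        # resource name -> set of owner identifiers
        self._owners = {}

    def add(self, resource, owner):
        # Register an owner (e.g. a RepositoryVersion, Task or User).
        self._owners.setdefault(resource, set()).add(owner)

    def release(self, resource, owner):
        # Drop one owner; the resource stays until nobody owns it.
        self._owners[resource].discard(owner)

    def collect_orphans(self):
        # Remove and return every resource that has no owner left.
        orphans = sorted(r for r, owners in self._owners.items() if not owners)
        for resource in orphans:
            del self._owners[resource]
        return orphans

    def resources(self):
        return sorted(self._owners)
```

A task that uploads an artifact would own it until a RepositoryVersion takes over; only artifacts that nobody owns are ever collected.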
# Why do we have mutable resources?
Some things are immutable, e.g. content, but other things are not, e.g. a Repository's data, such as its name.
## Four Quadrants
* Immutable-and-shared
* Immutable-and-owned
* Mutable-and-shared
* Mutable-and-owned
Pulp lives in the mutable-and-shared quadrant right now.
* Mutable data creates write-write race conditions
* Users have a first-come-first-serve (FCFS) expectation, e.g. a sync-then-publish
# Why do we use locking today?
* We use it to solve the base-path problem for Distributions
* It provides the FCFS guarantee of work w.r.t. a specific resource, e.g. Repository
* Orphan cleanup is a singleton so it locks content units that are in use by other tasks when deleting
* Deletion of resources is synchronized by locking
* Updating of resources is synchronized by locking
* Creation of RepositoryVersions is serialized by locking on the Repository
# Solution
"getting out of the synchronization quadrant with immutability"
Locks or bottlenecks are needed to prevent one process from using a resource while another process is changing it.
If all resources were immutable, they could be used by an unlimited number of processes simultaneously without the need for any synchronization primitives.
(Some kind of locking will still happen at the database level, but the abstract view of Pulp code will not be concerned with it.)
Pros:
- no need for resource manager
- all (remaining) services are scalable
Cons:
- Harder to design
- Needs a new data model
## Exposed immutability
*All* resources are immutable
Pros:
- relatively easy to implement; no need to redesign the database
- blockless; no waiting on resources, ever (the user has to wait...)
- another user cannot modify a resource you are about to use
Cons:
- all burden pushed to the user
- changing a resource requires replacing it
- unable to change parameters behind a "name" (natural key)
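
A minimal sketch of what exposed immutability could look like from the user's side, assuming a hypothetical `Remote` resource with `name` and `url` fields (these names are illustrative, not Pulp's actual model). It shows both the benefit (nobody can modify the object you hold) and the burden (a change means building a replacement under a new name).

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Remote:
    """Hypothetical immutable resource; the fields are assumptions."""
    name: str
    url: str


def try_to_mutate(remote, url):
    # Any in-place change is rejected; returns True if it was blocked.
    try:
        remote.url = url
    except FrozenInstanceError:
        return True
    return False


def replace_remote(remote, new_name, url):
    # "Changing" a resource means creating a new one under a new name.
    return replace(remote, name=new_name, url=url)
```

The old `Remote` stays valid for any task already using it, which is why no waiting is needed, but the user has to track the new name themselves.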
## Futures (delayed immutability)
Resources are created as futures and resolved by tasks.
Once resolved, resources are immutable.
Resources start in the unshared mutable corner and move to the shared immutable corner.
Futures can readily be used to create new resources while not yet resolved.
Pros:
- accounts for actions like sync and publish taking time
- futures form a DAG -> no possibility of deadlocks
- still no waiting on finished resources
Cons:
- waiting on resources; blocking
- a future can fail while waiting, because a future it depends on fails
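
The futures idea can be sketched roughly as below. `ResourceFuture` and its attributes are hypothetical names for illustration only. Dependencies are fixed at construction time, so a future can only depend on futures that already exist; that is what keeps the graph acyclic. Once resolved, the result is cached and never recomputed.

```python
class ResourceFuture:
    """Hypothetical sketch of a resource that a task resolves later."""

    def __init__(self, name, compute, depends_on=()):
        self.name = name
        self._compute = compute
        # Dependencies must already exist, so no cycle can be formed.
        self.depends_on = tuple(depends_on)
        self.resolved = False
        self._value = None
        self._error = None

    def resolve(self):
        if not self.resolved:
            try:
                # Resolving blocks on (here: recurses into) dependencies.
                inputs = [dep.resolve() for dep in self.depends_on]
                self._value = self._compute(*inputs)
            except Exception as exc:
                # A failing dependency makes this future fail as well.
                self._error = exc
            self.resolved = True
        if self._error is not None:
            raise self._error
        return self._value
```

For example, a publication future can be created from a repository-version future before the sync that produces the version has run; if the sync fails, resolving the publication raises the same error.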
## Copy-on-write with lookup table
Resources are immutable, and referenced by a key lookup table.
Changing a resource means creating a modified copy and changing the reference in the lookup table.
Tasks reference the actual resources.
Pros:
- users can "change" resources as they are used to
- blockless; no waiting on resources, ever (the user has to wait...)
- tasks can never fail on missing resources (being deleted in the meantime)
Cons:
- extra table join on lookup
- impossible to retrieve the natural key
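
A rough sketch of copy-on-write with a lookup table, under the assumption of a hypothetical `Remote` resource and an in-memory `LookupTable` (in a real system the reference swap would be an atomic database update). Note that the resource itself does not carry its name, which illustrates why the natural key cannot be retrieved from the resource alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Remote:
    """Hypothetical immutable resource; it does not know its own key."""
    url: str


class LookupTable:
    """Hypothetical name -> current resource mapping."""

    def __init__(self):
        self._current = {}

    def put(self, key, resource):
        self._current[key] = resource

    def get(self, key):
        # Tasks call this once and then hold the actual resource.
        return self._current[key]

    def update(self, key, **changes):
        # Copy-on-write: build a modified copy, then swap the reference.
        new = replace(self._current[key], **changes)
        self._current[key] = new
        return new

    def delete(self, key):
        # Only the name goes away; held resources stay usable.
        del self._current[key]
```

A task that fetched the resource before an update or delete keeps working with the version it started with, while new lookups see the changed one.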
## Notes
* Repository-version-creation is an existing example of "delayed immutability" in Pulp3
* Repository, Remote, Distribution would work well with copy-on-write
* Currently both Pulp3 and RQ 'know' about tasks - that they can get out of sync is part of the existing problem
* tasks can just live in the queue, since the act of existing means they can be run immediately
* How does 'ordering' happen in the lockless world?
* copy-on-write is atomic, so doesn't need ordering
* how would version-creation work in this world?
* katello makes 3 new versions and only cares about the final result, for example
* need to have an "i'm being updated now" on repo, and next-sync can't happen if that's on
* how do we handle order-dependency **now**?
* how do we handle updating an entity?
* content is already immutable
* repositories, remotes, distributions - not so much
* the question is, 'how do we close all the windows' if we go this route
* combining the last two versions (futures and copy-on-write)
* "once a real object is not referenced any more by a name or a task it can be deleted"
* how does that work?
* garbage collection?
* may not need a Single Big Lock
* what's the user-experience when:
* change remote-url
* sync
* multiple views on the repository in the multi-user case - what does it look like?
* the possible confusion-cases are the same ones we already have (we think)
* Can we look hard at the real-world problems we're trying to address, categorize them, and see if there is a solution to the Big problem(s) that doesn't require this level of redesign?
* orphan-cleanup is a Big Problem
* resource-manager lock is medium-size (or possibly Much Bigger given user experience)
* 'small' issues
* There are things that need to be solved, in **Pulp3**, regardless of how much 'better' they can/could be solved in a Pulp4 World
* need things like better-error-handling between check-and-use of, for example, content (or at least some way to solve this issue)
* How does this interact with releases and Pulp3/Pulp4 and migrations? Can we fix some of today's issues, now, **without** a major internals-redesign? Stay tuned for more discussions!! :)
* concerns about etcd
* we prob need to spin up a SIG (or something?) around this topic, what's the best way to make this discussion happen?
* Action Item: we need to have some open deep-dives on technology issues - need someone to hit up the mailing-list to get interested parties and suggestions, and open this discussion up to more input
###### tags: `PulpCon 2020`