# Moving Namespaces into Pulp Ansible
## Problem statement
Collection namespaces need to be moved into pulp ansible so that pulp ansible can carry the namespaces API and provide namespace sync functionality.
## Implementation proposal
Data model:
- Store the existing fields from the galaxy_ng model as database fields
- Store the avatar (logo) as an artifact and keep the hash of the image in the namespace metadata
    - Note: the namespace serializer will handle creating an image link to the content app for the UI to consume
        - Will the UI be able to access images on the content app with the redirect content guard?
            - [mdellweg] Yes, there is now redirect machinery that works in all cases. But the links will expire after ~2h.
- Use a hash of the metadata as the global uniqueness constraint
- Use name as the repository uniqueness constraint
```python
from django.db import models
from pulpcore.plugin.models import Content


class Namespace(Content):
    # fields on the existing model in galaxy_ng
    name = models.CharField(max_length=64, blank=False)
    company = models.CharField(max_length=64, blank=True)
    email = models.CharField(max_length=256, blank=True)
    description = models.CharField(max_length=256, blank=True)
    resources = models.TextField(blank=True)
    # this used to be modelled as a one to many
    # [mdellweg] Gerrod: "Why not an ArrayField"
    # default=list (a callable) so instances don't share one mutable list
    links = models.JSONField(default=list)
    # [mdellweg] Can we call this `avatar_sha256`?
    avatar_hash = models.TextField(blank=True)
    # Hash of the values of all the fields mentioned above.
    # Content uniqueness constraint.
    # [mdellweg] Same here, `metadata_sha256` or just `sha256`?
    # [mdellweg] We need a stable way to calculate this digest, and blank should be impossible. Maybe we can make this an auto-field.
    # [Gerrod] Agree with trying to make this an auto-field.
    metadata_hash = models.TextField(blank=True)

    def calc_metadata_sha256(self):
        ...
```
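The comments above ask for a stable way to compute the metadata digest. A minimal sketch, assuming the digest covers exactly the model fields listed above: serialize them canonically (sorted keys, fixed separators, UTF-8) and hash the result. `metadata_sha256` and `METADATA_FIELDS` are hypothetical names, not part of pulp_ansible.

```python
import hashlib
import json

# Assumption: the digest covers exactly the metadata fields of the model above.
METADATA_FIELDS = (
    "name",
    "company",
    "email",
    "description",
    "resources",
    "links",
    "avatar_hash",
)


def metadata_sha256(metadata: dict) -> str:
    """Return a stable hex SHA-256 of namespace metadata (hypothetical helper)."""
    canonical = {}
    for field_name in METADATA_FIELDS:
        # Missing fields count as blank so optional keys don't change the digest.
        blank = [] if field_name == "links" else ""
        canonical[field_name] = metadata.get(field_name, blank)
    # sort_keys and fixed separators make the serialized bytes reproducible.
    payload = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Missing fields hash the same as blank ones and unknown keys are ignored, so the digest is never blank and does not depend on which optional keys a client sent. The order of `links` still matters; sort them before hashing if reordering links should not count as a change.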
Collection serializer:
- Add metadata hash to the collection/collection version serializers so that clients can tell when a namespace is updated and sync it.
```json
{
    "href": "/api/galaxy/v3/plugin/ansible/content/published/collections/index/newswangerd/main_collection/",
    "name": "main_collection",
    "namespace": "newswangerd",
    "namespace_metadata_hash": "xyz....",
    [...]
}
```
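On the client side, this field would let a sync decide which namespaces changed without downloading all of them. A minimal sketch, assuming index entries shaped like the example above and a local mapping from namespace name to the hash already stored; `namespaces_to_sync` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set


def namespaces_to_sync(collections: Iterable[dict], local_hashes: Dict[str, str]) -> Set[str]:
    """Return namespace names whose remote metadata hash differs from the local one.

    `collections` are collection index entries; `local_hashes` maps a namespace
    name to the metadata hash stored locally (both shapes are assumptions).
    """
    stale: Set[str] = set()
    for entry in collections:
        name = entry["namespace"]
        remote_hash = entry.get("namespace_metadata_hash")
        if remote_hash is None:
            continue  # the server does not advertise a hash; nothing to compare
        if local_hashes.get(name) != remote_hash:
            stale.add(name)  # unknown locally, or updated on the remote
    return stale
```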
Sync pipeline:
- For each collection version synced:
    - Create a new namespace in the repository if one doesn't exist
    - If the namespace does exist, compare the metadata hash on the remote with that of the local namespace associated with the collection version
        - if a namespace exists locally with the same hash, associate it with the collection version
        - if no namespace matches, create a new namespace object from the remote and associate it
        - [mdellweg] this is exactly the logic the stages provide if the sha256 is the natural key.
- Namespaces should only be synced when a collection belonging to them is synced.
    - Ex: if we have `foo.collection` and `bar.collection` and are only syncing `bar.collection`, then only `bar` should be synced, not `foo`.
- [mdellweg] Namespaces no longer needed (referenced) in a repository version should be removed. (finalize can do this with modify too)
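The pipeline above can be sketched in plain Python to make the intended behavior concrete. This is a hypothetical illustration, not the pulp stages API: dicts stand in for the global content table (keyed by metadata hash) and for a repository version (namespaces keyed by name), and `sync_namespaces`, `RepoVersion` and `fetch_namespace` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RepoVersion:
    # "namespace.name" of each collection -> the namespace it belongs to
    collections: Dict[str, str] = field(default_factory=dict)
    # namespace name -> metadata hash; name is the per-repository key
    namespaces: Dict[str, str] = field(default_factory=dict)


def sync_namespaces(
    remote_collections: List[dict],
    fetch_namespace: Callable[[str], dict],
    store: Dict[str, dict],
    base: RepoVersion,
    mirror: bool = False,
) -> RepoVersion:
    """Hypothetical sketch of namespace handling during a collection sync.

    `store` stands in for the global content table, keyed by metadata hash.
    """
    new = RepoVersion(
        collections={} if mirror else dict(base.collections),
        namespaces=dict(base.namespaces),
    )
    for cv in remote_collections:
        ns_name = cv["namespace"]
        ns_hash = cv["namespace_metadata_hash"]
        if ns_hash not in store:
            # No namespace with this digest exists anywhere yet: fetch and create it.
            store[ns_hash] = fetch_namespace(ns_name)
        # Only namespaces of synced collections are touched; a namespace with
        # the same name is replaced by the newer one in this repository.
        new.namespaces[ns_name] = ns_hash
        new.collections[f"{ns_name}.{cv['name']}"] = ns_name
    # Drop namespaces that no collection in the new version references.
    referenced = set(new.collections.values())
    new.namespaces = {n: h for n, h in new.namespaces.items() if n in referenced}
    return new
```

Syncing only `bar.collection` fetches `bar` at most once per new digest and never touches `foo`; in a mirror sync, collections missing from the remote drop out and their namespaces are pruned with them.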
## Notes 2022-10-05:
- Namespace metadata needs to be synced as content.
- Namespaces still need to get role assignments.
- Uniqueness constraints
    - global
        - put everything in the uniqueness constraint?
            - +1 Matthias
        - no uniqueness constraint (might break sync)?
        - digest of metadata?
            - more efficient syncing (can serve the namespace digest on the v3 API)
            - store metadata as jsonb in the db and calculate the hash
        - store metadata as an artifact?
            - hard to search the content of an artifact
            - store readme and logo as artifacts
    - per repo
        - namespace name
- Do logos and READMEs change?
    - yes
    - We can add their digests to the server answer and to the metadata content model.
- Only sync namespaces that have synced collections.
- Next up:
    - Data model
## Discussion
### What are namespaces used for?
- Collection ownership.
    - Users are granted permissions on a namespace that allow them to upload new collections with that namespace. At the moment, permissions are only tracked for groups, not users, so a user has to be part of a group that can upload collections for a given namespace.
    - Denotes the organization or user that created the collection.
- Publisher metadata.
    - Collection publishers can customize the information on their namespace. This includes:
        - Adding links to external resources that the publisher may wish to promote
        - Namespace descriptions
        - A namespace "readme" markdown file
        - Company name
        - Logos/avatars
        - Contact information
### How are namespaces currently implemented?
[Namespace model](https://github.com/ansible/galaxy_ng/blob/master/galaxy_ng/app/models/namespace.py#L57).
Namespaces are a global object (i.e. not part of any repository). Each one is linked to an `inbound-{namespace_name}` repo where users are expected to upload collections for the namespace. When users upload collections, they are required to upload them to the inbound repo that matches the namespace on the collection. The user's permissions are then checked against the namespace to verify that they have permission to upload the collection.
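The upload check described above boils down to two conditions. A plain-Python sketch, assuming ownership is a set of group names per namespace; `can_upload` and `upload_groups` are hypothetical names and do not reflect how galaxy_ng actually stores or evaluates permissions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Namespace:
    name: str
    # Groups allowed to upload to this namespace (permissions are per group, not per user).
    upload_groups: Set[str] = field(default_factory=set)


def can_upload(
    user_groups: Set[str],
    repo_name: str,
    collection_namespace: str,
    namespaces: Dict[str, Namespace],
) -> bool:
    """Hypothetical sketch of the inbound-repo upload check."""
    # 1. The upload must target the inbound repo matching the collection's namespace.
    if repo_name != f"inbound-{collection_namespace}":
        return False
    ns = namespaces.get(collection_namespace)
    if ns is None:
        return False
    # 2. The user must belong to at least one group with upload rights on it.
    return bool(user_groups & ns.upload_groups)
```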
## Pulp Implementation
### Requirements
- Have to be able to maintain the collection ownership model in galaxy
- Have to be able to update and create namespace metadata via an API call.
    - `PUT /v3/namespaces/<name>/` allows users to update all of the namespace information
    - `POST /v3/namespaces/<name>/` allows users to create new namespaces
- Can't break backwards compatibility on the existing namespaces API.
### Possible options
#### Convert namespaces to Content
Convert the namespace object to a Content type. This will allow namespaces to be synced from remote sources, but will greatly complicate the process of updating namespace metadata and moving collections between repositories.
This is technically possible: even though namespaces don't belong to any repository right now, they are served by the V3 API, which is already scoped to a specific repo, so they can be moved into a repository without breaking changes on the public API.
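Assuming namespaces follow pulp's usual pattern of immutable content in versioned repositories, a `PUT` on a namespace could no longer edit a row in place; it would have to create a new content unit and a new repository version that swaps it in. A hypothetical in-memory sketch of that flow (`put_namespace`, `digest` and `RepoVersion` are illustrative names, not pulp APIs):

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


def digest(metadata: dict) -> str:
    # Canonical JSON so identical metadata always yields the same digest.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class RepoVersion:
    number: int
    namespaces: Dict[str, str]  # namespace name -> metadata digest


def put_namespace(
    history: List[RepoVersion], store: Dict[str, dict], name: str, metadata: dict
) -> RepoVersion:
    """Hypothetical handler for PUT /v3/namespaces/<name>/ on content-based namespaces."""
    latest = history[-1]
    unit = {**metadata, "name": name}
    new_digest = digest(unit)
    if latest.namespaces.get(name) == new_digest:
        return latest  # nothing changed, so no new repository version
    # The old unit stays untouched; the update is a brand-new content unit.
    store.setdefault(new_digest, unit)
    namespaces = dict(latest.namespaces)
    namespaces[name] = new_digest
    new_version = RepoVersion(latest.number + 1, namespaces)
    history.append(new_version)
    return new_version
```

Every metadata edit therefore produces a new repository version, which is part of why updates get more involved under this option.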
Pros:
- Namespace sync
- Potential for multitenancy. If namespace ownership is dictated on a per repository basis, then you can have distinct namespaces with distinct owners in each repository.
Cons:
- Ownership will be very difficult to track. For someone to upload a collection to a repository, there will have to be a namespace object in that repo that the user has permissions on.
#### Leave namespaces as is
Move the current galaxy_ng data model into pulp ansible as it is.
Pros:
- Easy
Cons:
- Namespace sync will be impossible
#### Hybrid approach
Maintain collection ownership information on a global namespaces object (like it is now) but treat namespace metadata as content. Ownership can't be synced, so it's fine to manage this locally.
Pros:
- Namespace ownership can be tracked globally
- Metadata (logos, description etc) can be synced and tracked separately for each repo.
- Collections with the "red_hat" namespace can be synced from community galaxy and cloud automation hub and the upstream vs downstream branding can be maintained separately
Cons:
- Moving collections between repositories will still be tricky. What do you do if:
    - a collection moves into a repo that has a different namespace object?
    - a collection moves into a repo with no matching namespace?
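A minimal sketch of the hybrid split, assuming ownership is a global object per namespace name and metadata is a per-repository mapping. The move policy shown (keep the destination's metadata if it has any, otherwise copy the source's) is just one hypothetical answer to the open questions above; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class GlobalNamespace:
    """Ownership lives here, once per name, and is never synced."""
    name: str
    owner_groups: Set[str] = field(default_factory=set)


@dataclass
class Repository:
    name: str
    # Per-repository namespace metadata content: namespace name -> metadata.
    namespace_metadata: Dict[str, dict] = field(default_factory=dict)
    collections: Set[str] = field(default_factory=set)


def owns(user_groups: Set[str], collection: str, owners: Dict[str, GlobalNamespace]) -> bool:
    # Ownership is checked against the global object, whatever the repository.
    ns = owners.get(collection.split(".", 1)[0])
    return ns is not None and bool(user_groups & ns.owner_groups)


def move_collection(collection: str, src: Repository, dst: Repository) -> Optional[dict]:
    """Move a collection; return the metadata dst shows for its namespace, if any."""
    ns_name = collection.split(".", 1)[0]
    src.collections.discard(collection)
    dst.collections.add(collection)
    # Hypothetical policy: dst keeps its own branding; if it has none, copy src's.
    if ns_name not in dst.namespace_metadata and ns_name in src.namespace_metadata:
        dst.namespace_metadata[ns_name] = src.namespace_metadata[ns_name]
    return dst.namespace_metadata.get(ns_name)
```

With this split, a `red_hat` collection moved from a community-synced repo into a downstream repo keeps the downstream branding, while ownership checks never depend on which repository the collection is in.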
### Considerations
- Namespaces are deeply ingrained with permissioning in the galaxy universe. Should this come with a full RBAC implementation in pulp ansible?
- The current ownership model will likely need to change with repo management in galaxy_ng. We may not want users to be able to upload collections into any repository just because they have namespace upload rights.