# Pulp Resource Labels
Background/requirements: https://pulp.plan.io/issues/7127
## API Design
### Filtering
Resources can be filtered by their labels by passing a urlencoded selector string in the `label_selector` query parameter.
Some examples based on [the kubernetes documentation](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#api):
* `?label_selector=environment%3Dproduction,tier%3Dfrontend`
* Evaluates to `environment=production,tier=frontend`
* `?label_selector=environment+in+%28production%2Cqa%29%2Ctier+in+%28frontend%29`
* Evaluates to `environment in (production,qa),tier in (frontend)`
Note: Ansible Galaxy and RHUI have agreed that a first pass could support just a subset of operators (i.e. `=` and `!=`).
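As a rough illustration of the first-pass subset, a selector string could be parsed along these lines. This is only a sketch: `parse_label_selector` and `OPERATORS` are hypothetical names, and it assumes that terms are comma-separated and that only `=` and `!=` are accepted, so set-based terms such as `in (...)` are rejected.

```python
from urllib.parse import unquote_plus

# Hypothetical first-pass operator set. The longer operator comes first so
# that "!=" is not mistaken for "=".
OPERATORS = ("!=", "=")


def parse_label_selector(raw):
    """Parse a urlencoded selector into a list of (key, operator, value) tuples."""
    # Web frameworks usually decode query parameters already; decoding here
    # just keeps the sketch standalone.
    decoded = unquote_plus(raw)
    requirements = []
    for term in decoded.split(","):
        term = term.strip()
        if not term:
            continue
        for op in OPERATORS:
            key, found, value = term.partition(op)
            if found:
                requirements.append((key.strip(), op, value.strip()))
                break
        else:
            # No supported operator in this term (e.g. "tier in (frontend)").
            raise ValueError(f"unsupported selector term: {term!r}")
    return requirements
```

Supporting `in` and `notin` later would need a real tokenizer, since those terms contain commas inside parentheses.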
#### LabelSelectFilter
`LabelSelectFilter` would be a `django_filters.Filter` that parses the `label_selector` parameter and then filters the queryset. It can be applied to a queryset of any model with labels.
Note: for an example of a complex `Filter`, [see the `RepositoryVersionFilter`](https://git.io/JIJif).
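To make the intended matching semantics concrete, here is a small in-memory stand-in for what the filter would do to a queryset. It is a sketch under assumptions: `matches` and `select` are hypothetical helpers, requirements are `(key, operator, value)` tuples, and `!=` follows the Kubernetes behaviour of also matching resources that do not have the key at all. The real filter would express the same logic as queryset operations.

```python
def matches(labels, requirements):
    """Return True if a resource's label map satisfies every requirement."""
    for key, op, value in requirements:
        if op == "=":
            # The key must be present and carry exactly this value.
            if labels.get(key) != value:
                return False
        elif op == "!=":
            # Assumed Kubernetes semantics: a missing key also satisfies "!=".
            if labels.get(key) == value:
                return False
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return True


def select(resources, requirements):
    """Return the sorted names of resources whose labels match all requirements."""
    return sorted(
        name for name, labels in resources.items() if matches(labels, requirements)
    )
```

For example, with resources labelled `{"environment": "production"}`, `{"environment": "qa"}`, and `{}`, the requirement `("environment", "!=", "production")` selects the second and third.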
### LabelSerializer
Create a new `LabelSerializer` that can be nested into other model serializers as a field (much like the `CreatedResourceSerializer`). This serializer should be both readable and writable and should enable the following API calls.
#### Reading
```
# GET /pulp/api/v3/repositories/file/file/
{
...
"labels": {"foo": "bar", "foo2": "baz"},
...
}
```
#### Setting/Updating
```
# POST /pulp/api/v3/repositories/file/file/ name=test labels='[{"foo": "bar"}]'
{
...
"labels": {"foo": "bar"},
...
}
# PUT /pulp/api/v3/repositories/file/file/<uuid>/ labels='[{"foo": "baz"}]'
{
...
"labels": {"foo": "baz"},
...
}
```
----
## Database Design
### Option 1
This design uses a set of labels that are shared across resources.
#### Label
* **pulp_id** - uuid primary key
* **key** - the key of the label
* **value** - the value for a label
#### ResourceLabel (extends GenericRelationModel)
* **resource** - generic foreign key
* **label** - foreign key to `Label` table
Constraints
* `Label` key and value are unique together
* `ResourceLabel` resource and label are unique together
* resource and label key are unique together
One challenge would be dealing with orphaned labels. Either a label persists after it has been orphaned, or we have to clean up labels as they become orphaned. In the latter case, associating a label needs to handle the case where the label no longer exists and create it.
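The shared-label design and the orphan cleanup can be sketched with plain SQL tables. This uses the standard library `sqlite3` module instead of Django models, replaces the generic foreign key with a plain `resource_id` text column, and all function names are hypothetical. The "resource and label key are unique together" rule spans two tables, so this sketch enforces it in application code, not as a database constraint.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE label (
    pulp_id TEXT PRIMARY KEY,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (key, value)
);
CREATE TABLE resource_label (
    resource_id TEXT NOT NULL,  -- stand-in for the generic foreign key
    label_id TEXT NOT NULL REFERENCES label (pulp_id),
    UNIQUE (resource_id, label_id)
);
"""


def get_or_create_label(conn, key, value):
    """Return the id of the shared (key, value) label, creating it if missing."""
    row = conn.execute(
        "SELECT pulp_id FROM label WHERE key = ? AND value = ?", (key, value)
    ).fetchone()
    if row:
        return row[0]
    pulp_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO label (pulp_id, key, value) VALUES (?, ?, ?)",
        (pulp_id, key, value),
    )
    return pulp_id


def set_label(conn, resource_id, key, value):
    """Attach key=value to a resource, replacing any existing value for key."""
    # Drop any association with the same key so a resource never carries
    # two labels with one key.
    conn.execute(
        "DELETE FROM resource_label WHERE resource_id = ? AND label_id IN "
        "(SELECT pulp_id FROM label WHERE key = ?)",
        (resource_id, key),
    )
    label_id = get_or_create_label(conn, key, value)
    conn.execute(
        "INSERT INTO resource_label (resource_id, label_id) VALUES (?, ?)",
        (resource_id, label_id),
    )


def delete_orphaned_labels(conn):
    """Delete labels no resource references; return how many were removed."""
    cursor = conn.execute(
        "DELETE FROM label WHERE pulp_id NOT IN (SELECT label_id FROM resource_label)"
    )
    return cursor.rowcount
```

Because `set_label` always goes through `get_or_create_label`, running the cleanup eagerly or as a periodic job both work: a label deleted as an orphan is simply recreated the next time it is set.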
### Option 2
This design uses a map object that stores the resource's labels as an hstore. It is a one-to-one relationship.
#### ResourceLabelMap (extends GenericRelationModel)
* **resource** - generic foreign key
* **data** - [hstore field](https://django-hstore.readthedocs.io/en/latest/) that maps keys to values
Constraints
* resource must be unique
One challenge is deciding when to create the `ResourceLabelMap`. Would we create it together with the resource, or when a label is first set? It is also unclear how well hstore performs when filtering.
### Option 3
Instead of using generic foreign keys (GFKs) to relate labels or data mappings to remotes, repositories, etc., we could relate remotes, repositories, repository versions, publications, and distributions to a common "resource" object that holds the labels. The direct relationships would make queries easier and more performant, and avoid the invalid data that is endemic to GFKs.
Additionally the Resource model can be used to replace CreatedResources and potentially resolve https://pulp.plan.io/issues/6496
I haven't really looked into it but maybe it could be used to simplify task resource locking as well, and eliminate one of those models.
The easiest way to implement this is another level of multi-table inheritance. Since all of our models are already "paying the cost" of multi-table inheritance, there isn't much real downside in terms of usability.
#### Questions
* How does migration work when modifying the inheritance hierarchy?
* Where in the inheritance hierarchy does it go?
#### PulpResource
* << label data, stored in whichever way we choose from Option 1 or Option 2 above >>
* pre-computed pulp_href of the resource (for serialization via created_resources and possibly the task resource locking stuff)