# Notes Jan 19 2022
Show how to generate a graphic representation of db schema
* [ggainey](https://) AI to figure out if we can use the tool on RHEL7
* https://django-extensions.readthedocs.io/en/latest/graph_models.html
* [py-graphiz available on RHEL7](https://access.redhat.com/downloads/content/graphviz-python/2.30.1-22.el7/x86_64/fd431d51/package)
* django-extensions does **NOT** appear to be available as RPMs for RHEL - need to pip-install :(
* https://hackmd.io/@pulp/pulp3_and_graphviz
* Explain relationships for content
* https://docs.pulpproject.org/pulpcore/concepts.html
* Artifact, Content, ContentArtifact, Repository, RepositoryVersion
* Provide list of handy REST API calls
* List of SQL queries that are useful
* Explain why artifacts are named how they are named. Content addresable storage. Pulp's need to support more than filesystem storage.
* Currently not possible to export a repository to a filesystem.
* Pulp2 Way involved a bajillion symlinks, and is the primary reason publish was So Damn Slow
* Provide an explanation for when you need to resync a capsule because the layout of the repository on the main satellite changed.
* RFE for tasks by worker
* Explain that all the sync optimizations are the same for capsules and main sat.
* Explain PULP_CONTENT_PREMIGRATION_BATCH_SIZE=1000 - maybe update docs?
* Recipe for tricking pulp into redownloading an RPM.
# Digging into Pulp 3 internals
To get Pulp 3 configuration run `dynaconf list`.
In Katello environment:
`sudo -u pulp PULP_SETTINGS='/etc/pulp/settings.py' DJANGO_SETTINGS_MODULE='pulpcore.app.settings' dynaconf list`
### How to get RPM repository metadata
* download from the distributed version
* CONTENT_ORIGIN (e.g. `http://pulp.example.com/`) is where all the content, including repositories are served.
* Repositories are served at CONTENT_PATH_PREFIX, which is usually `/pulp/content/`.
* Download the repodata of interest, e.g.
`wget http://pulp.example.com/pulp/content/<your_repo_base_path>/repodata/<metadata_of_your_interest>`
* find on the disk
* get the path using pulpcore-manager
```
>>> from pulp_rpm.app.models.repository import RpmDistribution, RpmPublication
>>> from pulpcore.app.models.repository import RepositoryVersion
# [best option] if the repo base path is known (whatever goes after /pulp/content/ to the root of a repository)
# here the base path is 'centos8ha', and the metadata of interest is 'updateinfo'.
>>> RpmDistribution.objects.get(base_path='centos8ha').publication.published_metadata.get(relative_path__contains='updateinfo').contentartifact_set.first().artifact.file.path
'/var/lib/pulp/artifact/8b/7fdc80cb0303305933f5d7d57520200f90f787ea3650f2c7e62511a4b16358'
# if a publication uuid is known
>>> RpmPublication.objects.get(pulp_id='a7996954-79fa-4334-a8a5-9fba0c2560cc').published_metadata.get(relative_path__contains='updateinfo').contentartifact_set.first().artifact.file.path
'/var/lib/pulp/artifact/8b/7fdc80cb0303305933f5d7d57520200f90f787ea3650f2c7e62511a4b16358'
# if a repository uuid and repository version number are known
>>> RepositoryVersion.objects.get(repository='88835194-54a7-4af4-a062-6f572edeec0b', number=1, complete=True).publication_set.first().published_metadata.get(relative_path__contains='updateinfo').contentartifact_set.first().artifact.file.path
'/var/lib/pulp/artifact/8b/7fdc80cb0303305933f5d7d57520200f90f787ea3650f2c7e62511a4b16358'
```
* get the path directly from the database
```
# The case when a base path is known.
# I'll show in multiple steps for better readability. Feel free to combine into N joins or nested queries.
pulp=> SELECT rpm_rpmdistribution.publication_id FROM rpm_rpmdistribution INNER JOIN core_basedistribution ON (rpm_rpmdistribution.basedistribution_ptr_id = core_basedistribution.pulp_id) WHERE core_basedistribution.base_path = 'centos8ha';
publication_id
--------------------------------------
a7996954-79fa-4334-a8a5-9fba0c2560cc
(1 row)
# published metadata is also considered content at the db level
pulp=> SELECT core_content.pulp_id as published_metadata_content_id FROM core_publishedmetadata INNER JOIN core_content ON (core_publishedmetadata.content_ptr_id = core_content.pulp_id) WHERE core_publishedmetadata.publication_id = 'a7996954-79fa-4334-a8a5-9fba0c2560cc' AND core_publishedmetadata.relative_path LIKE '%updateinfo%';
published_metadata_content_id
--------------------------------------
792b19da-1952-408f-904e-69715e48b533
(1 row)
pulp=> SELECT core_artifact.file FROM core_contentartifact INNER JOIN core_artifact ON (core_contentartifact.artifact_id = core_artifact.pulp_id) WHERE core_contentartifact.content_id = '792b19da-1952-408f-904e-69715e48b533';
file
----------------------------------------------------------------------------
artifact/8b/7fdc80cb0303305933f5d7d57520200f90f787ea3650f2c7e62511a4b16358
(1 row)
```
The path is relative to the MEDIA_ROOT, usually, '/var/lib/pulp' or /var/lib/pulp/media.
Keep in mind, that storage is not alwayd local, it can be S3 or Azure storage or something else. But for Katello installations, it should be only local, as of now.
### How to find/check by filename or NEVRA if an RPM is on the disk
* by filename
* get the path using pulpcore-manager
```
>>> from pulpcore.app.models.content import ContentArtifact
>>> ContentArtifact.objects.get(relative_path__contains='lion-0.4-1.noarch.rpm').artifact.file.path
'/var/lib/pulp/artifact/72/dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023'
```
* get the path directly from the database relying on the expected filename
```
# If artifact_id is not present, it means that the package is not expected to be on the disk and likely is the on_demand content unit
pulp=> select artifact_id from core_contentartifact where relative_path like '%lion-0.4-1.noarch.rpm';
artifact_id
--------------------------------------
0bc23639-b202-49a1-a1fe-5d4d9529c119
(1 row)
# If the artifact id is present, then it is possible to check where
pulp=> select file from core_artifact where pulp_id='0bc23639-b202-49a1-a1fe-5d4d9529c119';
file
----------------------------------------------------------------------------
artifact/72/dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023
(1 row)
```
* by NEVRA or any other RPM metadata, preferably by the checksum
If RPMs are downloaded you can always get their sha256 checksum and form a path.
```
sha256: 72dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023
path: artifact/72/dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023
```
* using pulpcore-manager
```
# you can ask for any checksum type available to you, the best option is sha256
>>> from pulp_rpm.app.models.package import Package
>>> Package.objects.get(name='lion', epoch='0', version='0.4', release='1', arch='noarch', checksum_type='sha256').pkgId
'72dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023'
# if you have other checksum type than sha256, search the artifact by that checksum
>>> from pulpcore.app.models.content import Artifact
>>> Artifact.objects.get(sha1='<your sha1 checksum>').file.path
'/var/lib/pulp/artifact/72/dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023'
```
* directly in the database
```
# you can ask for any checksum type available to you, the best option is sha256
pulp=> select "pkgId" from rpm_package where name='lion' and epoch='0' and version='0.4' and release='1' and arch='noarch' and checksum_type='sha256';
pkgId
------------------------------------------------------------------
72dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023
(1 row)
# if you have other checksum type than sha256, search the artifact by that checksum
pulp=> select file from core_artifact where sha1='<your sha1 checksum>';
file
----------------------------------------------------------------------------
artifact/72/dd5d133a14e7ca371839f069714a2a44c90c2001d605321fd9e6e7b99ca023
(1 row)
```