# rel_path notes
## Summary of The Problem
* Pulp makes assumptions about RPMs-in-repos that are not, alas, true In the Wild:
    * a given checksum can only belong to one filename
    * a given NEVRA can only be in one location
    * a given NEVRA will always have the same checksum
    * a repo's filestructure can always be reorganized
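The first three assumptions can be checked mechanically against a repo's package list. A minimal sketch, assuming the metadata has already been parsed into (NEVRA, location, sha256) records: `RpmEntry`, `conflicts()` and `check_assumptions()` are hypothetical helpers, not Pulp code. The grafana checksum is a placeholder, and the grafana NEVRA is assumed to be the same for both filenames.

```python
import posixpath
from collections import defaultdict
from typing import NamedTuple


class RpmEntry(NamedTuple):
    nevra: str     # name-version-release.arch (epoch left out for brevity)
    location: str  # path relative to the repo root, as listed in the metadata
    sha256: str

    @property
    def filename(self):
        return posixpath.basename(self.location)


def conflicts(entries, key, value):
    """Map each `key` to its distinct `value`s, keeping only keys with more than one."""
    seen = defaultdict(set)
    for entry in entries:
        seen[getattr(entry, key)].add(getattr(entry, value))
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


def check_assumptions(entries):
    """Report every place where one of the first three assumptions does not hold."""
    return {
        "checksum_with_many_filenames": conflicts(entries, "sha256", "filename"),
        "nevra_with_many_locations": conflicts(entries, "nevra", "location"),
        "nevra_with_many_checksums": conflicts(entries, "nevra", "sha256"),
    }


GRAFANA_SHA = "placeholder-sha256-for-grafana"  # real digest is not in these notes
RUBYGEM_NEVRA = "rubygem-multipart-post-doc-2.0.0-2.el7.noarch"

entries = [
    # grafana: assumed to be one RPM published under two filenames
    RpmEntry("grafana-2.6.0-1.x86_64", "grafana-2.6.0-1.x86_64.rpm", GRAFANA_SHA),
    RpmEntry("grafana-2.6.0-1.x86_64", "grafana-2.6.0.x86_64.rpm", GRAFANA_SHA),
    # opstools: same NEVRA, two locations, two checksums
    RpmEntry(RUBYGEM_NEVRA, "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
             "6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9"),
    RpmEntry(RUBYGEM_NEVRA, "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
             "48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3"),
]

for assumption, offenders in check_assumptions(entries).items():
    print(assumption, offenders)
```

The fourth assumption (the filestructure can be reorganized) is not something a checker like this can see; it only shows up at publish time.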
## Code-only workarounds?
Assume we do **not** want to require a schema-change (at least not in pulpcore) to handle these
*
## Discovered data use-cases
### same hash, different filenames
* https://packages.grafana.com/oss/rpm
    * grafana-2.6.0-1.x86_64.rpm
    * grafana-2.6.0.x86_64.rpm
* http://repo.mysql.com/yum/mysql-tools-community/el/7/x86_64/
    * mysql-workbench-community-debuginfo-8.0.18-1.el7.x86_64.rpm
    * mysql-workbench-community-8.0.18-1.el7.x86_64.rpm
* breaks sync only after 2to3 migration
    * find the "we've already seen this hash" assumption in 2to3 and fix it?
    * fun - Pulp2 does NOT have the 'duplicated' RPM sync'd: grafana repo has 372 rpms, pulp2 syncs 371
    * 2to3 syncs what it has, correctly
### same NEVRA, different hashes, different locations
* http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
    * fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
        * hash 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
    * logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
        * hash 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
* https://updates.suse.com/SUSE/Backports/SLE-12-SP2_x86_64/standard/
    * rpm/x86_64_SP0/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
        * hash b0fb8ca3878e9c70fe82b91ccf81c74439689a7afd0764449841ae01192f0c18
    * rpm/x86_64_SP1/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
        * hash 9722732577ab816e60488f3be7ba0930195da91370c156e302042f8108ae7490
* prevents sync, any combo of immediate/on_demand/mirror
    * pulpcore/app/files.validate_file_paths() - fails because ContentArtifact.relative_path is just the filename, and we see that filename twice due to the different hashes
    * can we notice we're mirrored and not-call this maybe?
        * not easily - mirror is a sync-cmd-attribute, by the time we get here we don't know how we were called
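Why the filename-only relative path trips a duplicate check can be shown in a few lines. This is a simplified stand-in for the kind of check described above, not the pulpcore implementation; `find_duplicate_paths()` is a hypothetical helper.

```python
import posixpath
from collections import Counter


def find_duplicate_paths(paths):
    """Return every relative path that appears more than once."""
    counts = Counter(paths)
    return sorted(path for path, count in counts.items() if count > 1)


# The two opstools locations from the notes above.
locations = [
    "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
    "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
]

# Relative path reduced to the bare filename: both packages collide.
by_filename = [posixpath.basename(location) for location in locations]
print(find_duplicate_paths(by_filename))

# Relative path kept as the remote location: no collision.
print(find_duplicate_paths(locations))
```

Keeping the remote location only sidesteps the check in this toy model; whether Pulp can do that without a schema change is the open question in these notes.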
## Problem remotes
### https://pulp.plan.io/issues/8133
#### Problem
"I can't sync postgres remotes"
Can't reproduce - OP can't either.
Notes from iballou: appears to only happen post-2to3-migration. Still can't reproduce that way, either.
* Foreman discussion: https://community.theforeman.org/t/foreman-2-4-katello-4-unable-to-sync-repos/23646
* https://packages.grafana.com/oss/rpm
    * "No declared artifact with relative path "grafana-2.6.0-1.x86_64.rpm" for content "<Package: grafana>""
    * this is **not** the rel_path problem
    * can't reproduce on core/3.14|rpm/3.13
* https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-8-x86_64/
* https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/
#### Plan
* script to build test for the (many) postgres variants into one Pulp3
* report results back to issue, foreman
#### Notes
* grafana fails if:
    * sync-on-demand to Pulp2
    * migrate
    * sync in Pulp3
* fails because:
    * the repo has one RPM that lives under two names, grafana-2.6.0.x86_64.rpm and grafana-2.6.0-1.x86_64.rpm
    * migrate gets (only) grafana-2.6.0-1
    * sync finds grafana-2.6.0 but can't find a contentartifact to match
    * why? because that sha256 was taken by -1
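The sequence above can be replayed as a toy model. Everything here is an assumption made for illustration: `pulp2_sync()`, `migrate()` and `pulp3_sync()` are hypothetical stand-ins for the real tasks, the checksum is a placeholder, and "first filename wins" is only a guess at why Pulp2 ends up with grafana-2.6.0-1.

```python
PLACEHOLDER_SHA = "sha256-of-the-grafana-2.6.0-rpm"  # real digest is not in these notes

# One RPM, two names in the remote repo.
REMOTE = [
    ("grafana-2.6.0-1.x86_64.rpm", PLACEHOLDER_SHA),
    ("grafana-2.6.0.x86_64.rpm", PLACEHOLDER_SHA),
]


def pulp2_sync(remote):
    """Toy Pulp2: one unit per checksum, first filename wins (371 of 372)."""
    units = {}
    for filename, sha256 in remote:
        units.setdefault(sha256, filename)
    return units


def migrate(pulp2_units):
    """Toy 2to3: carry over exactly what Pulp2 has, as sha256 -> relative_path."""
    return dict(pulp2_units)


def pulp3_sync(remote, content_artifacts):
    """Return remote filenames whose sha256 is known but filed under another name."""
    declared = set(content_artifacts.values())
    unmatched = []
    for filename, sha256 in remote:
        if sha256 in content_artifacts and filename not in declared:
            unmatched.append(filename)
    return unmatched


pulp2_units = pulp2_sync(REMOTE)
content_artifacts = migrate(pulp2_units)
print(pulp3_sync(REMOTE, content_artifacts))
```

In this model the migration itself does nothing wrong; the mismatch only appears when the Pulp3 sync sees the second filename for a checksum that is already claimed.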
### https://pulp.plan.io/issues/7208
#### Problem
"Valid" (for sufficiently "are you f'in kidding me?!?" definitions of the word) centos repo not sync'able.
* http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
    * c_ca.relative_path = rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
    * appears at multiple URLs in remote, same NEVRA, different paths, **DIFFERENT** checksums:
~~~
pulp=> select * from core_contentartifact where relative_path = 'rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm';
               pulp_id                |         pulp_created          |       pulp_last_updated       |                   relative_path                   | artifact_id |              content_id
--------------------------------------+-------------------------------+-------------------------------+---------------------------------------------------+-------------+--------------------------------------
 bd1fb28b-e7dd-469a-aa3d-a16da0c42766 | 2021-07-07 16:49:39.388602+00 | 2021-07-07 16:49:39.388607+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 445ebbae-660a-488a-beb5-56ba7149430e
 c1824574-55c9-41c7-8e3a-bcb1cb481c1c | 2021-07-07 16:49:42.125159+00 | 2021-07-07 16:49:42.125164+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 95be8fd1-1936-4278-8de2-5c728eed4a11
(2 rows)

pulp=> select url, sha256 from core_remoteartifact where content_artifact_id in ('bd1fb28b-e7dd-469a-aa3d-a16da0c42766', 'c1824574-55c9-41c7-8e3a-bcb1cb481c1c');
                                                      url                                                      |                              sha256
---------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
(2 rows)
~~~
* This isn't relative-path - this is "same nevra, diff csum, same repo"
* no-op pulpcore/app/files.validate_file_paths() and publish will then fail with:
~~~
duplicate key value violates unique constraint "core_publishedartifact_publication_id_relative__97f785f4_uniq"
DETAIL: Key (publication_id, relative_path)=(d544dec0-4de6-47f7-aba1-6c2b62a1557a, Packages/p/python-msgpack-0.4.6-3.el7.x86_64.rpm) already exists.
~~~
* remote wants the file in two different places - Publish wants it in the 'standard' one only
* mirror (**without** autopublish) works, if the validate-check is disabled!
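The publish failure can be reproduced in miniature with an in-memory SQLite table carrying the same kind of unique constraint on (publication_id, relative_path). A sketch under stated assumptions: the table, `published_path()` and `publish()` are hypothetical, the `Packages/<first letter>/<filename>` layout is inferred from the path in the error above, and the publication id is made up. The two content-artifact ids are the ones from the query output earlier in this section.

```python
import sqlite3


def published_path(filename):
    """'Standard' layout as seen in the error: Packages/<first letter>/<filename>."""
    return f"Packages/{filename[0].lower()}/{filename}"


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE published_artifact (
        publication_id      TEXT NOT NULL,
        relative_path       TEXT NOT NULL,
        content_artifact_id TEXT NOT NULL,
        UNIQUE (publication_id, relative_path)
    )
    """
)


def publish(conn, publication_id, content_artifacts):
    """Insert one row per content artifact; collect the ones the constraint rejects."""
    rejected = []
    for content_artifact_id, filename in content_artifacts:
        try:
            conn.execute(
                "INSERT INTO published_artifact VALUES (?, ?, ?)",
                (publication_id, published_path(filename), content_artifact_id),
            )
        except sqlite3.IntegrityError as exc:
            rejected.append((content_artifact_id, str(exc)))
    return rejected


FILENAME = "rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"
content_artifacts = [
    ("bd1fb28b-e7dd-469a-aa3d-a16da0c42766", FILENAME),  # fluentd/ copy
    ("c1824574-55c9-41c7-8e3a-bcb1cb481c1c", FILENAME),  # logging/ copy
]

rejected = publish(conn, "hypothetical-publication-id", content_artifacts)
for content_artifact_id, error in rejected:
    print(content_artifact_id, error)
```

Both copies map to the same published path, so the second insert is rejected. A mirror that keeps the remote's own paths never computes that shared path, which fits the observation that mirror without autopublish works.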
#### Plan
* re-evaluate the validate_file_paths() assumption
* make sure mirror and autopublish can't both be set true on the same repo
* update issue with "use mirror once PR XYZ is merged"
### https://pulp.plan.io/issues/7507
#### Problem
Syncing SUSE repos fails in pairs.
First pair:
* https://updates.suse.com/SUSE/Backports/SLE-12-SP5_x86_64/standard/
* https://updates.suse.com/SUSE/Backports/SLE-12-SP5_x86_64/product/
Second pair:
* https://updates.suse.com/SUSE/Backports/SLE-12-SP4_x86_64/product/
* https://updates.suse.com/SUSE/Backports/SLE-12-SP4_x86_64/standard/
See also https://pulp.plan.io/issues/6303
* https://dl.fedoraproject.org/pub/fedora/linux/updates/30/Modular/x86_64/
* https://dl.fedoraproject.org/pub/fedora/linux/updates/31/Modular/x86_64/
Not the same problem as https://pulp.plan.io/issues/6303
* that one is specific to modularity, and is fixed
#### Plan
* SUSE access
* figure out **exactly** what the problem is
    * two rpms, same nevra, different hashes, different locations
    * no idea how this **ever** works/worked, in pulp3