# rel_path notes

## Summary of The Problem

* Pulp makes assumptions about RPMs-in-repos that are not, alas, true In the Wild
    * a given checksum can only belong to one filename
    * a given NEVRA can only be in one location
    * a given NEVRA will always have the same checksum
    * a repo's filestructure can always be reorganized

## Code-only workarounds?

Assume we do **not** want to require a schema-change (at least not in pulpcore) to handle these.

## Discovered data use-cases

### same hash, different filenames

* https://packages.grafana.com/oss/rpm
    * grafana-2.6.0-1.x86_64.rpm
    * grafana-2.6.0.x86_64.rpm
* http://repo.mysql.com/yum/mysql-tools-community/el/7/x86_64/
    * mysql-workbench-community-debuginfo-8.0.18-1.el7.x86_64.rpm
    * mysql-workbench-community-8.0.18-1.el7.x86_64.rpm
* breaks sync only after 2to3 migration
    * find the "we've already seen this hash" assumption in 2to3 and fix it?
    * fun - Pulp2 does NOT have the 'duplicated' RPM sync'd: the grafana repo has 372 rpms, pulp2 syncs 371
    * 2to3 syncs what it has, correctly

### same NEVRA, different hashes, different locations

* http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
    * fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
        * hash 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
    * logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
        * hash 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
* https://updates.suse.com/SUSE/Backports/SLE-12-SP2_x86_64/standard/
    * rpm/x86_64_SP0/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
        * hash b0fb8ca3878e9c70fe82b91ccf81c74439689a7afd0764449841ae01192f0c18
    * rpm/x86_64_SP1/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
        * hash 9722732577ab816e60488f3be7ba0930195da91370c156e302042f8108ae7490
* prevents sync, with any combination of immediate/on_demand/mirror
    * pulpcore/app/files.validate_file_paths() fails because ContentArtifact.relative_path is just the filename, and we see it twice due to the different hashes
    * can we notice we're mirrored and not call this, maybe?
        * not easily - mirror is a sync-command attribute; by the time we get here we don't know how we were called

## Problem remotes

### https://pulp.plan.io/issues/8133

#### Problem

"I can't sync postgres remotes"

Can't reproduce - OP can't either. Notes from iballou: appears to only happen post-2to3-migration. Still can't reproduce that way, either.

* Foreman discussion: https://community.theforeman.org/t/foreman-2-4-katello-4-unable-to-sync-repos/23646
* https://packages.grafana.com/oss/rpm
    * `No declared artifact with relative path "grafana-2.6.0-1.x86_64.rpm" for content "<Package: grafana>"`
    * this is **not** the rel_path problem
    * can't reproduce on core/3.14|rpm/3.13
* https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-8-x86_64/
* https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/

#### Plan

* script to build a test for the (many) postgres variants into one Pulp3
* report results back to the issue and to Foreman

#### Notes

* grafana fails if you:
    * sync-on-demand to Pulp2
    * migrate
    * sync in Pulp3
* fails because:
    * the repo has one RPM that lives under two names, grafana-2.6.0.x86_64.rpm and grafana-2.6.0-1.x86_64.rpm
    * migrate gets (only) grafana-2.6.0-1
    * sync finds grafana-2.6.0 but can't find a ContentArtifact to match
        * why? because that sha256 was already taken by -1

### https://pulp.plan.io/issues/7208

#### Problem

"Valid" (for sufficiently "are you f'in kidding me?!?" definitions of the word) centos repo not sync'able.
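The failing shape in this issue (one NEVRA published under several checksums at several locations in the same remote) can probably be spotted from repo metadata before a sync is attempted. Below is a minimal sketch. It assumes the `<package>` entries of `primary.xml` have already been parsed into plain records. `PackageRecord` and `find_conflicting_nevras()` are hypothetical names and are not pulpcore or pulp_rpm API. The epoch of `0` in the usage data is also assumed for illustration. The checksums and paths come from the opstools pair in the use-cases section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class PackageRecord(NamedTuple):
    """Hypothetical, simplified view of one <package> entry from primary.xml."""
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    sha256: str
    location_href: str

    @property
    def nevra(self) -> str:
        # name-epoch:version-release.arch
        return f"{self.name}-{self.epoch}:{self.version}-{self.release}.{self.arch}"


def find_conflicting_nevras(
    records: Iterable[PackageRecord],
) -> Dict[str, Dict[str, List[str]]]:
    """Return {nevra: {sha256: [location_href, ...]}} for every NEVRA seen with >1 checksum."""
    by_nevra: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_nevra[rec.nevra][rec.sha256].append(rec.location_href)
    # Same NEVRA with the same checksum at two hrefs is not flagged here; only
    # distinct checksums break the "one NEVRA, one checksum" assumption.
    return {
        nevra: {csum: sorted(hrefs) for csum, hrefs in by_csum.items()}
        for nevra, by_csum in by_nevra.items()
        if len(by_csum) > 1
    }


# Usage: the opstools pair (epoch "0" is an assumption for this example).
records = [
    PackageRecord("rubygem-multipart-post-doc", "0", "2.0.0", "2.el7", "noarch",
                  "6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9",
                  "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
    PackageRecord("rubygem-multipart-post-doc", "0", "2.0.0", "2.el7", "noarch",
                  "48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3",
                  "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
]
conflicts = find_conflicting_nevras(records)
print(sorted(conflicts))  # ['rubygem-multipart-post-doc-0:2.0.0-2.el7.noarch']
```

Grouping by NEVRA first and then by checksum keeps both remote locations attached to each checksum. That is the information you would need to decide which copy "wins" in a standard layout.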
* http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
    * c_ca.relative_path = rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
    * appears at multiple URLs in remote: same NEVRA, different paths, **DIFFERENT** checksums:

~~~
pulp=> select * from core_contentartifact where relative_path = 'rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm';
               pulp_id                |         pulp_created          |       pulp_last_updated       |                   relative_path                   | artifact_id |              content_id
--------------------------------------+-------------------------------+-------------------------------+---------------------------------------------------+-------------+--------------------------------------
 bd1fb28b-e7dd-469a-aa3d-a16da0c42766 | 2021-07-07 16:49:39.388602+00 | 2021-07-07 16:49:39.388607+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 445ebbae-660a-488a-beb5-56ba7149430e
 c1824574-55c9-41c7-8e3a-bcb1cb481c1c | 2021-07-07 16:49:42.125159+00 | 2021-07-07 16:49:42.125164+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 95be8fd1-1936-4278-8de2-5c728eed4a11
(2 rows)

pulp=> select url, sha256 from core_remoteartifact where content_artifact_id in ('bd1fb28b-e7dd-469a-aa3d-a16da0c42766', 'c1824574-55c9-41c7-8e3a-bcb1cb481c1c');
                                                      url                                                      |                              sha256
---------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
(2 rows)
~~~

* This isn't relative-path - this is "same nevra, diff csum, same repo"
* no-op pulpcore/app/files.validate_file_paths() and publish will then fail with:

~~~
duplicate key value violates unique constraint "core_publishedartifact_publication_id_relative__97f785f4_uniq"
DETAIL:  Key (publication_id, relative_path)=(d544dec0-4de6-47f7-aba1-6c2b62a1557a, Packages/p/python-msgpack-0.4.6-3.el7.x86_64.rpm) already exists.
~~~

* remote wants the file in two different places - Publish wants it in the 'standard' one only
* mirror (**without** autopublish) works, if the validate-check is disabled!

#### Plan

* re-evaluate the validate_file_paths() assumption
* make sure mirror and autopublish can't both be set true on the same repo
* update issue with "use mirror once PR XYZ is merged"

### https://pulp.plan.io/issues/7507

#### Problem

Syncing SUSE repos fails in pairs.

First pair:

* https://updates.suse.com/SUSE/Backports/SLE-12-SP5_x86_64/standard/
* https://updates.suse.com/SUSE/Backports/SLE-12-SP5_x86_64/product/

Second pair:

* https://updates.suse.com/SUSE/Backports/SLE-12-SP4_x86_64/product/
* https://updates.suse.com/SUSE/Backports/SLE-12-SP4_x86_64/standard/

See also https://pulp.plan.io/issues/6303

* https://dl.fedoraproject.org/pub/fedora/linux/updates/30/Modular/x86_64/
* https://dl.fedoraproject.org/pub/fedora/linux/updates/31/Modular/x86_64/

Not https://pulp.plan.io/issues/6303

* that one is specific to modularity, and is fixed

#### Plan

* SUSE access
* figure out **exactly** what the problem is
    * two rpms, same nevra, different hashes, different locations
    * no idea how this **ever** works/worked, in pulp3