rel_path notes

Summary of The Problem

  • Pulp makes assumptions about RPMs-in-repos that are not, alas, true In the Wild
    • a given checksum can only belong to one filename
    • a given NEVRA can only be in one location
    • a given NEVRA will always have the same checksum
    • a repo's filestructure can always be reorganized
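
The first three assumptions can be checked mechanically against a list of (NEVRA, checksum, location) records. A minimal sketch, assuming a hypothetical `Pkg` record and `find_violations` helper (neither is Pulp code). The rubygem entries use the hashes recorded in these notes; the grafana NEVRA and checksum are placeholders, since only the filenames are recorded.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkg(NamedTuple):
    """Hypothetical record for one RPM entry advertised by a repo."""
    nevra: str
    sha256: str
    location: str  # path relative to the repo root


def find_violations(pkgs):
    """Report every checksum or NEVRA that breaks one of the first three assumptions."""
    names_by_sum = defaultdict(set)   # checksum -> filenames
    locs_by_nevra = defaultdict(set)  # NEVRA -> locations
    sums_by_nevra = defaultdict(set)  # NEVRA -> checksums
    for p in pkgs:
        names_by_sum[p.sha256].add(p.location.rsplit("/", 1)[-1])
        locs_by_nevra[p.nevra].add(p.location)
        sums_by_nevra[p.nevra].add(p.sha256)

    def many(groups):
        # Keep only the groups with more than one distinct value.
        return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}

    return {
        "checksum_many_filenames": many(names_by_sum),
        "nevra_many_locations": many(locs_by_nevra),
        "nevra_many_checksums": many(sums_by_nevra),
    }


SAMPLE = [
    # Placeholder NEVRA and checksum: the notes record only the two filenames.
    Pkg("grafana-2.6.0-1.x86_64", "placeholder-sha256", "grafana-2.6.0-1.x86_64.rpm"),
    Pkg("grafana-2.6.0-1.x86_64", "placeholder-sha256", "grafana-2.6.0.x86_64.rpm"),
    Pkg("rubygem-multipart-post-doc-2.0.0-2.el7.noarch",
        "6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9",
        "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
    Pkg("rubygem-multipart-post-doc-2.0.0-2.el7.noarch",
        "48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3",
        "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
]

for kind, hits in find_violations(SAMPLE).items():
    print(kind, hits)
```

The fourth assumption (reorganizing the file structure) is not a property of individual packages, so this check does not cover it.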

Code-only workarounds?

Assume we do not want to require a schema-change (at least not in pulpcore) to handle these

Discovered data use-cases

same hash, different filenames

  • https://packages.grafana.com/oss/rpm
    • grafana-2.6.0-1.x86_64.rpm
    • grafana-2.6.0.x86_64.rpm
  • http://repo.mysql.com/yum/mysql-tools-community/el/7/x86_64/
    • mysql-workbench-community-debuginfo-8.0.18-1.el7.x86_64.rpm
    • mysql-workbench-community-8.0.18-1.el7.x86_64.rpm
  • breaks sync only after 2to3 migration
    • find the "we've already seen this hash" assumption in 2to3 and fix it?
    • fun: Pulp2 does NOT sync the 'duplicated' RPM; the grafana repo has 372 rpms, Pulp2 syncs 371
    • 2to3 syncs what it has, correctly

same NEVRA, different hashes, different locations

  • http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
    • fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
      • hash 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
    • logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
      • hash 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
  • https://updates.suse.com/SUSE/Backports/SLE-12-SP2_x86_64/standard/
    • rpm/x86_64_SP0/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
      • hash b0fb8ca3878e9c70fe82b91ccf81c74439689a7afd0764449841ae01192f0c18
    • rpm/x86_64_SP1/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
      • hash 9722732577ab816e60488f3be7ba0930195da91370c156e302042f8108ae7490
  • prevents sync, any combo of immediate/on_demand/mirror
  • pulpcore/app/files.validate_file_paths() - fails because ContentArtifact.rel_path is the bare filename, and we see that filename twice (once per hash)
    • can we notice we're mirrored and not-call this maybe?
      • not easily - mirror is a sync-cmd-attribute, by the time we get here we don't know how we were called
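
A minimal sketch of why the check trips, assuming a simplified duplicate-path check. `check_unique_paths` is a hypothetical stand-in written for these notes, not the pulpcore implementation of validate_file_paths():

```python
from collections import Counter
from posixpath import basename


def check_unique_paths(paths):
    """Hypothetical stand-in: raise ValueError if a relative path repeats."""
    dupes = sorted(p for p, n in Counter(paths).items() if n > 1)
    if dupes:
        raise ValueError(f"Duplicate relative paths: {dupes}")


remote_locations = [
    "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
    "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
]

# Keyed on the remote location, the two entries are distinct and the check passes.
check_unique_paths(remote_locations)

# With rel_path reduced to the bare filename, the two entries collide.
rel_paths = [basename(loc) for loc in remote_locations]
try:
    check_unique_paths(rel_paths)
    collided = False
except ValueError as exc:
    collided = True
    print(exc)
```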

Problem remotes

https://pulp.plan.io/issues/8133

Problem

"I can't sync postgres remotes"

Can't reproduce - OP can't either.

Notes from iballou: appears to only happen post-2to3-migration. Still can't reproduce that way, either.

Plan

  • script a test that syncs the (many) postgres variants into a single Pulp3
  • report results back to issue, foreman

Notes

  • grafana failure is:
    • sync-on-demand to Pulp2
    • migrate
    • sync in Pulp3
    • fails because:
      • the repo has one RPM that lives as two names, grafana-2.6.0.x86_64.rpm and grafana-2.6.0-1.x86_64.rpm
      • migrate gets (only) grafana-2.6.0-1
      • sync finds grafana-2.6.0 but can't find a contentartifact to match
        • why? because that sha256 was taken by -1
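
The sequence above can be sketched as a toy model. This assumes content is looked up by sha256 and each content unit carries a set of known relative paths; the dict layout, the placeholder checksum, and `unmatched_entries` are illustrative assumptions, not Pulp's data model or sync code.

```python
# State after migration: one content unit for the hash, known only as "-1".
# The checksum is a placeholder; the notes do not record the real value.
migrated_content = {"placeholder-sha256": {"grafana-2.6.0-1.x86_64.rpm"}}

# What the remote advertises: the same bytes under two filenames.
remote_entries = [
    ("grafana-2.6.0-1.x86_64.rpm", "placeholder-sha256"),
    ("grafana-2.6.0.x86_64.rpm", "placeholder-sha256"),
]


def unmatched_entries(content, entries):
    """Return remote filenames whose hash is already known under another path."""
    missing = []
    for filename, sha256 in entries:
        known_paths = content.get(sha256)
        if known_paths is None:
            # Genuinely new content: it would simply be created.
            content[sha256] = {filename}
        elif filename not in known_paths:
            # Hash already "taken" by another filename: nothing to match against.
            missing.append(filename)
    return missing


print(unmatched_entries(dict(migrated_content), remote_entries))
```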

https://pulp.plan.io/issues/7208

Problem

"Valid" (for sufficiently "are you f'in kidding me?!?" definitions of the word) centos repo not sync'able.

pulp=> select * from core_contentartifact where relative_path = 'rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm';
               pulp_id                |         pulp_created          |       pulp_last_updated       |                   relative_path                   | artifact_id |              content_id
--------------------------------------+-------------------------------+-------------------------------+---------------------------------------------------+-------------+--------------------------------------
 bd1fb28b-e7dd-469a-aa3d-a16da0c42766 | 2021-07-07 16:49:39.388602+00 | 2021-07-07 16:49:39.388607+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 445ebbae-660a-488a-beb5-56ba7149430e
 c1824574-55c9-41c7-8e3a-bcb1cb481c1c | 2021-07-07 16:49:42.125159+00 | 2021-07-07 16:49:42.125164+00 | rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm |             | 95be8fd1-1936-4278-8de2-5c728eed4a11
(2 rows)
pulp=> select url, sha256 from core_remoteartifact where content_artifact_id in ('bd1fb28b-e7dd-469a-aa3d-a16da0c42766', 'c1824574-55c9-41c7-8e3a-bcb1cb481c1c');
                                                      url                                                      |                              sha256
---------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
 http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm | 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
(2 rows)
  • This isn't relative-path - this is "same nevra, diff csum, same repo"
  • no-op pulpcore/app/files.validate_file_paths() and publish will then fail with:
duplicate key value violates unique constraint "core_publishedartifact_publication_id_relative__97f785f4_uniq"
DETAIL:  Key (publication_id, relative_path)=(d544dec0-4de6-47f7-aba1-6c2b62a1557a, Packages/p/python-msgpack-0.4.6-3.el7.x86_64.rpm) already exists.
  • remote wants the file in two different places - Publish wants it in the 'standard' one only
  • mirror (without autopublish) works, if the validate-check is disabled!
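
The publish failure is an ordinary unique-constraint collision on (publication_id, relative_path). A minimal sketch that reproduces the shape of it, using an in-memory SQLite table as a stand-in for Postgres; the table layout, the `publish` helper, and the "ca-1"/"ca-2" ids are assumptions made for illustration, modelled only on the constraint named in the error above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE published_artifact ("
    " publication_id TEXT NOT NULL,"
    " relative_path TEXT NOT NULL,"
    " content_artifact_id TEXT NOT NULL,"
    " UNIQUE (publication_id, relative_path))"
)


def publish(publication_id, relative_path, content_artifact_id):
    """Insert one row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO published_artifact VALUES (?, ?, ?)",
                (publication_id, relative_path, content_artifact_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


PUB = "d544dec0-4de6-47f7-aba1-6c2b62a1557a"
PATH = "Packages/p/python-msgpack-0.4.6-3.el7.x86_64.rpm"

# Two content artifacts (placeholder ids) that both map to the 'standard' path.
first = publish(PUB, PATH, "ca-1")
second = publish(PUB, PATH, "ca-2")
print(first, second)
```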

Plan

  • re-evaluate the validate_file_paths() assumption
  • make sure mirror and autopublish can't both be set true on the same repo
  • update issue with "use mirror once PR XYZ is merged"
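
The second plan item amounts to a mutual-exclusion check. A minimal sketch, assuming both settings are visible in one place; `validate_sync_options` and its parameters are hypothetical names, not an existing Pulp API:

```python
def validate_sync_options(mirror: bool, autopublish: bool) -> None:
    """Hypothetical guard: reject the one combination the plan wants to forbid."""
    if mirror and autopublish:
        raise ValueError(
            "mirror and autopublish cannot both be enabled on the same repository"
        )


# Each setting on its own is accepted.
validate_sync_options(mirror=True, autopublish=False)
validate_sync_options(mirror=False, autopublish=True)
```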

https://pulp.plan.io/issues/7507

Problem

Syncing SUSE repos fails in pairs.

First pair:

Second pair:

See also https://pulp.plan.io/issues/6303

Not https://pulp.plan.io/issues/6303

  • that one is specific to modularity, and is fixed

Plan

  • SUSE access
  • figure out exactly what the problem is
    • two rpms, same nevra, different hashes, different locations
    • no idea how this ever works/worked, in pulp3
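
One way to pin down the problem without syncing is to read the repo's primary.xml directly and group packages by NEVRA. A minimal sketch, assuming the metadata has already been downloaded and decompressed into a string; `nevra_conflicts` is a hypothetical helper, and the sample is a trimmed fragment built from the openjpeg2-devel entries recorded earlier in these notes (the epoch of 0 is an assumption):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Default namespace used by rpm-md primary.xml.
NS = {"c": "http://linux.duke.edu/metadata/common"}


def nevra_conflicts(primary_xml):
    """Map each NEVRA that has more than one checksum to its (checksum, href) pairs."""
    root = ET.fromstring(primary_xml)
    seen = defaultdict(set)
    for pkg in root.findall("c:package", NS):
        ver = pkg.find("c:version", NS)
        nevra = "{}-{}:{}-{}.{}".format(
            pkg.findtext("c:name", namespaces=NS),
            ver.get("epoch"), ver.get("ver"), ver.get("rel"),
            pkg.findtext("c:arch", namespaces=NS),
        )
        checksum = pkg.findtext("c:checksum", namespaces=NS)
        href = pkg.find("c:location", NS).get("href")
        seen[nevra].add((checksum, href))
    return {
        nevra: sorted(pairs)
        for nevra, pairs in seen.items()
        if len({checksum for checksum, _ in pairs}) > 1
    }


SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
<package type="rpm">
  <name>openjpeg2-devel</name><arch>x86_64</arch>
  <version epoch="0" ver="2.1.0" rel="2.1"/>
  <checksum type="sha256" pkgid="YES">b0fb8ca3878e9c70fe82b91ccf81c74439689a7afd0764449841ae01192f0c18</checksum>
  <location href="rpm/x86_64_SP0/openjpeg2-devel-2.1.0-2.1.x86_64.rpm"/>
</package>
<package type="rpm">
  <name>openjpeg2-devel</name><arch>x86_64</arch>
  <version epoch="0" ver="2.1.0" rel="2.1"/>
  <checksum type="sha256" pkgid="YES">9722732577ab816e60488f3be7ba0930195da91370c156e302042f8108ae7490</checksum>
  <location href="rpm/x86_64_SP1/openjpeg2-devel-2.1.0-2.1.x86_64.rpm"/>
</package>
</metadata>"""

conflicts = nevra_conflicts(SAMPLE)
for nevra, pairs in conflicts.items():
    print(nevra, pairs)
```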