rel_path notes
Summary of the Problem
- Pulp makes assumptions about RPMs-in-repos that are not, alas, true in the wild:
- a given checksum can only belong to one filename
- a given NEVRA can only be in one location
- a given NEVRA will always have the same checksum
- a repo's filestructure can always be reorganized
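
A minimal sketch of how the first three assumptions could be checked against one repo's package list. This is not Pulp code: `find_assumption_violations`, the tuple layout, and the NEVRA string (built from the filename, epoch omitted) are assumptions made for illustration. The sample data is the opstools pair recorded in these notes. The fourth assumption (filestructure can be reorganized) is not covered here.

```python
from collections import defaultdict
import posixpath


def find_assumption_violations(packages):
    """packages: iterable of (nevra, sha256, location) tuples for one repo."""
    names_by_checksum = defaultdict(set)
    locations_by_nevra = defaultdict(set)
    checksums_by_nevra = defaultdict(set)
    for nevra, sha256, location in packages:
        names_by_checksum[sha256].add(posixpath.basename(location))
        locations_by_nevra[nevra].add(location)
        checksums_by_nevra[nevra].add(sha256)

    def only_multi(groups):
        # Keep only the keys that map to more than one distinct value.
        return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}

    return {
        "checksum_with_many_filenames": only_multi(names_by_checksum),
        "nevra_with_many_locations": only_multi(locations_by_nevra),
        "nevra_with_many_checksums": only_multi(checksums_by_nevra),
    }


# NEVRA written as name-version-release.arch; the epoch is left out (assumption).
NEVRA = "rubygem-multipart-post-doc-2.0.0-2.el7.noarch"
opstools = [
    (NEVRA, "6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9",
     "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
    (NEVRA, "48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3",
     "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm"),
]
violations = find_assumption_violations(opstools)
```

For this pair, the NEVRA shows up under both the "many locations" and "many checksums" keys, while no checksum maps to more than one filename.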
Code-only workarounds?
Assume we do not want to require a schema-change (at least not in pulpcore) to handle these
Discovered data use-cases
same hash, different filenames
- https://packages.grafana.com/oss/rpm
- grafana-2.6.0-1.x86_64.rpm
- grafana-2.6.0.x86_64.rpm
- http://repo.mysql.com/yum/mysql-tools-community/el/7/x86_64/
- mysql-workbench-community-debuginfo-8.0.18-1.el7.x86_64.rpm
- mysql-workbench-community-8.0.18-1.el7.x86_64.rpm
- breaks sync only after 2to3 migration
- find the "we've already seen this hash" assumption in 2to3 and fix it?
- fun: Pulp2 does NOT sync the 'duplicated' RPM; the grafana repo has 372 RPMs, Pulp2 syncs 371
- 2to3 syncs what it has, correctly
same NEVRA, different hashes, different locations
- http://ftp.cs.stanford.edu/centos/7/opstools/x86_64/
- fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
- hash 6a8148297b09c9bb7fa433e1559e20760b21c6d9cf10eb8569e053704b20d8c9
- logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm
- hash 48e054113e7bb6b4b52d9c34726f5a58ccdc8045bfe5e15f687306ef935f49d3
- https://updates.suse.com/SUSE/Backports/SLE-12-SP2_x86_64/standard/
- rpm/x86_64_SP0/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
- hash b0fb8ca3878e9c70fe82b91ccf81c74439689a7afd0764449841ae01192f0c18
- rpm/x86_64_SP1/openjpeg2-devel-2.1.0-2.1.x86_64.rpm
- hash 9722732577ab816e60488f3be7ba0930195da91370c156e302042f8108ae7490
- prevents sync, for any combo of immediate/on_demand/mirror
- pulpcore/app/files.validate_file_paths() fails because ContentArtifact.rel_path is just the filename, and we see that filename twice due to the different hashes
- can we detect that we're mirroring and skip this call, maybe?
- not easily: mirror is an attribute of the sync command, and by the time we get here we don't know how we were called
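
A simplified stand-in for the kind of duplicate-path check described above, to show why the filename collides when the full location does not. This is not the pulpcore implementation of validate_file_paths(); `duplicated_paths` is a hypothetical helper, and comparing by full location is shown only for contrast, not as a proposed fix.

```python
from collections import Counter
import posixpath


def duplicated_paths(rel_paths):
    """Return the relative paths that occur more than once, sorted."""
    counts = Counter(rel_paths)
    return sorted(path for path, n in counts.items() if n > 1)


# Locations of the two opstools packages listed in these notes.
locations = [
    "fluentd/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
    "logging/rubygem-multipart-post-doc-2.0.0-2.el7.noarch.rpm",
]

# rel_path == filename: both packages collapse onto the same path.
by_filename = duplicated_paths(posixpath.basename(loc) for loc in locations)

# rel_path == full location within the repo: the two paths stay distinct.
by_location = duplicated_paths(locations)
```

Here `by_filename` contains the one shared filename, and `by_location` is empty.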
Problem remotes
Problem
"I can't sync postgres remotes"
Can't reproduce - OP can't either.
Notes from iballou: appears to only happen post-2to3-migration. Still can't reproduce that way, either.
Plan
- script to build test for the (many) postgres variants into one Pulp3
- report results back to issue, foreman
Notes
- grafana failure sequence:
- sync-on-demand to Pulp2
- migrate
- sync in Pulp3
- fails because:
- the repo has one RPM that lives under two names, grafana-2.6.0.x86_64.rpm and grafana-2.6.0-1.x86_64.rpm
- migrate gets (only) grafana-2.6.0-1
- sync finds grafana-2.6.0 but can't find a ContentArtifact to match
- why? because that sha256 was already taken by grafana-2.6.0-1
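
A toy model of the failure sequence above, assuming a store that remembers only one filename per sha256. `ToyStore` and the placeholder digest are hypothetical (the real digest is not recorded in these notes), and this is not Pulp's data model; it only illustrates how "first name wins" makes the later lookup miss.

```python
# Placeholder: stands in for the sha256 shared by both grafana filenames.
PLACEHOLDER_SHA256 = "sha256-shared-by-both-grafana-filenames"


class ToyStore:
    """Remembers a single filename per checksum; the first one registered wins."""

    def __init__(self):
        self._filename_by_sha256 = {}

    def migrate(self, sha256, filename):
        # setdefault keeps the existing entry if the checksum is already known.
        self._filename_by_sha256.setdefault(sha256, filename)

    def find(self, sha256, filename):
        # A match needs both the checksum and the filename to agree.
        return self._filename_by_sha256.get(sha256) == filename


store = ToyStore()
# Migration brings over only the -1 name.
store.migrate(PLACEHOLDER_SHA256, "grafana-2.6.0-1.x86_64.rpm")

# Sync then sees both names in the remote metadata.
found_dash_one = store.find(PLACEHOLDER_SHA256, "grafana-2.6.0-1.x86_64.rpm")
found_plain = store.find(PLACEHOLDER_SHA256, "grafana-2.6.0.x86_64.rpm")
```

In this model the -1 name is found and the plain name is not, which mirrors the sync failure described in the notes.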
Problem
"Valid" (for sufficiently "are you f'in kidding me?!?" definitions of the word) centos repo not sync'able.
- This isn't relative-path - this is "same nevra, diff csum, same repo"
- no-op pulpcore/app/files.validate_file_paths() and publish will then fail with:
- remote wants the file in two different places - Publish wants it in the 'standard' one only
- mirror (without autopublish) works, if the validate-check is disabled!
Plan
- re-evaluate the validate_file_paths() assumption
- make sure mirror and autopublish can't both be set true on the same repo
- update issue with "use mirror once PR XYZ is merged"
Problem
Syncing SUSE repos fails in pairs.
First pair:
Second pair:
See also https://pulp.plan.io/issues/6303
Not https://pulp.plan.io/issues/6303
- that one is specific to modularity, and is fixed
Plan
- SUSE access
- figure out exactly what the problem is
- two rpms, same nevra, different hashes, different locations
- no idea how this ever works/worked in Pulp3