# Pulp3 RPM Advanced-Copy depsolving issue(s)
## Related issues
* ~~[1965942](https://bugzilla.redhat.com/show_bug.cgi?id=1965942) - warnings RE missing dependencies~~
* ~~[9293](https://pulp.plan.io/issues/9293) (backport, CLOSED-CURRENT)~~
* [2003764](https://bugzilla.redhat.com/show_bug.cgi?id=2003764) - depsolving once-per-rpm is nonperformant
* [9387](https://pulp.plan.io/issues/9387), PR [2123](https://github.com/pulp/pulp_rpm/pull/2123)
* [9388](https://pulp.plan.io/issues/9388) Backport request
* **symptoms** of this problem - closed-DUP of '3764/9387/9388
* ~~[1934545](https://bugzilla.redhat.com/show_bug.cgi?id=1934545) - assert during depsolving/logging~~
* ~~[9336](https://pulp.plan.io/issues/9336), [9381](https://pulp.plan.io/issues/9381) Backport request~~
* ~~[1965936](https://bugzilla.redhat.com/show_bug.cgi?id=1965936) - memory use bz~~
* ~~[9335](https://pulp.plan.io/issues/9335)~~
* [1995232](https://bugzilla.redhat.com/show_bug.cgi?id=1995232) - postgres-cpu/explain bz
* [9331](https://pulp.plan.io/issues/9331), [9378](https://pulp.plan.io/issues/9378) Backport request
## Executive Summary
There are two underlying problems. One is the query performance raised in [1995232](https://bugzilla.redhat.com/show_bug.cgi?id=1995232). The other is that Pulp3 computes repoclosure for every single RPM added to a destination during a copy, tracked in [2003764](https://bugzilla.redhat.com/show_bug.cgi?id=2003764).
Fixing '3764 addresses the majority of the depsolving-problem.
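
To make the cost difference concrete, here is a toy sketch. It is not Pulp's or libsolv's code: the `REQUIRES` graph, the package names, and the function names are all invented for illustration. It compares computing a dependency closure once for the whole batch against recomputing it for every package added, which is roughly the pattern described in '3764.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it requires.
REQUIRES = {
    "app": ["libfoo", "libbar"],
    "tool": ["libfoo"],
    "libfoo": ["glibc"],
    "libbar": ["glibc"],
    "glibc": [],
}


def closure(seeds, requires, counter):
    """Return the seeds plus everything they transitively require."""
    counter["solves"] += 1  # count how many times we run a "solve"
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def copy_once(to_copy, requires):
    """One solve for the whole batch."""
    counter = {"solves": 0}
    return closure(to_copy, requires, counter), counter["solves"]


def copy_per_package(to_copy, requires):
    """One solve per package, each over the growing destination."""
    counter = {"solves": 0}
    dest = set()
    for pkg in to_copy:
        dest |= closure(dest | {pkg}, requires, counter)
    return dest, counter["solves"]
```

Both functions end up with the same destination content. The per-package variant runs one solve per package, and each solve walks a larger destination than the last, which matches the "each task takes progressively longer" observation.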
## Issue/BZ shenanigans
### Final state desired
* 6.10 GA
    * Fix [1965942](https://bugzilla.redhat.com/show_bug.cgi?id=1965942) - warnings RE missing dependencies **[DONE]**
    * Fix [2003764](https://bugzilla.redhat.com/show_bug.cgi?id=2003764) - depsolving once-per-rpm is nonperformant
    * Backport '3764 to pulp-rpm 3.14
    * Close various BZs/issues/backport-requests around the symptoms as DUPS
* Post-6.10
    * Address [1995232](https://bugzilla.redhat.com/show_bug.cgi?id=1995232) - postgres-cpu/explain bz
## Observations
### Pulp issues
* Logging: "can't install" warnings aren't relevant to the repo-depsolve-closure case. PR submitted to remediate.
* assert/OOM signal handling can't happen: the code that needs to set a signal handler isn't running in the Python main thread
* postgres query performance is...terrible.
* assert-cause still unclear
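
The main-thread restriction on signal handlers is a CPython rule and can be reproduced without Pulp at all. A minimal sketch (the function and variable names are made up for the demo): calling `signal.signal()` from any thread other than the main thread raises `ValueError`.

```python
import signal
import threading


def try_install_handler(results):
    """Try to register a SIGABRT handler from whatever thread runs this."""
    try:
        signal.signal(signal.SIGABRT, lambda signum, frame: None)
        results.append("installed")
    except ValueError as exc:
        # CPython only allows signal.signal() in the main thread.
        results.append(f"refused: {exc}")


results = []
worker = threading.Thread(target=try_install_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```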
### Katello issue
* Filter-API doesn't do what the user expects
* errata-exclude-before - pulls in ALL 32K RPMS
* results in 4 copy-tasks
* each task takes progressively longer, as dest fills
* Doing advanced-copy 'well' (or even 'acceptably') probably wants a new feature for next-year-katello, "advanced-copy", that sends what the user is actually asking for ("repo as of date-X plus security fixes not including any of the following RPMs")
## Investigation
* Satellite-CV-creation actually fails with a cancelled task, caused by this error message:
~~~
pulpcore-worker-4[1160]: python3: ../src/rules.c:261: solver_addrule: Assertion `!p2 && d > 0' failed.
pulpcore-worker-4[1160]: pulp [None]: pulpcore.tasking.pulpcore_worker:INFO: Cleaning up and canceling Task b4d6e7ec-bcff-4663-9ef5-599bd3bce24b
~~~
* This error is **FATAL** - but rather than failing the task with a traceback, it **CANCELS** the task with no explanation
* need to open an issue for this, it's...Rude
* Have not recreated the fatal error above yet
* see [1934545](https://bugzilla.redhat.com/show_bug.cgi?id=1934545)
* On standalone Pulp, can recreate the warnings by doing an advanced-copy, of RHEL7, of "all RPMs that are not associated with Advisories" (see recipes, below)
* biggest thing Not Found: libc.so.6 (!!), 705 (!!!) times
Investigating on Pulp3-upstream shows that when adding base RPMs to an advanced-copy (RHEL7-x86_64 and RHEL8-x86_64 tested), **all** of the RPMs that couldn't find their files were i686 (i.e. 32-bit) RPMs. Multi-arch strikes again?
RE assert and logging - we're going to need a SIGABRT handler instantiated prior to solver.solve(). The handler can log the problem more thoroughly; not sure how much access we have "inside" such a thing? Investigation needed.
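
One possible starting point for that investigation, untested against pulp_rpm: Python's stdlib `faulthandler` module installs its own low-level handlers and dumps the Python traceback when the process receives SIGABRT. The sketch below runs a throwaway child process in which `os.abort()` stands in for the failing libsolv assertion; `CHILD`, `solve()`, and `run_child()` are names invented for the demo. Whether this can be enabled early enough inside a pulpcore worker still needs checking.

```python
import subprocess
import sys

# Child program: enable faulthandler, then abort the way a failed C assert would.
CHILD = """
import faulthandler, os
faulthandler.enable()   # dump Python tracebacks on fatal signals such as SIGABRT

def solve():
    os.abort()          # stand-in for the assertion failing inside libsolv

solve()
"""


def run_child():
    """Run the child in a separate interpreter and capture what it logs."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    returncode, stderr = run_child()
    print(returncode)
    print(stderr)
```

The child still dies, but its stderr now shows which Python frame was active at the time, which is more than the bare "Cleaning up and canceling Task" line gives us.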
* SQL performance during advanced copy is terrible. Here is an example of a single running postgres task - note the current elapsed time:
~~~
5032 | 01:28:04.176268 | pulp | SELECT "core_content"."pulp_id", "core_content"."pulp_created", "core_content"."pulp_last_updated", "core_content
"."pulp_type", "core_content"."upstream_id", "core_content"."timestamp_of_interest" FROM "core_content" LEFT OUTER JOIN "core_repositorycontent" ON ("c
ore_content"."pulp_id" = "core_repositorycontent"."content_id") WHERE ("core_repositorycontent"."pulp_id" IN (SELECT U0."pulp_id" FROM "core_repository
content" U0 INNER JOIN "core_repositoryversion" U2 ON (U0."version_added_id" = U2."pulp_id") LEFT OUTER JOIN "core_repositoryversion" U3 ON (U0."versio
n_removed_id" = U3."pulp_id") WHERE (U0."repository_id" = '17e556a3-9e77-4ada-9b61-f5fa0f31b112'::uuid AND U2."number" <= 1 AND NOT (U3."number" <= 1 A
ND U3."number" IS NOT NULL))) AND "core_content"."pulp_id" IN ('577501a8-5c99-4980-9295-eb8ce1d8bbec'::uuid, '4a9a4042-484d-4f8b-a19e-612c848dab99'::uu
id, 'cfbefb76-5587-4404-99ba-2024e96f67b3'::uuid, '13a83bb6-8c05-4624-95ed-7b8f79cd8ee8'::uuid, 'ea76d66d-948a-4080-94e7-5b4126561bda'::uuid, 'be0f23dc
-872
~~~
## Useful Sat6 recipes
### basic functions
* Canceling a task:
~~~
curl -X PATCH -d state=canceled \
--cert /etc/pki/katello/certs/pulp-client.crt \
--key /etc/pki/katello/private/pulp-client.key \
https://$(hostname -f)/pulp/api/v3/tasks/UUID/
~~~
* Accessing postgres:
~~~
[root@sat-r220-09 ~]# sudo su - postgres
-bash-4.2$ psql pulpcore
~~~
### Preparing an advanced-copy call "by hand"
* Find all advisories with an updated-date prior to a specified date
~~~
http :/pulp/api/v3/content/rpm/advisories/?repository_version=/pulp/api/v3/repositories/rpm/rpm/a534c985-4add-46f3-8ef1-49de3bbcff8b/versions/1/\&fields='pulp_href,updated_date'\&limit=5000 \
| jq '.results[] | select(.updated_date < "2016-03-17")' \
| jq .pulp_href > errata_config
~~~
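
The same date filter can be done in Python if jq isn't handy. This is a sketch, assuming the advisories response has already been saved and parsed as JSON; `advisories_before` is a made-up helper name and the hrefs in the sample are placeholders, not real content. Like the jq version, it compares the date strings lexically.

```python
import json


def advisories_before(page, cutoff):
    """Return pulp_href for each advisory whose updated_date sorts before cutoff."""
    return [
        advisory["pulp_href"]
        for advisory in page["results"]
        if advisory["updated_date"] < cutoff
    ]


# Hypothetical response, shaped like the fields requested above.
sample_page = {
    "results": [
        {"pulp_href": "/pulp/api/v3/content/rpm/advisories/AAA/", "updated_date": "2015-11-19 00:00:00"},
        {"pulp_href": "/pulp/api/v3/content/rpm/advisories/BBB/", "updated_date": "2017-08-01 00:00:00"},
    ]
}

hrefs = advisories_before(sample_page, "2016-03-17")
# One JSON string per line, the same shape jq writes to errata_config.
errata_config = "\n".join(json.dumps(href) for href in hrefs)
print(errata_config)
```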
* Count packages that aren't part of an UpdateRecord (**note: only works when a single repo is sync'd.** Needs work to restrict it to a specific repository)
~~~
-bash-4.2$ psql pulpcore
pulpcore=# select count(p.name) from rpm_package p where not exists (select 1 from rpm_updatecollectionpackage ucp where ucp.name = p.name and ucp.epoch = p.epoch and ucp.version = p.version and ucp.release = p.release and ucp.arch = p.arch);
~~~
* Find content-ptr-id for all packages not in UpdateRecords:
~~~
psql -U pulp -d pulp --host 127.0.0.1 \
-c "select content_ptr_id from rpm_package p where not exists (select 1 from rpm_updatecollectionpackage ucp where ucp.name = p.name and ucp.epoch = p.epoch and ucp.version = p.version and ucp.release = p.release and ucp.arch = p.arch)" \
> base_packages
~~~
* Prepare base_packages to hand off to /copy/:
* Delete header/footer
* Replace header with:
~~~
[
{
"source_repo_version": "/pulp/api/v3/repositories/rpm/rpm/e9e67b6c-5a50-4a48-907b-5d527169c633/versions/1/",
"dest_repo": "/pulp/api/v3/repositories/rpm/rpm/d0422d2b-e232-4f5c-9ec2-fdf2d737c5bf/",
"content": [
~~~
* Replace footer with:
~~~
]
}
]
~~~
* Replace each UUID with `"/pulp/api/v3/content/rpm/packages/UUID/",` (no trailing comma after the last one)
* **EXAMPLE**:
~~~
[
{
"source_repo_version": "/pulp/api/v3/repositories/rpm/rpm/e9e67b6c-5a50-4a48-907b-5d527169c633/versions/1/",
"dest_repo": "/pulp/api/v3/repositories/rpm/rpm/d0422d2b-e232-4f5c-9ec2-fdf2d737c5bf/",
"content": [
"/pulp/api/v3/content/rpm/packages/ceb32b0a-11cf-41fc-ac06-96ed4b8a9cec/",
"/pulp/api/v3/content/rpm/packages/6b78b4fd-5cb3-48cf-a3d1-a448a529fed4/",
....
"/pulp/api/v3/content/rpm/packages/e18707da-dae6-4309-b523-be0393f72a65/"
]
}
]
~~~
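
The hand-editing steps above can also be scripted. A minimal sketch, assuming `base_packages` holds psql's default table output (a column header, a dashed separator, one UUID per line, and a row-count footer); `build_copy_config` is a made-up helper, not part of Pulp.

```python
import json
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def build_copy_config(psql_output, source_repo_version, dest_repo):
    """Turn raw psql output into the list structure shown in the example."""
    uuids = []
    for line in psql_output.splitlines():
        candidate = line.strip()
        # Header, separator, and "(N rows)" footer lines are not UUIDs; skip them.
        if UUID_RE.fullmatch(candidate):
            uuids.append(candidate)
    return [
        {
            "source_repo_version": source_repo_version,
            "dest_repo": dest_repo,
            "content": [f"/pulp/api/v3/content/rpm/packages/{u}/" for u in uuids],
        }
    ]


sample_output = """\
            content_ptr_id
--------------------------------------
 ceb32b0a-11cf-41fc-ac06-96ed4b8a9cec
 6b78b4fd-5cb3-48cf-a3d1-a448a529fed4
(2 rows)
"""

config = build_copy_config(
    sample_output,
    "/pulp/api/v3/repositories/rpm/rpm/e9e67b6c-5a50-4a48-907b-5d527169c633/versions/1/",
    "/pulp/api/v3/repositories/rpm/rpm/d0422d2b-e232-4f5c-9ec2-fdf2d737c5bf/",
)
print(json.dumps(config, indent=2))
```

Letting `json.dumps` write the file sidesteps the trailing-comma problem on the last entry.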
* Issue the /copy/ command:
~~~
http POST :/pulp/api/v3/rpm/copy/ \
dependency_solving=True \
config:=@./base_packages
~~~
### Pulling useful info from journalctl
* Count the missing dependencies named in depsolve warnings:
~~~
journalctl | \
grep "WARNING: Encountered problems solving dependencies, copy may be incomplete: package" | \
awk -F"copy may be incomplete:" '{print $2}' | \
awk -F" " '{print $4}' | \
sort | uniq -c | sort -n
~~~
* Count the packages that are looking for a missing dependency:
~~~
journalctl | \
grep "WARNING: Encountered problems solving dependencies, copy may be incomplete: package" | \
awk -F"copy may be incomplete:" '{print $2}' | \
awk -F" " '{print $2}' | \
sort | uniq -c | sort -n
~~~