# Ansible Import/Export debugging

## Failure stack-trace

```
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:INFO: ...Importing resource ContentArtifactResource.
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:INFO: ...1233 import-errors encountered importing ./tmpj9ddnvn6/repository-rh-certified_1/pulpcore.app.modelresource.ContentArtifactResource.json, retrying
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:ERROR: FATAL import-failure importing ./tmpj9ddnvn6/repository-rh-certified_1/pulpcore.app.modelresource.ContentArtifactResource.json
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.tasking.pulpcore_worker:INFO: Task 8671f977-bf46-446a-a9fa-ddb0f8876c05 failed (Content matching query does not exist.)
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.tasking.pulpcore_worker:INFO:   File "/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py", line 410, in _perform_task
    result = func(*args, **kwargs)
  File "/home/vagrant/devel/pulpcore/pulpcore/app/tasks/importer.py", line 199, in import_repository_version
    _import_file(ca_path, ContentArtifactResource, retry=True)
  File "/home/vagrant/devel/pulpcore/pulpcore/app/tasks/importer.py", line 81, in _import_file
    a_result = resource.import_data(data, raise_errors=True)
  File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 777, in import_data
    return self.import_data_inner(
  File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 829, in import_data_inner
    raise row_result.errors[-1].error
  File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 666, in import_row
    self.before_import_row(row, **kwargs)
  File "/home/vagrant/devel/pulpcore/pulpcore/app/modelresource.py", line 98, in before_import_row
    linked_content = Content.objects.get(upstream_id=row["content"])
  File "/usr/local/lib/pulp/lib64/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/pulp/lib64/python3.8/site-packages/django/db/models/query.py", line 435, in get
    raise self.model.DoesNotExist(
```

## Problem Statement

1233 ContentArtifact entries cannot find their matching (previously-imported) Content entities.

**1233** is important - we shall see it again shortly.

## Investigation notes

* Number of exported entities:

  ```
  (pulp) []$ for f in *.json; do echo $f; cat $f | jq length ; done
  pulp_ansible.app.modelresource.CollectionDeprecationResource.json
  0
  pulp_ansible.app.modelresource.CollectionResource.json
  4913
  pulp_ansible.app.modelresource.CollectionVersionContentResource.json
  5266
  pulp_ansible.app.modelresource.RoleContentResource.json
  0
  pulp_ansible.app.modelresource.TagResource.json
  7
  pulpcore.app.modelresource.ContentArtifactResource.json
  5266
  ```

  The ContentArtifact and CollectionVersion counts match. This is good: CollectionVersion is the only Content subclass involved here, so we expect a CA link for every piece of Content.
* Find all the CollectionVersion content-ids from upstream (saved as 'upstream_id' at export time) (NOTE: for all the gory technical details about what the hell 'upstream-id' is, see [Handling entities without a ‘natural’ key](https://hackmd.io/@ggainey/importexport_lowlevel_naturalkeys))

  ```
  (pulp) []$ cat pulp_ansible.app.modelresource.CollectionVersionContentResource.json | jq '.[] | .upstream_id' | sort > cv_content_ids
  (pulp) []$ wc cv_content_ids
  5266 5266 205374 cv_content_ids
  (pulp) []$
  ```
* Notice (after some digging) that there are more entries in CollectionVersionContentResource than there are **unique** upstream-ids in the export. Uh...whut?
* How many duplicate upstream-ids are there in the export?!?

  ```
  (pulp) []$ uniq -d cv_content_ids | wc
  1233 1233 48087
  ```
* Oh dear. This is exactly the number of errors we get at import-time.
* Do we have any duplicated content-ids in the ContentArtifact export?

  ```
  (pulp) []$ cat pulpcore.app.modelresource.ContentArtifactResource.json | jq '.[] | .content' | sort > content_ids
  (pulp) []$ uniq -d content_ids | wc
  0 0 0
  ```
* Is it one record duplicated 1233 times? 1233 records each duplicated once? Or some combination?
    * "1233 entities duplicated once each"

  ```
  (pulp) []$ uniq -c cv_content_ids | sort | awk -F " " '{print $1}' | uniq -c
  2800 1
  1233 2
  (pulp)
  ```

## "Conclusion"

The export-file is in a never-happen state - while there are "enough" Content units to satisfy the ContentArtifact entries, there aren't enough **unique** content units, and therefore we have too little content to import. Somehow, at export-time, some CollectionVersion entities are being duplicated and used in place of the missing pulp-ids.

This feels like, at export-time, when we ask an Ansible repo "give me all the CollectionVersion content in this repo-version", we're getting a list that includes dups and misses the pulp-id of the duplicated entries. I don't see how that's possible, given that the query that's invoked is:

```
return CollectionVersion.objects.filter(pk__in=self.repo_version.content)
```

which looks perfectly reasonable.
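
To get from "how many" to "which ones", the jq/uniq steps above can be folded into one cross-check. What follows is only a sketch: it assumes each export file is a JSON list of row-objects carrying the `upstream_id` and `content` fields used in the jq commands, and the helper names (`load_rows`, `cross_check`, `report`) are made up for illustration; they are not part of pulpcore or pulp_ansible.

```python
import json
from collections import Counter


def load_rows(path):
    """Load one exported modelresource file (assumed: a JSON list of dicts)."""
    with open(path) as fh:
        return json.load(fh)


def cross_check(cv_rows, ca_rows):
    """Compare exported CollectionVersion rows against ContentArtifact rows.

    Returns (duplicated, missing):
      duplicated: {upstream_id: count} for ids exported more than once
      missing:    sorted ContentArtifact 'content' ids that no exported
                  CollectionVersion row carries as its upstream_id
    """
    counts = Counter(row["upstream_id"] for row in cv_rows)
    duplicated = {uid: n for uid, n in counts.items() if n > 1}
    missing = sorted({row["content"] for row in ca_rows} - set(counts))
    return duplicated, missing


def report(cv_path, ca_path):
    """Hypothetical helper: print a summary for the two export files."""
    duplicated, missing = cross_check(load_rows(cv_path), load_rows(ca_path))
    print(f"{len(duplicated)} upstream_ids exported more than once")
    print(f"{len(missing)} ContentArtifact content-ids with no exported Content")
    return duplicated, missing
```

Run from the export directory, this would be called as `report("pulp_ansible.app.modelresource.CollectionVersionContentResource.json", "pulpcore.app.modelresource.ContentArtifactResource.json")`. If the theory above holds, both numbers should come out the same, and the `missing` list names the pulp-ids that the duplicated entries displaced.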