# Ansible Import/Export debugging
## Failure stack-trace
```
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:INFO: ...Importing resource ContentArtifactResource.
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:INFO: ...1233 import-errors encountered importing ./tmpj9ddnvn6/repository-rh-certified_1/pulpcore.app.modelresource.ContentArtifactResource.json, retrying
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.app.tasks.importer:ERROR: FATAL import-failure importing ./tmpj9ddnvn6/repository-rh-certified_1/pulpcore.app.modelresource.ContentArtifactResource.json
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.tasking.pulpcore_worker:INFO: Task 8671f977-bf46-446a-a9fa-ddb0f8876c05 failed (Content matching query does not exist.)
pulp [dea8745d36bb46619ddfbc0009058797]: pulpcore.tasking.pulpcore_worker:INFO: File "/home/vagrant/devel/pulpcore/pulpcore/tasking/pulpcore_worker.py", line 410, in _perform_task
result = func(*args, **kwargs)
File "/home/vagrant/devel/pulpcore/pulpcore/app/tasks/importer.py", line 199, in import_repository_version
_import_file(ca_path, ContentArtifactResource, retry=True)
File "/home/vagrant/devel/pulpcore/pulpcore/app/tasks/importer.py", line 81, in _import_file
a_result = resource.import_data(data, raise_errors=True)
File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 777, in import_data
return self.import_data_inner(
File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 829, in import_data_inner
raise row_result.errors[-1].error
File "/usr/local/lib/pulp/lib64/python3.8/site-packages/import_export/resources.py", line 666, in import_row
self.before_import_row(row, **kwargs)
File "/home/vagrant/devel/pulpcore/pulpcore/app/modelresource.py", line 98, in before_import_row
linked_content = Content.objects.get(upstream_id=row["content"])
File "/usr/local/lib/pulp/lib64/python3.8/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/pulp/lib64/python3.8/site-packages/django/db/models/query.py", line 435, in get
raise self.model.DoesNotExist(
```
## Problem Statement
1233 ContentArtifact entries cannot find their matching (previously-imported) Content entities. **1233** is important; we shall see it again shortly.
## Investigation notes
* Number of exported entities:
```
(pulp) []$ for f in *.json; do echo $f; cat $f | jq length ; done
pulp_ansible.app.modelresource.CollectionDeprecationResource.json
0
pulp_ansible.app.modelresource.CollectionResource.json
4913
pulp_ansible.app.modelresource.CollectionVersionContentResource.json
5266
pulp_ansible.app.modelresource.RoleContentResource.json
0
pulp_ansible.app.modelresource.TagResource.json
7
pulpcore.app.modelresource.ContentArtifactResource.json
5266
```
The ContentArtifact and CollectionVersionContent counts match (5266 each). This is good: CollectionVersion is the only Content subclass involved here, so we expect a ContentArtifact (CA) link for every piece of Content.
* Find all the CollectionVersion content-ids from upstream (saved as 'upstream_id' at export time). NOTE: for all the gory technical details about what 'upstream_id' is, see [Handling entities without a ‘natural’ key](https://hackmd.io/@ggainey/importexport_lowlevel_naturalkeys).
```
(pulp) []$ cat pulp_ansible.app.modelresource.CollectionVersionContentResource.json | jq '.[] | .upstream_id' | sort > cv_content_ids
(pulp) []$ wc cv_content_ids
5266 5266 205374 cv_content_ids
(pulp) []$
```
* Notice (after some digging) that there are more entries in CollectionVersionContentResource than there are **unique** upstream_ids in the export. Uh...whut?
* How many duplicate upstream_ids are there in the export?!?
```
(pulp) []$ uniq -d cv_content_ids | wc
1233 1233 48087
```
* Oh dear. This is exactly the number of errors we get at import-time.
* Do we have any duplicated content-ids in the ContentArtifact export?
```
(pulp) []$ cat pulpcore.app.modelresource.ContentArtifactResource.json | jq '.[] | .content' | sort > content_ids
(pulp) []$ uniq -d content_ids | wc
0 0 0
```
* Is it one record duplicated 1233 times? 1233 records each duplicated once? Or some combination?
* Answer: "1233 entities duplicated once each" (2800 upstream_ids appear once, 1233 appear twice):
```
(pulp) []$ uniq -c cv_content_ids | sort | awk -F " " '{print $1}' | uniq -c
2800 1
1233 2
(pulp)
```
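The same duplication histogram can be computed without the shell pipeline. The following is a minimal sketch, assuming (as the `jq` commands above imply) that each export file is a JSON array of objects carrying an `upstream_id` key; the function name `duplication_histogram` is made up for this note and is not part of pulpcore or pulp_ansible.

```python
import json
from collections import Counter
from pathlib import Path


def duplication_histogram(export_file, key="upstream_id"):
    """Map 'times an id appears' -> 'how many distinct ids appear that often'.

    Assumes export_file holds a JSON array of objects that each carry `key`.
    """
    rows = json.loads(Path(export_file).read_text())
    # First count how often each id occurs in the export...
    occurrences_per_id = Counter(row[key] for row in rows)
    # ...then count how many ids share each occurrence count.
    return dict(Counter(occurrences_per_id.values()))
```

Called on `pulp_ansible.app.modelresource.CollectionVersionContentResource.json`, this should reproduce the `uniq -c` table above as a dictionary keyed by occurrence count.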
## "Conclusion"
The export-file is in a should-never-happen state: while there are "enough" Content units to satisfy the ContentArtifact entries (5266 rows each), only 4033 of them (2800 + 1233) are **unique** content units, and therefore we have too little content to import.
Somehow, at export-time, some CollectionVersion entities are being duplicated and used in place of the missing pulp-ids. This feels like, at export-time, when we ask an Ansible repo "give me all the CollectionVersion content in this repo-version", we're getting a list that includes dups and misses the pulp-id of the duplicated entries. I don't see how that's possible, given that the query that's invoked is:
```
return CollectionVersion.objects.filter(pk__in=self.repo_version.content)
```
which looks perfectly reasonable.
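
Until the root cause is found, the mismatch described above can at least be detected before an import is attempted. This is a rough sketch, not anything pulpcore ships: `unmatched_content_ids` is a hypothetical helper, and it assumes the two export files are JSON arrays whose rows carry the `content` and `upstream_id` keys used in the `jq` commands earlier.

```python
import json
from pathlib import Path


def unmatched_content_ids(ca_file, content_file):
    """Return ContentArtifact 'content' ids with no matching Content 'upstream_id'.

    Both files are assumed to be JSON arrays of objects, as in the export above.
    """
    ca_rows = json.loads(Path(ca_file).read_text())
    content_rows = json.loads(Path(content_file).read_text())
    # Duplicated upstream_ids collapse here, which is exactly what exposes the gap.
    available = {row["upstream_id"] for row in content_rows}
    referenced = {row["content"] for row in ca_rows}
    return sorted(referenced - available)
```

If the analysis above holds, running this against `pulpcore.app.modelresource.ContentArtifactResource.json` and `pulp_ansible.app.modelresource.CollectionVersionContentResource.json` should list the ids behind the 1233 import errors; an empty list would mean the export is internally consistent.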