# PulpImport/Export Memory Investigation
## Problem statement
* Large datasets are serialized entirely in memory before being written to disk.
* `json.dump` cannot be tricked into looping over iterators (in any sane way)
* django-import-export's `export()` returns the full result set in memory instead of a lazy iterator.
* **This is a real problem "in the wild"** with actual Red Hat data and users - we need a short-term solution, ASAP
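To illustrate the `json.dump` point above: the stdlib serializer has no lazy path at all, so handing it a generator fails outright rather than streaming (a minimal demonstration, not Pulp code):

```python
import io
import json

# json.dump can only serialize concrete containers; a lazy generator is
# not JSON-serializable at all, so there is no built-in streaming path:
buf = io.StringIO()
try:
    json.dump((n for n in range(3)), buf)
except TypeError as exc:
    print(exc)  # Object of type generator is not JSON serializable
```

(`json.JSONEncoder.iterencode` streams the *output* string, but still requires the whole input object up front.)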
## Proposed solutions
### Export in chunks
1. Write `"["` to a tempfile
2. Call `export(...).json.encode(...)` on a chunk of our queryset (this yields a string containing a JSON list, starting with `"["` and ending with `"]"`)
3. Remove the first and last characters (the `"["` and `"]"`) from the exported string
4. Write the string to the tempfile
5. If chunks remain, write `","`
6. Repeat from step 2
7. Write `"]"` to the tempfile
8. Add the tempfile to the tar
#### Advantages
* contained in one place (`pulpcore.app.importexport._write_export()`)
* we own "all" the code involved
* doesn't try to "take advantage of" current implementation details of lower level libs
* never more than one resource-worth of tempfile at a time on disk
* never more than one "chunk" in memory at a time ("how much" depends on what batch-size we use to work w/ the queryset as we export)
* this doesn't change the export file format - so 'import' doesn't need to know we've done anything
* We continue to call the libraries as intended
#### Disadvantages
* `post_export()` called per-batch, instead of once per entire export set
  * this may not be a problem
  * if it is - we already have it in low-memory situations
  * we have tests!
* this is UGLY - much commentary needed in the code to explain why we're doing this
* We do string operations on the output
### Create our own export_to_file in QueryModelResource
* This is not necessary for the fix, but a possible way to refactor the code towards the ultimate goal below.
* Would use the above approach, but encapsulate inside of `pulpcore.plugin.importexport.QueryModelResource`, which all PIE model-resources subclass from.
* Sets us up to do true stream-to-file in the future by changing this one method (instead of inline code somewhere else)
* `export_to_file` would take a `query_set`, a `file_stream`, the `format` and maybe a `batch_size` parameter.
## Ultimate Goal
* show django-import-export what we have to do to work through this problem
* work w/ them to add new d-i-e API to make it possible
* adopt the new release of d-i-e, and remove all this ugly code all at once
###### tags: `import/export`