# Testing with offline data
See [SwarmPAL, Including test data](https://github.com/orgs/Swarm-DISC/projects/1?pane=issue&itemId=46530806).
## Outline
Wishlist:

* Package user
  * Should not have to download test data, even when using `git clone`
* Developing a new feature
  * Unit tests should run fast
  * Initial data download should be automatic and pain-free
* Adding test data
  * Should take a small number of steps
  * Any contributor should be able to add new data sets

| Use case | Static site with rsync/wget | [pooch][POOCH] | [git-lfs][GITLFS] |
| --- | --- | --- | --- |
| Package user | | | it appears that `git clone` will fetch the data |
| Developing a feature | need to run `download_data.sh` | | |
| Adding unit tests | contributors need ssh access to the static site | add data to a second repo; keep branch/tag names consistent across the two repos | familiar git workflow, but unclear how well it integrates with tooling |
| CI | new dependencies (rsync, wget, md5sum); run a script in the workflow | | one user reported difficulty in [CI][GITLFS_CI] |
[POOCH]: https://www.fatiando.org/pooch/latest/sample-data.html#basic-setup
[GITLFS]: https://git-lfs.com/
[GITLFS_CI]: https://old.reddit.com/r/git/comments/uxaca7/when_to_use_git_lfs_large_files_storage/i9y1309/
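For context on the pooch column: pooch keeps a registry of file names and checksums, and fetches a file only when it is missing from the local cache or its hash does not match. A minimal stdlib sketch of that pattern follows; the base URL is the GeoSciences static site above, but the file name, checksum, and `fetch` helper are hypothetical placeholders, not SwarmPAL or pooch API:

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical registry: file name -> expected MD5 (placeholder values).
BASE_URL = "https://www.geos.ed.ac.uk/~ddekler/swarmpal_data/"
REGISTRY = {"example_dataset.nc": "d41d8cd98f00b204e9800998ecf8427e"}


def md5sum(path: Path) -> str:
    """MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name: str, cache_dir: Path = Path(".cache/swarmpal_data")) -> Path:
    """Download `name` into the cache unless a copy with the right hash exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not (target.exists() and md5sum(target) == REGISTRY[name]):
        urllib.request.urlretrieve(BASE_URL + name, str(target))
        if md5sum(target) != REGISTRY[name]:
            raise RuntimeError(f"checksum mismatch for {name}")
    return target
```

Pooch adds conveniences on top of this pattern (OS-appropriate cache locations, multiple hash algorithms, download progress), which is why the table considers it despite the extra dependency.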
## Work in progress
Separate files:

* `config.yaml` describes the list of datasets and the expected results that unit tests should produce
* `generate_data.py` downloads data with SwarmPAL and saves the result as a NetCDF file
* `sync_data.sh` uses `rsync` to upload the data to the GeoSciences servers, where it is hosted as a [static site](https://www.geos.ed.ac.uk/~ddekler/swarmpal_data/)
* `download_data.sh` uses `wget` to download the files from the GeoSciences servers
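The shape of `config.yaml` is still open; one possible layout, with hypothetical dataset names, field names, and checksum values (not the actual schema), pairs each dataset's download parameters with the expected values that unit tests would check:

```yaml
# Hypothetical layout; all field names and values are placeholders.
datasets:
  - name: example_magnetic_lr
    collection: SW_OPER_MAGA_LR_1B
    start: "2020-01-01T00:00:00"
    end: "2020-01-01T01:00:00"
    output: example_magnetic_lr.nc
    expected:
      n_records: 3600
      md5: "d41d8cd98f00b204e9800998ecf8427e"  # placeholder checksum
```

Keeping the expected results next to the download parameters would let `generate_data.py` and the unit tests share a single source of truth.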