## Context
- The [OpenNeuroDatasets-JSONLD](https://github.com/OpenNeuroDatasets-JSONLD) organization contains forks of the datasets in [OpenNeuroDatasets](https://github.com/OpenNeuroDatasets) for which we have been able to *semi-manually* create data dictionaries.
- https://github.com/OpenNeuroDatasets-JSONLD/.github contains the code (`code/`) for updating the repos in OpenNeuroDatasets-JSONLD.
    - ⚠️ The `update_json` script assumes that upstream doesn't have any Neurobagel annotations.
- [openneuro-annotations](https://github.com/neurobagel/openneuro-annotations) contains JSONs auto-generated (with some minor manual revisions?) from our manual spreadsheet annotation of the [**OpenNeuro datalad superdataset**](https://datasets.datalad.org/?dir=/openneuro). ⚠️ This datalad superdataset does not entirely overlap with OpenNeuroDatasets (e.g., some datasets exist only in the datalad superdataset).
## Current process to update and re-process the Neurobagel OpenNeuro forks and graph
1. For datasets that we've annotated (openneuro-annotations):
    1. Clone the original OpenNeuroDatasets repos
    2. Pull any changes to the dataset from the original/upstream repo into our forks of the repos (OpenNeuroDatasets-JSONLD)
    3. Re-apply our annotations to the `participants.json` found in OpenNeuroDatasets (preserving the original indentation if a `participants.json` already exists!) and push changes to our forks
        - i.e., right now, if the Neurobagel data dictionaries need to be updated following a data model change, this must be done in `openneuro-annotations` first
2. Run the Neurobagel CLI on our forks (OpenNeuroDatasets-JSONLD), making a named copy of the pheno-bids JSONLD in openneuro-annotations
3. Push any changes to JSONLD files in openneuro-annotations (to keep our public copy of the graph data files up-to-date)
4. Upload the JSONLD data in `openneuro-annotations` to our `open_neuro` graph database, clearing the existing data in the graph
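
The "re-apply our annotations while preserving the original indentation" step could look roughly like the sketch below. This is not the actual `update_json` code: the function names `detect_indent` and `reapply_annotations` are hypothetical, and it assumes that our annotated entry for a column simply overwrites the upstream entry for the same column.

```python
import json
import re


def detect_indent(json_text, default=4):
    """Guess the indentation of a JSON document from its first indented line."""
    match = re.search(r"^([ \t]+)\S", json_text, flags=re.MULTILINE)
    if match is None:
        # Single-line or unindented file: fall back to a default (assumption)
        return default
    whitespace = match.group(1)
    # json.dumps accepts an int (number of spaces) or a string (e.g. "\t")
    return whitespace if "\t" in whitespace else len(whitespace)


def reapply_annotations(upstream_text, annotated):
    """Overlay our annotated columns on the upstream data dictionary.

    upstream_text: contents of the upstream participants.json, or None
                   if upstream has no such file.
    annotated:     dict loaded from our copy in openneuro-annotations.
    """
    if upstream_text is None:
        return json.dumps(annotated, indent=4, ensure_ascii=False) + "\n"
    merged = json.loads(upstream_text)
    # Assumption: column-level overwrite, upstream-only columns are kept
    merged.update(annotated)
    indent = detect_indent(upstream_text)
    return json.dumps(merged, indent=indent, ensure_ascii=False) + "\n"
```

Keeping the upstream indentation keeps the diff between the fork and its upstream limited to the annotated columns, which makes later pulls from upstream easier to review.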
## Problems encountered when updating the forks (OpenNeuroDatasets-JSONLD)
- 7/448 datasets from openneuro-annotations could not be found in OpenNeuroDatasets: these datasets exist in https://datasets.datalad.org/openneuro, but not in OpenNeuroDatasets/OpenNeuroDatasets-JSONLD
    - 441/448 datasets available in OpenNeuroDatasets-JSONLD
- 7/448 datasets from openneuro-annotations could not be updated in OpenNeuroDatasets-JSONLD
    - `fatal: Not possible to fast-forward, aborting.` -> these forks could not be brought up-to-date with their upstreams, so the new `participants.json` files were not pushed successfully either
- 2/448 datasets failed due to upstream default branch `main` not being found
### CLI failures
- 3/441 datasets failed because both `master` and `main` branches exist. Upstream now uses `main`, so `main` was the fork branch that got updated, but the 'default' branch that is used by the CLI is still `master`.
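
One way to catch the `master`/`main` mismatch before running the CLI would be to compare the default branch of each fork with that of its upstream. The sketch below relies on `git ls-remote --symref <remote> HEAD`, which reports the branch that the remote `HEAD` points to. The function names are hypothetical and this check is not part of the current update code.

```python
import subprocess


def parse_default_branch(ls_remote_output):
    """Extract the branch name from `git ls-remote --symref <remote> HEAD` output.

    The symref line looks like: "ref: refs/heads/main<TAB>HEAD"
    """
    prefix = "refs/heads/"
    for line in ls_remote_output.splitlines():
        if line.startswith("ref: ") and line.endswith("\tHEAD"):
            ref = line[len("ref: "):].split("\t")[0]
            if ref.startswith(prefix):
                return ref[len(prefix):]
    return None  # no symbolic HEAD reported


def remote_default_branch(remote_url):
    """Ask a remote which branch its HEAD points to (requires git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", remote_url, "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_default_branch(result.stdout)
```

A fork whose `remote_default_branch` differs from its upstream's would be a candidate for the failure described above.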
## 2024-04-02 JSONLD update
- 335/441 datasets passed the CLI
- 6 new failures compared to the 10/2023 update:
    - Non-unique participant-session combinations in the TSVs (these OpenNeuro datasets have a session column in the TSV):
        - ds001653
        - ds002674
        - ds003416
        - ds003821
        - ds003949
        - ds004114
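
For triaging these datasets, a check along the following lines could list the repeated participant-session combinations in a TSV. It is a minimal sketch, not the validation the CLI performs: `find_duplicate_rows` is a hypothetical helper, and the column names `participant_id` and `session_id` are assumptions that would need to match the actual columns in each dataset.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(tsv_text, keys=("participant_id", "session_id")):
    """Return the key combinations that occur on more than one row of a TSV.

    The default column names are assumptions; pass the real ones via `keys`.
    Raises KeyError if a key column is missing from the TSV.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[key] for key in keys) for row in reader)
    return sorted(combo for combo, n in counts.items() if n > 1)
```

For example, with the contents of a `participants.tsv` read into a string, `find_duplicate_rows(text)` returns an empty list when every participant-session pair is unique.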
### Internal issues relevant to updating the OpenNeuro node
- https://github.com/neurobagel/project/issues/144
- https://github.com/neurobagel/planning/issues/50
- https://github.com/neurobagel/planning/issues/93
- https://github.com/neurobagel/planning/issues/67