## Context

- The [OpenNeuroDatasets-JSONLD](https://github.com/OpenNeuroDatasets-JSONLD) organization contains forks of datasets from the [OpenNeuroDatasets](https://github.com/OpenNeuroDatasets) organization for which we have been able to *semi-manually* create data dictionaries.
- https://github.com/OpenNeuroDatasets-JSONLD/.github contains the code (`code/`) for updating the repos of OpenNeuroDatasets-JSONLD.
    - ⚠️ Script `update_json` assumes upstream doesn't have any Neurobagel annotations.
- [openneuro-annotations](https://github.com/neurobagel/openneuro-annotations) contains JSONs auto-generated (with some minor manual revisions?) based on our manual spreadsheet annotation of the [**OpenNeuro datalad superdataset**](https://datasets.datalad.org/?dir=/openneuro).
    - ⚠️ This datalad superdataset does not entirely overlap with OpenNeuroDatasets (e.g., some datasets exist only in the datalad superdataset).

## Current process to update and re-process the Neurobagel OpenNeuro forks and graph

1. For datasets that we've annotated (openneuro-annotations):
    1. Clone the original OpenNeuroDatasets repos
    2. Pull any changes to the dataset from the original/upstream repo into our forks of the repos (OpenNeuroDatasets-JSONLD)
    3. Re-apply our annotations to the `participants.json` found in OpenNeuroDatasets (preserving the original indentation if a `participants.json` already exists!) and push changes to our forks
        - i.e., right now, if the Neurobagel data dictionaries need to be updated following a data model change, this must be done in `openneuro-annotations` first
2. Run the Neurobagel CLI on our forks (OpenNeuroDatasets-JSONLD), making a named copy of the pheno-bids JSONLD in openneuro-annotations
3. Push any changes to JSONLD files in openneuro-annotations (to keep our public copy of the graph data files up-to-date)
4. Upload the JSONLD data in `openneuro-annotations` to our `open_neuro` graph database, clearing the existing data in the graph

## Problems encountered when updating the forks (OpenNeuroDatasets-JSONLD)

- 7/448 datasets from openneuro-annotations could not be found in OpenNeuroDatasets
    - these datasets exist in https://datasets.datalad.org/openneuro, but not in OpenNeuroDatasets/OpenNeuroDatasets-JSONLD
    - the remaining 441/448 datasets are available in OpenNeuroDatasets-JSONLD
- 7/448 datasets from openneuro-annotations could not be updated in OpenNeuroDatasets-JSONLD
    - `fatal: Not possible to fast-forward, aborting.` -> these forks could not be brought up-to-date with their upstreams, so the new `participants.json` files were not pushed either
- 2/448 datasets failed because the upstream default branch `main` could not be found

### CLI failures

- 3/441 datasets failed because both `master` and `main` branches exist: upstream now uses `main`, so `main` was the fork branch that got updated, but the 'default' branch used by the CLI is still `master`

## 2024-04-02 JSONLD update

- 335/441 datasets passed the CLI
- 6 new failures compared to the 10/2023 update:
    - Non-unique participant x session combinations in TSVs (these OpenNeuro datasets have a session column in the TSV):
        - ds001653
        - ds002674
        - ds003416
        - ds003821
        - ds003949
        - ds004114

### Internal issues relevant to updating the OpenNeuro node

- https://github.com/neurobagel/project/issues/144
- https://github.com/neurobagel/planning/issues/50
- https://github.com/neurobagel/planning/issues/93
- https://github.com/neurobagel/planning/issues/67
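
The non-unique participant x session failures listed under the 2024-04-02 update could in principle be caught before running the CLI. The following is a minimal sketch, not part of the existing update scripts: the function name `find_duplicate_rows` and the column names `participant_id` and `session_id` are assumptions (the affected datasets may name their session column differently), and the example rows are made up.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(tsv_text, id_col="participant_id", session_col="session_id"):
    """Return the (participant, session) pairs that appear more than once.

    Column names are assumptions; pass the names actually used by the TSV.
    If the session column is absent, rows are compared on participant only.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(
        (row[id_col], row.get(session_col) or "") for row in reader
    )
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical example data, not taken from any OpenNeuro dataset.
example = (
    "participant_id\tsession_id\tage\n"
    "sub-01\tses-01\t25\n"
    "sub-01\tses-02\t26\n"
    "sub-02\tses-01\t31\n"
    "sub-02\tses-01\t31\n"
)

for participant, session in find_duplicate_rows(example):
    print(f"duplicate row for {participant} / {session}")
```

To check a real file, read its contents (for example with `pathlib.Path("participants.tsv").read_text()`) and pass the text to the function along with the dataset's actual column names.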