BEP 038 Review

My proposal addresses the following three central issues, along with other relevant problems:

## Issue 1: Opening `DatasetType` to values other than `raw` and `derivative` should have its own BEP.

Adding values to `DatasetType` is a major change to the specification that should be broadly discussed by the community, with a preliminary analysis of potential side effects by the SG and/or Maintainers. It took a long while, and many years of discussion, before `derivatives` became a relevant part of BIDS. I contend that `DatasetType` should keep its special status and be discussed separately. Once `DatasetType` is agreed to be the appropriate mechanism, BEP leads intending to add values should state so when presenting the BEP draft to the SG, before it is listed as an active BEP.

For instance, it would not be crazy to contemplate a `DatasetType` such as `freesurfer`, which has a very stable and standardized data structure, so that it is allowed as a standalone dataset. Opening `DatasetType` means opening BIDS to the creation of standards within the standard. Where to draw the line between `raw` and `derivative` has traditionally been a point of contention, so enabling more options should be considered very carefully, and accompanied by prescriptions on how to do it and how to decide beforehand. Otherwise, BEPs proposing new dataset types will creep up, as we all tend to think that our area of specialization is special.

Please note that this issue does not address the actual value `atlas` proposed by the BEP. That is reviewed next.

**Proposed solution**: (1) drop this part of the proposal; (2) discuss the issue as BIDS prescribes; (3) establish whether the intent of `DatasetType` may be open to other dataset types.

## Issue 2: The new value `atlas` for `DatasetType` evades the actual problem.

**Evading \*the\* problem that exists**.
By creating a new `DatasetType` value, the overarching problem is sidestepped: the fact that **BIDS-Derivatives has not been developed far enough to represent "second-level" analyses**, that is, analyses where data from several subjects, sessions, or runs are pooled together. Instead, the current BEP proposal cordons off the problem by creating its own little island.

**Solving a problem that does not exist**. The use of the new `DatasetType` is justified as a way to *enable* the sharing of atlases, as stated in the initial paragraph, and later:

> *This will allow sharing existing atlases as stand-alone datasets, validating them via the [BIDS validator](https://github.com/bids-standard/bids-validator) and enabling their integration as sub-datasets of other BIDS datasets.*

which suggests that, if a dataset is of `derivative` type, then the following are not supported:

- The sharing of the dataset stand-alone (which is factually false; derivative datasets are already standalone).
- The validation of a derivative dataset (which is circumstantial, because the vision is that derivatives will one day be validated like raw datasets).
- The integration of the `derivative` dataset as a sub-dataset of another BIDS dataset (which is factually false).

Therefore, this approach seems to indicate that atlases are somewhere in between "raw" and "derivative" and hence require their own `DatasetType`.

**Proposed solution**: My proposal encodes atlas-derived results and atlas-generating pipeline results within the current BIDS-Derivatives specification. If I'm reviewing a paper corresponding to a new template and/or atlas, I would feel better equipped to understand the pipeline and the results if they were delivered as BIDS-Derivatives, with the most salient intermediate steps there (or transformations so that I can replicate them), instead of a final structure that looks like TemplateFlow's resources with `atlas-` put first.
The first reports the atlas creation process, while the second is a fast-track mechanism to emancipate the blobs a researcher wants to be reused from the outputs and reporting of the generating pipeline. My understanding of BIDS is that it wants to achieve the first. The act of sharing data and ensuring FAIRness in the delivery of the service is more the responsibility of other players such as OpenNeuro or TemplateFlow.

## Issue 3: The folder structure is inconsistent with current BIDS raw and derivatives

This PR proposes an alternative that is consistent with current BIDS. While for raw data and first-level analysis derivatives the spatial reference is established by that of individual subjects, for higher-than-first-level analyses this PR proposes the concept of a template, which is the aggregation of feature maps that serve as reference at the individual level (e.g., aggregation of runs, sessions or sets of subjects). That allows for a more consistent organization, which has already been tested in the wild with TemplateFlow.

In addition, there are several aspects of atlases (and templates) that this BEP did not cover:

### Problem 1: longitudinal templates (and atlases)

The cohort entity of TemplateFlow could resolve this. I can update my PR to contemplate this, if that is accepted.

### Problem 2: multi-scale atlases

My proposal includes a new `scale-` entity.

### Problem 3: probabilistic surface parcellations.

This would require finding a GIFTI encoding of FreeSurfer's GCS format. This is not really a problem of atlases, but of BIDS-Derivatives in general.

**Proposed solution**: Implemented by this PR against BEP038.

## Other issues

**Downstream problems of the proposed `DatasetType`**. It seems the intent is to have these datasets uploaded to BIDS-compatible platforms such as OpenNeuro as a new means of disseminating and distributing atlases.
OpenNeuro does implement FAIR pretty comprehensively, which is fundamental for this intent not to become extremely dangerous, but, in any case, the BIDS specification should refrain from suggesting that OpenNeuro should be used for sharing. These atlases will likely be shared through other venues where data versioning, accessibility, etc. are not as transparent or available, and that will have the opposite effect to the one intended in this BEP (undermined reproducibility and limited reusability of the atlas). But even assuming OpenNeuro as the mechanism for redistribution, there are other issues, covered in our TemplateFlow paper, which will remain problematic if not be exacerbated:

- Lack of a controlled vocabulary for templates' and atlases' names: nothing prevents two templates from being given the same label for the `atlas` entity, and I don't think it would be good for BIDS to attempt to control that. The experience would revive the issues hit with template specifications (https://bids-specification.readthedocs.io/en/stable/appendices/coordinate-systems.html). I also provided an example of this problem within #1281.
- Existing templates and atlases will not adopt this. The main way of disseminating templates and atlases remains software packages. It is highly unlikely that software packages will adopt this standard, because it adds insecurity (what if BIDS changes the standard? what if my atlases cannot be represented with this specification?) for a very low return (because here the sharing is with yourself as a developer, so you organize the data as is most convenient for your application).
- Upcoming atlases will not adopt this. If an atlas creator wants their template to be reused, they either distribute it in the format of a popular tool (e.g., FreeSurfer or AFNI) or it is unlikely to be adopted (except by applications that can query TemplateFlow).
- Unfortunately, many template/atlas generators set copyleft and (worse) no-derivs restrictions on the license, which conflict with the purpose of sharing the resource (since these resources are meant to create derived works). That defeats the noble purpose of "sharing" standalone (even if that were a problem). If a derivative is protected with no-derivs (or the raw data, like the HCP data), that is within the scope of possibilities. However, `DatasetType` `atlas` allows people to mark a resource as an atlas and confusingly set no-derivs (and maybe request royalties after use?). For `derivative`, it is not assumed that you can create further derivatives, and the license is checked.

**Intro of the proposal misses the point**. The introduction of the current proposal is largely devoted to explaining what an atlas is. BIDS should not be a neuroimaging handbook and, therefore, BEPs should not require such justifications. I believe this is a consequence of issue 2, as a way to justify the choice.