# fMRIprep LTS
## Oct 13th, 2022
Attendance: YC, TG, PB
* Major update by YC. [Link to google slides](https://docs.google.com/presentation/d/10e35C_tx_q9EyOSe_kaAerTkNvfsqUrWzA80OgpHuGw/edit#slide=id.g1f88252dc4_0_162)
* Ran a bunch of MCA replications of structural processing, on 7 T1-weighted images.
* Derived a test at each voxel -> p-value. Also tried applying different FWE and FDR corrections for multiple testing (see the sketch at the end of this section).
* Main experiment is "global null", leave-one-MCA-replication-out experiment.
* Varying the alpha threshold, the proportion of (false) positive detections follows the nominal value closely (YAY)
* With FWE correction, most replicates lead to no detection at all, as expected. PB: need to compute the empirical FWE to compare with the nominal value.
* FDR is more variable. PB: try running the same analyses with a bit of smoothing (FWHM = 3?).
* Couple of feasibility experiments have been planned.
* One is to compare the IEEE (baseline) to MCA distribution -> does not detect variations.
* Another: compare the outcome of one image to another image's distribution. Should lead to lots of detections. WIP
* Another: use a corrupted template for registration. Should also lead to lots of detections. WIP.
* Aim is a paper relatively soon. Looks like all the components are in place. The logic is completely generic and could be applied to any image processing pipeline, or really any software pipeline!
* Once the test is established, PB & TG will look into hiring an engineer (internship?) to complete the software library started by LT.
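Below is a minimal sketch of the kind of leave-one-MCA-replication-out "global null" test and FWE/FDR corrections discussed above. It assumes the replications for one subject are stacked in a NumPy array; the two-sided normal p-value and the `statsmodels` corrections are illustrative choices, not necessarily the exact procedure in YC's slides.
```
# Hypothetical sketch of the leave-one-MCA-replication-out "global null" test.
# Assumes `replications` is an (n_rep, x, y, z) array of MCA outputs for one subject.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

def global_null_detection_rates(replications, alpha=0.05, method="fdr_bh"):
    n_rep = replications.shape[0]
    rates = []
    for i in range(n_rep):
        test = replications[i]                      # left-out replication
        rest = np.delete(replications, i, axis=0)   # remaining replications
        mu, sd = rest.mean(axis=0), rest.std(axis=0, ddof=1)
        z = (test - mu) / np.where(sd > 0, sd, np.inf)  # avoid division by zero
        p = 2 * stats.norm.sf(np.abs(z))            # two-sided parametric p-value
        # Multiple-testing correction across voxels ("bonferroni" for FWE, "fdr_bh" for FDR)
        reject, _, _, _ = multipletests(p.ravel(), alpha=alpha, method=method)
        rates.append(reject.mean())                 # proportion of (false) positive voxels
    return np.array(rates)
```
Under the global null, the uncorrected rejection rate should track the nominal alpha, which is the behaviour reported in the slides.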
## May 4th 2022
Attendance: YC, LT, TG
* LT is leaving for France :( Effective May 6th
* missing features for the app
* local run
* logging of tests (central stats?)
* have a clean set of datalad image repos in niprep
* in general move everything to nipreps
* Follow-up LT departure
* Maybe hire a Master student to extend the validation?
* Maybe transfer the LTS maintenance to new data platform at criugm - if it gets created
* Tests still don't work
* MCA for more libraries (LAPACK and BLAS)
* smooth images prior to stats (nilearn's smooth_img; see the sketch at the end of this section)
* enable random seeds in the pipeline
* Idea: use MCA to do data augmentation in fMRI machine learning models
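For the smoothing idea above, a minimal example with nilearn's `smooth_img` (file names are placeholders; fwhm=3 is the value floated in the notes, presumably in mm):
```
# Hypothetical smoothing step before computing voxel-wise statistics.
from nilearn.image import smooth_img

# fwhm=3 corresponds to the kernel discussed for the FDR analyses; adjust as needed.
smoothed = smooth_img("sub-XX_desc-preproc_T1w.nii.gz", fwhm=3)
smoothed.to_filename("sub-XX_desc-preproc_fwhm-3_T1w.nii.gz")
```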
## Apr 13th 2022
Attendance: YC, LT
* Statistic results (https://nbviewer.org/github/yohanchatelain/fmriprep-reproducibility/blob/master/notebooks/stat_test_normal.ipynb)
* 64 MCA iterations of ds000256/sub-CTS201
* Parametric (Gaussian)
* Still a lot of false alarms (~10%)
* Non-parametric (min-max; see the sketch at the end of this section)
* ~3% false alarms => generate many more samples (20 CPU-years allow ~1500 iterations)
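A rough sketch of the non-parametric (min-max) check, assuming `samples` stacks the MCA iterations for one subject and `new_img` is the run under test; the exact tolerance logic in the notebook may differ:
```
# Hypothetical min-max tolerance check: flag voxels of a new run that fall
# outside the range spanned by the existing MCA samples.
import numpy as np

def minmax_false_alarm_rate(samples, new_img):
    lo = samples.min(axis=0)
    hi = samples.max(axis=0)
    outside = (new_img < lo) | (new_img > hi)
    return outside.mean()   # fraction of flagged voxels (~3% in the Apr 13 notebook)
```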
## Mar 30th 2022
Attendance: LT, PB
* presentation of the new datalad version control on all test data
* discussion on pros and cons of using datalad run containers to keep track of test execution.
* preliminary consensus: use datalad only to keep track of reference data. Use pytest to run tests and the same mechanism as mriqc to aggregate results across users.
* for reference: API MRIQC https://mriqc.nimh.nih.gov/
## Mar 16th 2022
Attendance: YC, TG, PB, LT, CM
* Storage
* results of the preprocessing generated by YC are saved on Compute Canada tapes.
* challenge posting the outputs of fuzzy on OSF
* need to remove the workdir.
* LT prepared a datalad repo with bids convention for the archival on OSF.
* We will save only the T1w derivatives for now: one mean and one standard deviation image for each subject.
* How to push a DataLad dataset to OSF
1. Create an OSF sibling for the dataset:
```
OSF_TOKEN=<your_token> datalad create-sibling-osf --title 'BigBrain histogram' \
--mode exportonly \
-s osf-export \
--description "This carefully acquired data will bring science forward" \
--public
```
2. Export the dataset content to the OSF storage remote:
```
OSF_TOKEN=<your_token> git-annex export HEAD --to osf-export-storage
```
3. Make sure to make the dataset public through the OSF portal
* Compute canada outage
* should be OK to transfer to Cedar.
* Stats
* In practice, RFT "doesn't work": it is too liberal but we don't really understand why. Instead, the plan is to use a (more experimental) cross-validation approach: estimate the mean and standard deviation of the % of rejected voxels and adjust the p-value threshold from that (a rough sketch follows at the end of this section).
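A rough sketch of what the cross-validation calibration could look like, under the assumption that we already have a voxel-wise p-value map per left-out MCA replication; this is a guess at the intended procedure, not a settled method:
```
# Hypothetical calibration of the p-value threshold from leave-one-out rejection rates.
import numpy as np

def calibrate_threshold(pvalue_maps, candidate_alphas, target_rate=0.05):
    """pvalue_maps: list of voxel-wise p-value arrays, one per left-out replication."""
    best_alpha, best_gap = None, np.inf
    for alpha in candidate_alphas:
        rates = [np.mean(p < alpha) for p in pvalue_maps]   # % rejected voxels per replication
        gap = abs(np.mean(rates) - target_rate)
        if gap < best_gap:
            best_alpha, best_gap = alpha, gap
    return best_alpha, best_gap
```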
## Feb 2nd 2022
Attendance: YC, TG, PB, LT
* Tolerance interval.
* Voxels aren't independent
* Correct for family-wise error (Bonferroni). But with this we'll pay the price of non-Gaussianity and low # samples. So don't go there yet.
* Don't go with a non-parametric test if the # of non-Gaussian voxels remains modest. We don't want to impose a computational debt on the package maintainers.
* Other possibility: average by ROI, using the ROIs produced by FreeSurfer. For instance, look at the average absolute difference between voxels in each region. ROIs could be considered independent (see the sketch at the end of this section).
* Do we want to use 95% as absolute interval? Where do we put the cut-off?
* -> check test by ROI?
* Other sanity checks:
* Generate more fuzzy samples (30)
* Use cases
* Software update
* Deployment on different hardware
* Simulate a template corruption (ask Basile for corrupted template or how to replicate corruption)
* Functional data: use Hao-Ting's nilearn code to process the images and design the test on this result (don't introduce noise in this analysis)
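For the ROI-averaging idea above, a minimal sketch assuming a FreeSurfer segmentation (e.g. aseg) already resampled to the grid of the two images; the mean absolute difference per label is one possible summary:
```
# Hypothetical ROI-level comparison: average absolute difference per FreeSurfer label.
import numpy as np
import nibabel as nib

def roi_abs_diff(img_a_path, img_b_path, seg_path):
    a = nib.load(img_a_path).get_fdata()
    b = nib.load(img_b_path).get_fdata()
    seg = nib.load(seg_path).get_fdata().astype(int)   # e.g. resampled aseg labels
    diffs = {}
    for label in np.unique(seg):
        if label == 0:          # skip background
            continue
        mask = seg == label
        diffs[label] = np.mean(np.abs(a[mask] - b[mask]))
    return diffs                # one value per ROI; ROIs treated as ~independent tests
```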
## Jan 10th 2022
Attendance: YC, LT
* statistics:
* normality testing (see the sketch at the end of this section)
* mask for non-Gaussian voxels
* non-parametric test for 5 samples?
* fuzzy status:
* now works on fMRI data! The issue was that FSL bet treated stderr output from Verificarlo as errors.
* reference:
* update on make-reference (raw fmriprep outputs, bids anat mean/std)
* what else to include in the reference (func corr mean/std, masks for Gaussian-distributed voxels)
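A minimal sketch of the per-voxel normality screen discussed above, using a Shapiro-Wilk test inside a brain mask; with only ~5 MCA samples the test has little power, which is exactly the concern raised here:
```
# Hypothetical per-voxel normality screen over MCA samples (n_rep, x, y, z).
import numpy as np
from scipy import stats

def gaussian_voxel_mask(samples, brain_mask, alpha=0.05):
    gaussian = np.zeros(samples.shape[1:], dtype=bool)
    for idx in zip(*np.where(brain_mask)):
        _, p = stats.shapiro(samples[(slice(None),) + idx])
        gaussian[idx] = p > alpha    # keep voxels where normality is not rejected
    return gaussian
```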
## Dec 21st 2021
Notes by TG and YC on next steps following our HBM abstract submission:
### Error maps
* Get error maps for the functional pipeline. Status: we have 5 fuzzy samples for 1 subject.
* Curiosity: check if error map resembles local SNR
* Trace the pipeline with Pytracer to understand where the error comes from
### Test definition
* Check if fuzzy samples are Gaussian (they're probably not)
* If they are not, find a way to define confidence intervals from non-Gaussian samples
* Sanity check: check that new fuzzy samples pass the test
### Experiments with test
* Does the test pass if random seed isn't fixed?
* Does the test pass if multithreading is enabled?
* Does the test pass if dependencies are updated?
### Other
* Clean GH repo and explain how users can test their pipeline
## Dec 15th 2021
Attendance: YC, TG, CM, LT
* Fuzzy outputs:
* Anatomical output instabilities are mostly on the borders
* Error maps are available for the anatomical pipeline. 8 subjects.
* Tests will be implemented for 3 derivatives (native space, 2 versions of MNI template) x 8 subjects
* Mask subcortical gray matter regions of the MNI template
* https://templateflow.s3.amazonaws.com/tpl-MNI152NLin6Asym/tpl-MNI152NLin6Asym_res-06_atlas-HCP_dseg.nii.gz
* Overall we keep half of the precision
* For HBM
* TBD push outputs to osf and create datalad repo
* OHBM abstract:
* https://docs.google.com/document/d/1qE2W3qBhf_MZKto_Ywlg93Sesn5aMNAze45fU8J44V0/edit
* TBD fuzzy with numpy update (if available before deadline)
* TBD dataset descriptions
* Statistical reports:
* Relative differences (for anatomical) and pearson correlation (for fMRI)
* We want instead an absolute comparison with fuzzy bounds for each voxel (see the sketch at the end of this section)
* Current work can be used for fuzzy func ref generation
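A minimal sketch of the "absolute comparison with fuzzy bounds for each voxel", assuming per-voxel mean and standard deviation images from the fuzzy reference; k = 3 is an arbitrary width, not an agreed value:
```
# Hypothetical voxel-wise check against fuzzy reference bounds (mean +/- k * std).
import numpy as np
import nibabel as nib

def outside_fuzzy_bounds(test_path, ref_mean_path, ref_std_path, k=3.0):
    test = nib.load(test_path).get_fdata()
    mu = nib.load(ref_mean_path).get_fdata()
    sd = nib.load(ref_std_path).get_fdata()
    outside = np.abs(test - mu) > k * sd
    return outside.mean(), outside     # fraction flagged, and the voxel-wise map
```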
## Nov 24th 2021
Attendance: YC, LT
* YC:
* ran fuzzy-container for anatomical only
* with IEEE => 100% reproducible in sequential mode (as expected)
* with MCA => big differences related to masking (max std ~10^9 voxel-wise)
* next step: run for functional
* explanation of the reproducibility difference between multithreading and multiprocessing:
* multiprocessing parallelizes over independent tasks, whereas multithreading happens at a more fine-grained level (@markiewicz and @mathias, can you confirm?)
* LT:
* big errors must come from registration: voxels inside the mask VS outside the mask (judging from the max std)
* will create an anat-only Slurm file for YC to experiment with in our tool
* statistical test still in development
## Nov 10th 2021
Attendance: PB, YC, LT
* LT
* has an app working for fuzzy (as fuzzy does not work yet, it uses multiple IEEE runs). It quantifies the correlation of time series across all pairs of replications, averaged within a brain mask, and produces HTML reports showing the results voxel by voxel.
* PB
* acquiring a better test dataset is 100% feasible. We need a way to release it publicly. WIP (high priority because the same issue applies to cneuromod).
* re brainhack global, still unclear, will revisit on Nov 24th.
* YC
* test on anat only. Ran IEEE inside fuzzy, with 8 participants. Big variability in run length - one subject takes 5 days (!!).
* next step to include MCA.
* Then check anat+func.
## Oct 25th 2021
Attendance: PB, YC, LT
* LT: update on the fmriprep fMRI repro test
* fuzzy is broken atm. Yohan is going to look into it.
* simple reproducibility metrics -> PB and LT to draft (a sketch follows at the end of this section).
* structural
* dice between reference / new brain mask
* max absolute difference relative to reference intensity in the reference mask
* functional
* dice between reference / new brain mask
* minimum correlation of activity at a voxel between test and retest in the reference mask
* timeline: have a functional app to test inter-os repro in the coming weeks. For fuzzy, unclear, because it's unclear what the problem is.
* move the app from simexp to nipreps?
* YC: plans for identifying reproducibility bottlenecks.
* To be discussed at a later point. We need the app and assess the magnitude of the problem first.
* PB: plans for a brainhack project on fmriprep_reproducibility.
* TBD. Depends if we have a working app soon. Will decide on our meetings scheduled Nov 24th.
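A rough sketch of the draft metrics listed above (Dice between reference and new brain masks, and minimum voxel-wise time-series correlation inside the reference mask); inputs are assumed to be NumPy arrays:
```
# Hypothetical reproducibility metrics for the structural and functional checks.
import numpy as np

def dice(mask_ref, mask_new):
    inter = np.logical_and(mask_ref, mask_new).sum()
    return 2.0 * inter / (mask_ref.sum() + mask_new.sum())

def min_voxel_correlation(bold_ref, bold_new, mask):
    """bold_*: (x, y, z, t) arrays; mask: boolean (x, y, z) reference mask."""
    ref = bold_ref[mask]            # (n_voxels, t)
    new = bold_new[mask]
    ref = ref - ref.mean(axis=1, keepdims=True)
    new = new - new.mean(axis=1, keepdims=True)
    num = (ref * new).sum(axis=1)
    den = np.sqrt((ref ** 2).sum(axis=1) * (new ** 2).sum(axis=1))
    r = num / np.where(den > 0, den, np.inf)
    return r.min()
```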
## June 29 2021 (2d round)
Attendance: Loïc Tetrel, Chris Markiewicz
* CM: fmriprep team can also work on (github actions) tests
* LT: CI tests may not be ideal for a datalad repo environment?
* CM: It should accommodate the load (fmriprep running 10G containers)
* LT: fuzzy issues
* CM: `_fix_surfs7` - collecting precomputed outputs
* from https://fmriprep.org/en/1.1.4/_modules/fmriprep/interfaces/surf.html
* CM: you would better try re-running `bet` inside the fuzzy environment
## June 29 2021 (1st round)
Attendance: Loïc Tetrel, Yohan Chatelain, Tristan Glatard
* LT: reorganized tests to reduce scan time in subjects
* LT: parametrized tests
* LT: created a Makefile
* LT/TG: we could publish test results to (another) git repo
* TG: will investigate why fuzzy bet crashes
```
RuntimeError: Command:
bet /WORK/fmriprep_work/fmriprep_wf/single_subject_CTS201_wf/func_preproc_task_restbaseline_wf/initial_boldref_wf/enhance_and_skullstrip_bold_wf/n4_correct/ref_bold_corrected.nii.gz /WORK/fmriprep_work/fmriprep_wf/single_subject_CTS201_wf
/func_preproc_task_restbaseline_wf/initial_boldref_wf/enhance_and_skullstrip_bold_wf/skullstrip_first_pass/ref_bold_corrected_brain.nii.gz -f 0.20 -m
```
## June 15 2021
Attendance: Loïc Tetrel, Chris Markiewicz, Tristan Glatard, Mathias Goncalves, Yohan Chatelain
* LT: compared reproducibility with single-threading vs multi-threading vs multi-processing - controlling otherwise for random seeding. Multi-processing is completely reproducible (YAY), but multi-threading is not. MG: ANTS is probably to blame. See https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
* LT: still trouble running the fuzzy fmriprep. Next in line.
* PB: still need to make the code modular for the evaluation, and change the evaluation metrics.
* LT: shows a first implementation of the test using pytest. CM suggests looking at https://docs.pytest.org/en/6.2.x/parametrize.html (a minimal example follows).
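A minimal, self-contained example of the parametrized pytest style CM points to; the cases and threshold are placeholders, not the actual fmriprep-reproducibility tests:
```
# Hypothetical parametrized reproducibility test (pytest).
import numpy as np
import pytest

def dice(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# Placeholder cases: (reference mask, new mask, expected minimum Dice).
CASES = [
    (np.ones((4, 4, 4), bool), np.ones((4, 4, 4), bool), 0.99),
    (np.ones((4, 4, 4), bool),
     np.pad(np.ones((3, 4, 4), bool), ((0, 1), (0, 0), (0, 0))), 0.80),
]

@pytest.mark.parametrize("ref_mask,new_mask,threshold", CASES)
def test_brain_mask_dice(ref_mask, new_mask, threshold):
    assert dice(ref_mask, new_mask) >= threshold
```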
## June 8 2021
Attendance: Loïc Tetrel, Chris Markiewicz, Yohan Chatelain, Tristan Glatard, Mathias Goncalves
* TG: will make a PR on shellcheck nitpicks in the bash scripts. Nothing major.
* LT: improved submission scripts
* LT: using options `--random-seed 1234 --fs-no-reconall --anat-only --skull-strip-fixed-seed --nthreads 1 --omp-nthreads 1`, results are exactly reproducible for both the anatomical and functional pipelines
* CM: `--skull-strip-fixed-seed` passes `--use-random-seed 1` to ANTs Atropos... cannot specify the seed directly
* LT: still running fuzzy (5 repetitions)
* How to show unchanged images?
* CM: GIF is lossy; niworkflows uses SVG ([link](https://github.com/nipreps/niworkflows/blob/2600e4fe18012a852a9169bfac5804d3ad789eba/niworkflows/interfaces/reportlets/base.py#L89-L110))
* TODO: compare multithreading implementation of fmriprep with fuzzy
* `omp-nthreads` is likely to affect the results but `nthreads` should not
* Tests: we will implement both a global and a voxelwise one. LT to implement a first version.
## June 1 2021
Attendance: Pierre Bellec, Yohan Chatelain, Loïc Tetrel, Mathias Goncalves, Tristan Glatard
Regrets: Chris Markiewicz
* testing reproducibility of the anat pipeline.
* LT: differences across runs relate to the seeding of the skull-stripping step.
* LT: the experiment will be run now that the latest Singularity images are ready
* TG: Singularity images were updated to use fmriprep 20.2.1. See https://github.com/SIMEXP/fmriprep-lts/pull/3
* TG: is each repetition really independent?
* LT: yes, it should be
* MG: you could check in the logs.
* LT: I have, and it restarts from scratch.
* Update on the container script
* BP suggests trying to control versions in these build files:
* https://github.com/nipreps/fmriprep/blob/7e17deaf05a27c577a8cfb0099b93c9883cc63ce/Dockerfile#L9-L18
* https://github.com/nipreps/fmriprep/blob/7e17deaf05a27c577a8cfb0099b93c9883cc63ce/Dockerfile#L73-L80
* PB: we will build a reference set of results from the existing container, then patch the container dockerfile, rebuild and confirm that the results don't change.
* possible solution: NeuroDebian freeze https://neuro.debian.net/pkgs/neurodebian-freeze.html Maybe overkill.
* after some debate, candidate solution would look like:
* add more control on package versioning.
* aim to rebuild periodically with newer packages (including ubuntu).
* systematically test for changes.
* what images and metrics?
* TG: mean and standard deviation
* PB for 4D: voxelwise correlation between time series, or percentage-of-baseline difference.
* LT: how do we test departure from the reference distributions?
* we'll implement both a voxel-wise and volume-wise approach.
* in terms of images:
* preprocessed T1 and mask, as well as preprocessed fmri.
TODO:
* TG to have a look at the bash script
* https://github.com/SIMEXP/fmriprep-lts/blob/4e983f7cb5914bf336c54942ccc1663cb3b484e1/code/run.bash#L88-L109
* discuss (and vote?) the solution for maintaining the lts container
* contact BP for acquiring new test data
NEXT TIME:
* report
* how to make code cleaner and modular
* what outputs for the report
* report or reportlets?
## May 25th 2021
Attendance: Pierre Bellec, Chris Markiewicz, Tristan Glatard, Mathias Goncalves, Loic Tetrel
* TG created a PR for building a Singularity container for the LTS, with and without fuzzy:
* Building the container: https://github.com/SIMEXP/fmriprep-lts/tree/master/code/containers
* Container images: https://github.com/SIMEXP/fmriprep-lts/tree/master/envs
* https://github.com/SIMEXP/fmriprep-lts/pull/1 containers are stored on OSF.
* Usage: https://github.com/SIMEXP/fmriprep-lts
* -> should pull 20.2.1 instead of 20.2.0
* LT is giving an overview of https://github.com/SIMEXP/fmriprep-lts
* PB Q: where should all these repos live?
* Where to put fmriprep-preprocessed data
* PB: OSF
* CJM: Another option: gin.g-node.org
* CJM: BIDS-style outputs: [`--output-layout bids`](https://fmriprep.org/en/stable/usage.html#Other%20options) (see [nipreps/fmriprep#2303](https://github.com/nipreps/fmriprep/pull/2303), [neurostars #18492](https://neurostars.org/t/sharing-nested-bids-raw-derivatives-in-a-datalad-yoda-way/18492))
* PB: need to decide how to name/organize the different runs
* MG: would be useful to have more slurm options exposed.
* CJM: reproman would be the way to implement a generic tool
* https://github.com/ReproNim/reproman
* TG: maybe decouple the data processing from the test, to better deal with crashes.
* LT: is it OK to develop our code base for Linux only, or is there some form of Windows and macOS support for fmriprep? CJM: bash is a reasonable requirement.
* TG: we should use a testing framework to implement the tests, for instance:
* pytest if Python
* [bats](https://bats-core.readthedocs.io/en/latest/index.html#) if bash
* fmriprep's testing framework (link?)
* Next steps
* investigate discrepancies on the anatomical report
* add CJM to compute canada -> PB to send directions
* CJM to investigate the environment variables of the different jobs.
* implementation of the report
* TG and Ali to review the bash scripts
* also compare fmriprep command line arguments used in Ali and TL's experiments
* Next week
* discuss which images to look at in the tests
* discuss which metrics to quantify variations in the test
* possibly an update from Basile Pinsard on the container build
## test data with NMIND
**Attendance**: Greg Kiar, Pierre Bellec, Audrey Houghton, Xinhui, Yohan Chatelain, Tristan Glatard, Loic Tetrel, Mathias, Steve Giavasis, Sydney Covitz
* GK: check this issue: https://github.com/nmind/hackathon2021/issues/4
* GK also check this issue: https://github.com/nmind/hackathon2021/discussions/16
* AH: selecting some subjects from ABCD and NKI Rockland. A next step is to run fmriprep on these data and check for differences. These are good-quality subjects.
* SC: Q: are tests done with T1+FLAIR? A: Mathias: no.
* GK: how would you test the LTS on low-quality data? PB: our aim is to test whether an install is proper, not that the pipeline is robust, so a single subject should be enough.
* PB: for the test set: partial field of view (ICBM aging, F1000), hyper-intensities (Oulu, ADHD-200), atrophied brains (ADNI), multi-scanner (SIMON).
* Mathias: pretty sure the FLAIR is only used for the surface reconstruction, so it shouldn't affect the fmriprep pipeline much
* would be useful to have infants for a project like https://github.com/nipreps/nibabies
* LT: would we need public data for the evaluation? GK: not necessarily. You can have a centralized evaluation of the tests, or a partial evaluation just with the public data.
* PB: what about having a very small dataset to run quick tests? GK: ultimately this type of test is quite different from processing real data. PB: it would be useful only to catch crashes and changes in outputs. GK: not in scope at the moment.
## May 11th, 2021: update and test data
### varia (container image hosting and generation)
**Attendance**: Chris Markiewicz, Mathias Goncalves, Ali Salari, Loic Tetrel, Pierre Bellec, Tristan Glatard (late)
* LT: where are we at with the fuzzy container repo? AS: no progress yet. Next week. CM: singularity hub is closing down. Consensus: stand-alone datalad repo, put the singularity images on OSF (which has a datalad remote).
* CM: may be worth saving the apt and pip cache and archive that for the build of the LTS. nd_freeze? https://neuro.debian.net/pkgs/neurodebian-freeze.html Talking to Yarik?
* CM: Looking through our dockerfile it does look like we separated the Python from non-python packages pretty well, so Greg's suggestion seems like a reasonable fallback.
### test data selection
* LT: rationale for selecting data
* children, young adults, older adults
* multiparametric (T2w check, FLAIR check, could not find SBREF... BP -> https://openneuro.org/datasets/ds001399/versions/1.0.1 or CM mentions ds000031, ds000244, ds001178, ds001399, ds001417, ds001734, ds001740, ds001771, ds001818, ds001978, ds002147, ds002278,ds002316)
* fieldmap (all different)
* MG: multi-echo
* BP: every vendor?
* CM: lesion masks.
* CM: create one for one subject, with absolutely everything.
* PB: three use cases: (1) test all aspects of the pipeline on one subject; (2) test the pipeline on all kinds of different subjects; (3) same as (1), but tiny, for continuous integration. CM: may need dedicated test infrastructure.
* TODO: put together some docs on the test data.
## May 5th, 2021: preliminary results and planning
**Attendance**: Tristan Glatard, Yohan Chatelain, Greg Kiar, Ali Salari, Loic Tetrel, Pierre Bellec
**Notes**:
* [neurostars thread on cross-run differences](https://neurostars.org/t/differences-between-fmriprep-runs-on-same-data-what-causes-them/18543/3)
* LT: investigated reproducibility
* selected some data on OpenNeuro
* implemented a pure functional workflow with a fixed anatomical workflow.
* also implemented a pure anatomical workflow
* currently running without MCA, but running with MCA is possible
* TG:
* Ali focused on the anatomical pipeline
* re-using options in the neurostars post: `fmriprep --random-seed 1234 --fs-no-reconall --anat-only --skull-strip-fixed-seed --omp-nthreads 1`
* perfectly reproducible results with fixed seed and no MCA perturbations
* it was run on local cluster
* Used one session of the SIMON dataset
* Evaluated stability via MCA on a single subject (roughly 3 significant digits; see the sketch at the end of this section)
* low significance appears to be due to subtle (but impactful) registration differences
* still some small impact of MCA
* see results here: https://github.com/glatard/fuzzy-fmriprep/blob/main/sigdigits.ipynb
* LT:
* presented his project https://github.com/SIMEXP/fmriprep-lts
* TG:
* we should have a pass/fail test
* users need a single run
* GK:
* it is going to be hard to rebuild the same container
* this test could be integrated for developers in CI
* PB: this is a great idea, but would require different data for different use cases
* TG: CI could just grab outputs. Let's focus on the tests first, leave CI for later.
* GK: initiative of Damian Fair, Ted S and Mike M to avoid duplications. Will create a test set. Greg will be our champion so far. NMIND (this Neuroimaging Method Is Not Duplicated)
* TG: should it be part of standard fmriprep tests? PB: no, too heavy.
* GK:
* fmriprep already uses Sentry to integrate some basic testing/logging: https://sentry.io/welcome/
* We could bake in the test for comparing results to the stored MCA-derived mean/variance estimates
* Would be a great way to integrate our tests to the fmriprep existing environment
* PB: how can we find parts of the pipeline which are more variable?
* GK: we could replace inputs of each step by pre-generated ones.
* PB: this would be awesome, but requires lots of data.
* GK: could be done for developers.
* TG: other tracing methods could be used.
* TG: maybe we should focus on the simple test comparing with final outputs.
* PB
* good test sets (GK)
* decide target and performance measure for a regression test
* next decide on structures for the repos
* next decide on structures for the reports and tests
* other issue is to track cleanly our containers
* GK suggests using staged container build
* TG and team to build a datalad repo with fuzzy fmriprep image.
* https://handbook.datalad.org/en/latest/basics/101-127-yoda.html
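For reference, one common way to compute the "significant digits" figure from MCA samples (a Parker-style estimate in base 10); the notebook linked above may compute it differently:
```
# Hypothetical significant-digits estimate from MCA samples (n_rep, ...).
import numpy as np

def significant_digits(samples, eps=1e-30):
    mu = samples.mean(axis=0)
    sd = samples.std(axis=0, ddof=1)
    # Parker-style estimate: digits = -log10(|sigma / mu|); clipped to avoid log of zero.
    return -np.log10(np.maximum(np.abs(sd / np.maximum(np.abs(mu), eps)), eps))
```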
---
## April 21st 2021: discussion on the origin of variations in fmriprep
### Attendance
Tristan Glatard, Yohan Chatelain, Chris Markiewicz, Greg Kiar, Ali Salari, Mathias Goncalves, Loic Tetrel, Pierre Bellec
### Agenda
* round-table discussion.
### Minutes
LT: we should focus on the tool to quantify instabilities.
TG: if there are random variations even when fixing the seed, then adding numerical noise is going to get drowned in those variations.
CM: Good first step would be to do a full run, and then re-run each step one by one with fixed seed to narrow where variations occur.
PB: maybe first step would be to re-run it with fixed seed, and also try to control for the anatomical variability.
PB+TG: metrics: number of significant digits? correlation? differences and variance?
GK: dataset? HNU1, look at different CoRR sites.
PB: provided we identify sources of instabilities related to seeds, what do we do about it?
CM: some variations may be reasonable.
GK: would need to assess if the tool sometimes diverges, or if it's "expected" variability.
CM: could try to optimize the registration.
TG: bootstrap averaging?
TG: compare solutions.
PB: consider parcel-based stability metrics as well?
CM: smooth data? What smoothing level is required to get to something acceptable?
PB: who would like to be directly involved in this work:
* TG and YC
### TODO
* [x] create a mattermost channel for the project
* [ ] understand better reproducibility with fixed seed (with / without fixing T1 processing)
* [ ] decide on data & metric