# Nilearn 2022 CZI EOSSS Application LOI
## Proposal title
Improving standard practice for neuroimaging statistics with Nilearn
## Amount Requested
<!-- Enter total budget amount requested in USD, including indirect costs; this number should be between $100k and $400k total costs over a two-year period -->
We expect to incur a total cost of $220,000 USD over the two-year funding period.
Per year, we estimate the following expenses:
- $100,000 USD yearly salary for a dedicated research software engineer based in Montréal, Canada, including indirect costs
<!-- - $10,000 USD for community-focused events and research visits -->
- $7,500 USD for in-person research and development events
- $2,500 USD for virtual community-focused events
## Proposal Summary/Scope of Work
<!-- Provide a short summary of the work being proposed (maximum of 500 words) -->
Human neuroimaging has recently shifted towards an open, Python-based tooling ecosystem (Poldrack et al., 2018, Annu. Rev. Biomed. Data Sci.), driven in part by community initiatives such as Brainhack (Gau et al., 2021, Neuron).
Key statistical frameworks such as the General Linear Model (GLM), however, have largely remained locked behind proprietary languages such as MATLAB or inside monolithic applications.
As these frameworks inform a significant proportion of neuroimaging research, many researchers remain dependent on these platforms, impeding adoption of more open tools.
Further, this prevents neuroimaging-oriented developers from easily interfacing with or expanding on GLM methods, significantly slowing progress in the overall ecosystem.
Nilearn (http://nilearn.github.io) is a well-established Python package for fast and easy analysis of brain images, with a focus on functional Magnetic Resonance Imaging (fMRI) data.
In 2020, Nilearn expanded to incorporate the GLM and related statistical frameworks.
In the years since, Nilearn's statistical methods have been rapidly adopted by other neuroimaging packages including MNE-Python and FitLins.
Beyond the GLM, Nilearn also includes tools for decoding and multivariate pattern analysis, general-purpose data fetchers, and image-processing routines that are used extensively throughout the community.
We propose to restructure Nilearn's development process to better distribute support across existing functionality, allowing us to stabilize GLM support while creating opportunities to interface with other community tools.
To do so, we will recruit a dedicated research software engineer based in Montréal under the supervision of Dr. Jean-Baptiste Poline, a longtime project collaborator.
<!-- ## what the maintainer would do -->
Nilearn currently has a single, full-time maintainer housed in the Parietal team at Inria, France, Ms. Yasmin Mzayek.
Ms. Mzayek manages active development, long-term planning, and user support for functionality across the whole library, including (1) image processing, (2) decoding, and (3) statistical modelling.
<!-- This broad focus slows project development, preventing iterative interaction with other software packages. -->
Recruiting a new maintainer focussed on statistical modelling would serve to restructure development workflows and improve the overall health of the project.
The new maintainer would bring this focus into all aspects of Nilearn development including addressing user issues, facilitating code contribution, maintaining or improving infrastructure, and developing didactic examples.
They would also be responsible for interfacing with developers from other community software projects, including FitLins and MNE-Python, to better support their ongoing development.
We expect that this would significantly advance other, related initiatives such as the Brain Imaging Data Structure (BIDS; Gorgolewski et al., 2016, Sci. Data) extension proposal on statistical models.
<!-- ## hackathons -->
Additional funding will help to organize and promote international community-building events.
Over the past several years, we have organized a series of annual Dev Days that explicitly aim to onboard new contributors to the community.
This year, we expect to more directly integrate these efforts with the Brainhack community through the Organization for Human Brain Mapping (OHBM) annual Brainhack event (https://ohbm.github.io/hackathon2022/).
<!-- These events have proven invaluable, both to engage and educate new contributors on best-practice software development as well as to improve the Nilearn library itself. -->
Financial support would allow us to continue and extend these events, specifically focussing on (1) onboarding new users who may have previously been unable to conduct their neuroimaging research in the Python ecosystem without GLM support and (2) collaborating with other developers who rely on Nilearn's statistical modelling functionality.
## Value to Biomedical Users
<!-- Briefly described the expected value the proposed scope of work will deliver to the biomedical research community (maximum of 250 words) -->
Nilearn has been under continuous development for almost 10 years with more than 800 stars, 450 forks, and 135 contributors on GitHub.
It has achieved this impact not only by adhering to current best practices in software development but also through its extensive user-focused documentation, which provides pedagogical, step-wise explanations of complex brain imaging analyses.
<!-- This allows new users to learn-by-running these analytic steps. -->
As a result, Nilearn is widely regarded as one of the foundational Python packages in human brain imaging research, often meriting dedicated tutorials at workshops such as NeuroHackademy (https://neurohackademy.org/).
Over the past two years alone, the associated scientific article (Abraham et al., 2014, _Front. Neuroinform._) has been cited 547 times, as indexed by Google Scholar.
<!-- It offers to the biomedical research community an accessible tool that enables reliable and reproducible research. -->
The work proposed here would serve to stabilize these efforts while also making the project more directly accessible to researchers who have previously been unable to conduct their GLM-based analyses within the Python neuroimaging ecosystem.
In particular, the proposed scope of work would deliver:
<!-- - New features important to the neuroimaging, neuroinformatics and brain pathology research communities. These include better inspection and interpretation of pipelines for more trustworthy research, and better support for surface data -- meaning that analyzing these data would become easier and less error-prone. -->
- A broader range of available statistical methods.
- A more responsive support team to answer questions, help with usage, and fix issues related to statistical modelling.
- More engagement and better awareness of the open-source ecosystem for neuroimaging, through hackathons, tutorials, and improved training and documentation.
- Improved links with existing and emerging packages for statistical modelling with neuroimaging data.
## Open Source Software Projects
<!-- Number of software projects are involved in your proposal (maximum of five): -->
Direct development:
|Software project name | Main code repository | Home page URL |
| -------------------- | ------------------- | ------------- |
| Nilearn | https://github.com/nilearn/nilearn | https://nilearn.github.io |
Indirect development:
|Software project name | Main code repository | Home page URL |
| -------------------- | ------------------- | ------------- |
| FitLins | https://github.com/poldracklab/fitlins | https://fitlins.readthedocs.io |
| MNE-Python | https://github.com/mne-tools/mne-python | https://mne.tools |
## Landscape Analysis
<!-- Briefly describe the other software tools (either proprietary or open source) that the audience for this proposal primarily uses. How do the software projects in this proposal compare to these other tools in terms of user base size, usage, and maturity? How do existing tools and the project(s) in this proposal interact? (maximum of 250 words) -->
Nilearn is an important building block of the Python neuroimaging software ecosystem, used by 1,368 repositories and 112 packages on GitHub alone.
These include the popular MNE-Python toolkit for electrophysiology data (1,281 citations); FitLins, an emerging tool for fitting linear models to BIDS datasets; and the neurosynth.org platform (2,590 citations).
Traditionally, researchers have relied on GUI or command-line applications, or simple MATLAB scripts, for conducting GLM-based analyses.
Examples of such tools are SPM (https://www.fil.ion.ucl.ac.uk/spm), FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki), and AFNI (https://afni.nimh.nih.gov/), which are implemented in MATLAB (SPM) or C/C++ (FSL and AFNI).
These tools were instrumental in establishing neuroimaging analysis frameworks shortly after the introduction of fMRI; however, they are difficult to extend beyond a predefined set of analyses provided by the application.
Nilearn now provides access to modular implementations of these state-of-the-art statistical methods.
Although other recently developed tools such as BrainStat (https://brainstat.readthedocs.io/en/master/) also provide access to GLM-based modelling in Python, Nilearn is unique in its tight integration with the overall ecosystem and strict adherence to best-practice development workflows.
Specifically, Nilearn interacts closely with Python libraries for both high- and low-level neuroimaging data processing, including NiBabel (https://nipy.org/nibabel/) and scikit-learn (https://scikit-learn.org/).
This positions it to significantly improve the overall experience of researchers who hope to conduct their neuroimaging analyses entirely in Python.
<!-- Moreover, basing analyses on these tools makes it difficult to follow modern practices such as sharing code, using version control, and unit testing. -->
<!-- In the past 10 years, there has been a shift towards a more powerful paradigm where neuroimaging researchers implement their own analyses in Python.
To do so, they leverage high-quality open-source libraries that provide modular implementations of state-of-the-art statistical methods.
Nilearn has been one of the central projects that enabled this change. -->
<!-- old version: -->
<!-- <\!-- ## python ecosystem -\-> -->
<!-- In the past 10 years, the neuroimaging community has been shifting from proprietary, often GUI-based tools to open-source software libraries. -->
<!-- A rich ecosystem of Python packages is now in place. -->
<!-- It is organized around several projects that interact closely, some of which have received grants from the CZI in the past. -->
<!-- Some of the most important Python libraries for neuroimaging are: -->
<!-- - Nibabel: manipulating image file formats. -->
<!-- - Nilearn: statistics, machine learning and visualization. -->
<!-- - MNE-Python: statistics and signal processing for EEG and MEG. -->
<!-- - PyBIDS: organization of neuroimaging datasets in a standard directory layout for interoperability between tools. -->
<!-- These neuroimaging libraries build upon the Python data science ecosystem, including Numpy, Scipy, Scikit-learn and Pandas. -->
<!-- Their maintainers collaborate closely. For example, a joint coding sprint for Nilearn and Nibabel has taken place in May, 2021. -->
## Categories
Machine Learning and Data Analysis, Neuroscience