# SALib: Sensitivity Analysis Tools for Biomedical Research
Letter of intention: https://hackmd.io/@tupui/sailb-czi/edit
GDrive: https://drive.google.com/drive/folders/1QqGAVU-LGS5uaARKNk926TNYQXDJGyRx?usp=sharing
## Meetings
### 2023-11-07 (https://meet.google.com/ivk-ihit-bwe)
#### Agenda
* Hi :wave:
* Our availability, work appetite and :heavy_dollar_sign:
* Takuya: Code, doc
* Will: DEI, supervision role, researcher involved for some tasks
* Pamphile: Code, involve people like Andrea
* John:
* Personal goals for the CZI
* Go back on the LOI and (attempt) list+assign tasks
#### Action points
- Will draft Diversity and Inclusivity statement (21st November)
- Pamphile - develop first draft of milestones and work programme (21st Nov)
- Takuya - will write some text for the proposal (21st Nov)
## Risk management
- architecture does not allow or enable feature development
- insufficient engagement with biomedical community
- difficulty in hiring staff
## Full proposal
### Contributors with ORCID
Will Usher
KTH Royal Institute of Technology
[0000-0001-9367-1791](https://orcid.org/0000-0001-9367-1791)
Pamphile Roy
[0000-0001-9816-1416](https://orcid.org/0000-0001-9816-1416)
Takuya Iwanaga
[0000-0001-8173-0870](https://orcid.org/0000-0001-8173-0870)
### Other items spotted (by Will in rfa)
[Link to Detailed Instructions](https://chanzuckerberg.com/wp-content/uploads/2023/08/EOSS-6__Essential_Open_Source_Software_for_Science_Cycle_6_Application_Instructions_-_LOI__Final.pdf)
> Proposal Title: Auto-filled; Maximum of 60 characters, including spaces. If you need to edit your proposal title, navigate to your application summary page; click on the three dots to the right of the application title; and select Rename from the dropdown menu. Please note that you will not be able to make changes to the title of your application between the LOI and full proposal period.
> Amount Requested: Total budget amount requested in USD, including indirect costs; this number should be between $100,000 USD and $400,000 USD total costs over a two-year period. Enter whole numbers only (no dollar signs, commas, or cents).
> Proposal Summary/Scope of Work: Provide a short summary of the work being proposed (maximum of 500 words).
> Value to Biomedical Users: Describe the expected value of the proposed work to the biomedical research community (maximum of 250 words).
> Open Source Software Projects: Indicate the number of software projects involved in your proposal (up to five). Complete the table with the following information for each software project.
>
> - Software project name
> - Main code repository (e.g. GitHub URL), enter in format https://www.example.com.
> - Homepage URL (if none, re-enter the main code repository URL), enter in format https://www.example.com.
>
> Landscape Analysis: Briefly describe the other software tools (either proprietary or open source) that the audience for this proposal primarily uses. How do the software project(s) in this proposal compare to these other tools in terms of user base size, usage, and maturity? How do existing tools and the project(s) in this proposal interact? (maximum of 250 words)
>
> Category: Choose the two categories that best describe the software project(s) audience:
> - Bioinformatics
> - Single-cell biology
> - Structural biology
> - Clinical research
> - Genomics
> - Neuroscience
> - Infectious disease
> - Imaging
> - Data management and workflows
> - Machine learning and data analysis
> - Visualization
### Proposal Purpose
> Describe the purpose of the proposal in one sentence (maximum of 200 characters including spaces). Example: To develop a comprehensive, validated atlas of the human kidney at single-cell resolution open to the entire scientific and clinical community.
Update the SALib package to provide deeper insights into models and tools for the biomedical community
### Work Plan
> A description of the proposed work for which funding is being requested, including resources the applicants will provide that are not part of the requested funding. For software development-related work (e.g., engineering, product design, user research), specify how the work fits into the existing software project roadmap. For community outreach related activities (e.g., sprints, training), specify how these activities will be organized, the target audience, and expected outcomes (maximum of 750 words)
> [name=Takuya Iwanaga Q: How do the above relate to the needs of the biomedical community?]
> Need to find some info or identify some framing so a good story can be crafted.
To achieve our goal, we request support for these four areas: (1) Develop a new coherent API, (2) Develop a framework for uncertainty visualization, (3) General maintenance, and (4) Outreach.
1) Develop a new coherent API:
SALib, the Python library for sensitivity analysis, has grown organically into a robust set of functions with an Object-Oriented framework, supporting end-to-end analyses for researchers. While serving its purpose well, the current architecture poses a growing maintenance burden. To address this and cater to the evolving needs of the biomedical community, we propose allocating funding to develop a new, flexible API for SALib.
Recognizing the increasing complexity of SALib and its role in diverse scientific domains, a new API is crucial. This effort would allow SALib to align itself with recent initiatives to standardize core scientific packages, for example via the Python Array API standard. Doing so would support interoperability and compatibility with current and upcoming technologies, allowing researchers to seamlessly leverage GPUs and multi-processing capabilities for larger-scale analyses. The objective is to provide a more adaptive and maintainable framework while supporting integration with other standardized tools.
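As a point of reference for the kind of computation such an API wraps, the sketch below estimates first-order Sobol' indices for the Ishigami test function in plain Python. It is not SALib's or SciPy's implementation: the function names, the sample size, the seed and the choice of the Saltelli et al. (2010) estimator are assumptions made for illustration only.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function on [-pi, pi]^3, a common sensitivity benchmark."""
    x1, x2, x3 = x
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def first_order_sobol(func, bounds, n, seed=0):
    """Estimate first-order Sobol' indices with plain Monte Carlo sampling."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent sample matrices A and B, each with n rows.
    A = [sample() for _ in range(n)]
    B = [sample() for _ in range(n)]
    f_A = [func(row) for row in A]
    f_B = [func(row) for row in B]

    # Total output variance, estimated from both sample sets.
    all_f = f_A + f_B
    mean = sum(all_f) / len(all_f)
    variance = sum((y - mean) ** 2 for y in all_f) / (len(all_f) - 1)

    indices = []
    for i in range(len(bounds)):
        # AB_i: rows of A with column i taken from B.
        f_AB = [
            func(a_row[:i] + [b_row[i]] + a_row[i + 1:])
            for a_row, b_row in zip(A, B)
        ]
        # Saltelli et al. (2010) first-order estimator.
        v_i = sum(fb * (fab - fa) for fb, fab, fa in zip(f_B, f_AB, f_A)) / n
        indices.append(v_i / variance)
    return indices


bounds = [(-math.pi, math.pi)] * 3
s1 = first_order_sobol(ishigami, bounds, n=20000)
print([round(s, 2) for s in s1])
```

The explicit Python loops are what make this slow for larger models. Expressing the same steps as array operations is what would let one implementation run on different array backends.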
2) Develop a framework for uncertainty visualization:
SALib currently provides plotting features designed to enable researchers to quickly visualize results of uncertainty and sensitivity analysis. While these visualizations enable a rapid research workflow, they are not intended to be of publication quality to the standard expected in the biomedical community. To better support the full research workflow from end-to-end, we propose allocating funding to improve SALib’s visualization capabilities, particularly in higher-dimensional contexts. The goal is to provide researchers with tools that not only facilitate rapid assessments but also produce publication-quality visualizations. This enhancement aligns with broader efforts to standardize visualization across scientific packages and ensures that SALib contributes to a seamless and comprehensive research experience. Improved visualizations also bolster SALib's secondary role as a platform for teaching and communicating the role uncertainty and sensitivity analysis plays in modern modeling workflows.
3) General maintenance:
Aside from working on reducing the backlog of issues and open pull requests, we will:
* Contributing - Adding documentation and processes on how the project works and how to contribute is paramount for the project to grow and attract new developers.
* Documentation - Add the tools needed to provide interactive documentation.
* Tests - Add a framework for hypothesis testing to strengthen existing tests and ensure the new API is robust.
> [name=Takuya Iwanaga @WUsher - any chance we can devote some funding to support a student to get involved in maintenance? I/we could probably find someone from the community at large as well]
>
> [name=Will Usher - yes, potentially, but the overhead "problem" I highlighted in my e-mail might come into play]
> [name=Takuya Iwanaga I've left the above for reference but my attempt is below. However, maybe I'm making it too wordy judging by the SciPy proposal, but I'm trying to fill the indicated word limit...]
> [name=Will Usher @Takuya We don't yet provide issue or pull request templates. Adding several different issue templates could really help with improving the quality of bug reports or questions.]
Aside from working on reducing the backlog of issues and open pull requests, we aim to further refine the contribution process by establishing clear guidelines such as issue and pull-request templates and improving documentation. Clear contribution guidelines ensure a smooth onboarding process for new developers, fostering a more engaging community and ensuring project longevity. Quality of the documentation is also critical. To this end we intend to implement interactive documentation to provide a more engaging understanding of SALib capabilities. We also would devote energy towards updating and expanding the suite of tests we currently have, adding a framework for hypothesis testing to ensure the new API (proposed above) is robust and reliable.
4) Outreach:
* Sprints, Tutorials, and Talks – We will host in-person sprints, tutorials, and talks at EuroSciPy2025 and other conferences to support users and reinforce the relationships between SALib and the Scientific Python Ecosystem.
* Office Hours – To provide a more direct link between SALib learners and experts, the applicants will organize SALib "Office Hours" on the Sensitivity Analysis Discord server. This initiative will be advertised on the SALib website and social media, and we will specifically invite biomedical researchers and software maintainers who have responded to our surveys.
* Direct Communication – We will develop a reference group of individuals working across biomedical sciences, modelling and uncertainty/global sensitivity analysis, who will provide the development team with user stories, requirements and feedback on the package.
* Documentation – To foster uptake within the biomedical community, we will develop a range of practical, real-world examples from the biomedical literature to demonstrate the benefits and advantages of uncertainty/sensitivity analysis.
* Collaboration - We will organize a seminar at the Lappeenranta University of Technology, where the team has good connections. This will be the first time the team meets in person. We will be joined by renowned SA researchers such as Andrea Saltelli.
The identified team consists of the most knowledgeable and active maintainers of SALib, with commit rights. Moreover, Dr. Roy is one of the most active maintainers of SciPy (by number of commits and his activity in the community), which enables good collaboration between the two projects.
### Milestones and Deliverables:
> List expected milestones and deliverables, and their expected timeline. Be specific and include where possible any goals for metrics the software project(s) are expected to reach upon completion of the grant. Please use a third-person voice (maximum of 500 words).
For simplicity, the start of the project is represented as Q1 of year 1 (Y1). The level of effort (LoE) was evaluated independently by the team until consensus was reached. The planning outlined below assumes the following average LoE:
Dr. Roy: 16h/week
Dr. Iwanaga: 8h/week
Dr. Usher: 8h/week
Dr. Herman: advisory role
Dr. Sahin: advisory role
Outreach activities and general maintenance will be ongoing throughout the entire duration of the project. As new features are developed, the team will strive to engage with the community and leverage existing relationships and networks.
The API work will span from Q2-Y1 to Q1-Y2, providing the necessary time for an evaluation of needs, the implementation of several proofs of concept (POCs) and a review period.
During the POC phase, the team will begin researching visualization techniques suitable for the new API. This work will span from Q3-Y1 to Q4-Y2.
See the attached Gantt chart for an overview of our planning.
> See https://excalidraw.com/#room=9fa6656b02a84788cb8d,7BvNKLzmXbarqI3RH6tTAA
The metric for developing a new framework for the API and the visualization will be two-fold: first, the publication and communication of specifications for both components to the Sensitivity Analysis community, and second, our implementation of these specifications. Furthermore, if our lead with the Napari community is fruitful, part of this work would be integrated as a plugin. An optional deliverable would be the publication of a Sensitivity Analysis dashboard. This would be a powerful tool for dissemination.
The metric for maintenance work will be the increased test coverage of the API and the adoption of new tooling (better CI, interactive documentation, hypothesis testing) and processes, such as the Array API standard. The team will also meet (in Europe) for the first time and carry out focused work.
The metrics for dissemination of results are adding 5 biomedical examples or a section to our documentation, publishing 2 blog posts on the Scientific Python Ecosystem, hosting 10 office hours and 2 seminars on Discord, participating in 1 international conference (specifically SciPy2025, EuroSciPy2025), giving a seminar at LUT in 2025, and participating in all CZI meetings.
Progress toward these deliverables will be tracked using GitHub Projects and shared in the reports.
### Existing Support
> List active and recently completed (previous two calendar years) financial or in-kind support for the software project(s), including duration, total costs in USD, and source of funding. Include any previous funding for these software projects received from CZI, Wellcome, and/or Kavli outside of the EOSS program (maximum of 250 words).
SALib has not received any direct financial support in the last two years. The development effort from maintainers and the SALib community solely relies on volunteers willing to help the project during their free time.
> Many contributions are from academia, so indirectly funded through various research grants, although the small nature of the contributions make it very difficult to track the funding behind these contributions.
> [name=Takuya Iwanaga @WUsher could you check the above statement? Is your involvement technically a form of in-kind support?]
> [name=Pamphile ROY] if there was no clear budget associated to SALib then we should not put anything.
In the last 2 years SciPy was awarded two grants from the CZI EOSS program for the following projects: SciPy and collaborating projects NumPy, Matplotlib, and pandas (i) "Advancing an Inclusive Culture in the Scientific Python Ecosystem" (EOSS-DI-0000000031), $400k, ended in 2022; (ii) "SciPy: Fundamental Tools for Biomedical Research" (EOSS5-0000000176), which will end in November 2024. Dr. Roy's involvement has already been reduced following his departure from Quansight, and his involvement in that project will stop should this proposal be funded. The present proposal is related to the last project (ii), as it funded Dr. Roy's time to add the function `scipy.stats.sobol_indices`. This work was done in collaboration with other SALib maintainers. The present proposal will build on this work and ensure that the new API integrates properly with SciPy's new capabilities.
SciPy and collaborating projects NumPy, pandas, and scikit-learn were awarded $1.383M for the NASA ROSES-2020 project “Reinforcing the Foundations of Scientific Python”. The project timeline is from January 2022 – January 2025.
SciPy has been awarded several short-term Small Development Grants from NumFOCUS:
- Streamlined Special Function Development in SciPy – $10000 – 2023
- Faster Random Variate Sampling from SciPy Statistical Distributions – $9000 – 2022
- Introducing Users to Powerful New Features of SciPy – $5000 – 2022
SciPy has received $2500 per month from Tidelift since November 2021. This funding is slated for relatively small maintenance projects.
### DEI
As highlighted in the outreach section, SALib aims to present an inclusive and welcoming environment to users. As a group, we share the value that an inclusive and welcoming atmosphere is essential for excellent work so that norms and conventions, particularly around gender, can be challenged, and prejudices dispelled.
Despite its roots in engineering and the environment, the software is now used across multiple disciplines, and by many different types of institutions. It is therefore fundamentally important that all feel welcome to interact with the development team and other users. One way in which this is encouraged is through the Code of Conduct (Contributor Covenant) which has been in place since the launch of the package. Will Usher is the contact point. Detailed contribution guidelines are also in place.
The GitHub issue tracker has evolved into a very useful history of questions and answers. The core maintenance team quickly provides answers and guidance to the wide range of users, from beginners through to experts.
The qualitative methods used in the project (interviews, workshops or surveys) will incorporate monitoring processes to minimise the bias that could occur if the sample exhibited e.g. a gender imbalance. For example, gender could play a role in explaining differences in the acceptance of the proposed solutions.
We will also be careful to incorporate balanced and representative messaging in social media and communications posts, for example avoiding excessive promotion of individuals, and also ensuring that group photos accurately represent workshop participation, and provide messaging that is inclusive and celebrates diversity.
### Budget
> here https://docs.google.com/document/d/1oxDjtvZ18L7J3MJbzOsUj7oEuf7CJjslbCP3FKlXJNc/edit?usp=sharing
| | **Year 1** | **Year 2** | **Project** |
|:-------------------------- |:----------:|:----------:|:-----------:|
| Pamphile Roy | $73320 | $73320 | $146640 |
| Takuya Iwanaga | $44720 | $44720 | $89440 |
| William Usher | $44720 | $44720 | $89440 |
| Jon Herman | $0 | $0 | $0 |
| Abdullah Sahin | $0 | $0 | $0 |
| Meeting and EuroSciPy | $9250 | $10250 | $19500 |
| Consulting Manao IDC (15%) | $25802 | $25952 | $51754 |
| **Total** | $197812 | $198962 | $396774 |
Consulting Manao GmbH will be the primary recipient of the grant and will hire Dr. Roy, Dr. Iwanaga and Dr. Usher for the work. As the team's fiscal sponsor, $51754 is allocated for recovery of indirect costs.
Both Dr. Herman and Dr. Sahin are participating in this project but are not going to be compensated for their work through this grant.
$146640 compensates Dr. Roy for 1664h of work. $89440 compensates Dr. Iwanaga for 832h of work. $89440 compensates Dr. Usher for 832h of work.
$9250 allows the team to hold a seminar in the EU and meet for a week: $5500 for the flights (assuming $500 for European flights and $1500 for international ones) and $3750 for the accommodation (assuming $150/night for 5 nights). The team will cover any additional costs itself and will otherwise use any remaining funds to work on the project.
$10250 allows the team to attend EuroSciPy2025 and meet for a week: $1000 for the conference tickets (assuming last year's price of $200), $5500 for the flights (assuming $500 for European flights and $1500 for international ones) and $3750 for the accommodation (assuming $150/night for 5 nights). The team will cover any additional costs itself and will otherwise use any remaining funds to work on the project. The costs of participating in US-based conferences (SciPy2024 and SciPy2025) will be borne by the applicants and their organizations.
All equipment, facilities, supplies, and publications are provided by the applicants and their organizations.
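
The figures in the budget table can be cross-checked with a short script. This is a sketch under stated assumptions, not an official budget tool: all variable names are illustrative, the 52-week working year is assumed, and the number of travellers (5) and the split between European (2) and international (3) flights are inferred from the stated totals, not given explicitly.

```python
# All amounts in USD, copied from the budget table above.
IDC_PERCENT = 15  # indirect cost rate shown in the table

direct_costs = {  # (year 1, year 2)
    "Pamphile Roy": (73320, 73320),
    "Takuya Iwanaga": (44720, 44720),
    "William Usher": (44720, 44720),
    "Meeting and EuroSciPy": (9250, 10250),
}


def year_summary(year):
    """Return (direct, indirect, total) for year index 0 or 1."""
    direct = sum(amounts[year] for amounts in direct_costs.values())
    # Integer arithmetic with round-half-up avoids floating-point surprises.
    indirect = (direct * IDC_PERCENT + 50) // 100
    return direct, indirect, direct + indirect


summaries = [year_summary(0), year_summary(1)]
project_indirect = sum(s[1] for s in summaries)
project_total = sum(s[2] for s in summaries)

# Hours implied by the stated level of effort (assumes 52 working weeks per year).
WEEKS, YEARS = 52, 2
hours = {
    "Roy": 16 * WEEKS * YEARS,
    "Iwanaga": 8 * WEEKS * YEARS,
    "Usher": 8 * WEEKS * YEARS,
}

# Travel breakdown (assumes 5 travellers: 2 European and 3 international flights).
TRAVELLERS = 5
flights = 2 * 500 + 3 * 1500
accommodation = TRAVELLERS * 5 * 150  # 5 nights at $150/night
tickets = TRAVELLERS * 200            # EuroSciPy tickets at $200 each
seminar_trip = flights + accommodation
conference_trip = tickets + flights + accommodation

print(summaries, project_indirect, project_total)
print(hours, seminar_trip, conference_trip)
```

Under these assumptions the yearly and project totals reproduce the table, and the project total stays within the $100,000 to $400,000 range required by the call.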