# Lab Docathon Notes
# 2024/11/12 Docathon Notes
11/12/24: important ideas
- Ted: admin stuff
- Tien: data/datalad
- Joelle: projects, common methods, workflows local vs. HPC
- Kev: CUBIC, PFNs
- Audrey: CUBIC
- MC: CUBIC
- Golia: CUBIC
- Amelie: CUBIC, onboarding
- Laura: onboarding
- Brooke: onboarding, resources
- Steven: project setup, SLURM, conda, onboarding
- Juliette: PAFIN
- Lizzie: GitHub
- Parker: Slack hygiene
- Taylor: website formatting, processing methods
### things from last time:
- update projects
- Containers
- archive PMACS/FW
### feedback from lab survey:
- slack
- biog sib
- social events
- work/life
- space
- lab meetings
### admin team tasks
- onboarding
-
### data team tasks
#### CUBIC
- need to fix some typos in "Mounting CUBIC on your local machine using FUSE"
- numbering issues under "Method III"
- take out job submission section.
- add best practices for SLURM and link to CUBIC SLURM docs (their stuff is pretty good)
- what are the most important directives for us?
- links to examples: new CUBIC wiki
- aud's example code (will eventually be updated with clean project github)
- miniforge docs
- workflow for what to do on CUBIC vs what to do locally
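
As a starting point for the "most important directives" question above, here is a minimal sketch in Python that renders an `sbatch` header. The helper `render_sbatch_header` and all default values are hypothetical placeholders, not CUBIC recommendations; the flags themselves (`--job-name`, `--time`, `--mem`, `--cpus-per-task`, `--output`) are standard `sbatch` options.

```python
def render_sbatch_header(job_name, time="01:00:00", mem="8G", cpus=1,
                         log_dir="logs"):
    """Return the #SBATCH header lines for a batch script as one string.

    All defaults are illustrative placeholders, not lab policy.
    """
    directives = {
        "job-name": job_name,
        "time": time,                      # wall-clock limit, HH:MM:SS
        "mem": mem,                        # memory per node
        "cpus-per-task": cpus,
        "output": f"{log_dir}/%x_%j.out",  # %x = job name, %j = job ID
    }
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in directives.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch_header("example_job", time="02:00:00", mem="16G"))
```

Whatever directives we settle on, keeping them in one template like this would make them easy to link from the CUBIC docs.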
# 2024/02/27 Docathon Notes
## Google Doc list
- CONCUR – out of date!!!
- Onboarding – update
- Expectation
- Redundant sections –
- Best practices for coding – in computation not basics
- VS Code in 2 places – also doesn’t work!!!
- CUBIC
- how long should it take (?)
- process /workflow
- replace SGE with SLURM
- norms on “how much is too much” – refer to CUBIC docs
- look at current docs (!)
- VS Code
- linking to GitHub – ask CUBIC admins
- List of powerful, well-documented examples for projects
- Project list – out of date
- PMACS → move to archive
- Flywheel → move to archive
- GitHub instructions for ssh keys, tokens, linking to CUBIC
- Wishlist of common methods
- Slurm array jobs
- spin tests
- GAMs + GAMMs
- PLS
- Neuromaps
- Ridge regression
- PFNs
- Haufe Transforms
- opNMF
- Background info /orientation
- neuro – Andy’s Brain Book
- coding
- Using containers – needs an update
- Studies – archive / update
- List of what derived analytic measures we actually have for a dataset, how many people, has QC been done, where are the demographics, links to other covariates/info etc
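
For the "Slurm array jobs" wishlist item above, here is a minimal sketch of how a Python worker might pick its input from the array index. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each array task; the function name `select_item` and the subject IDs are hypothetical.

```python
import os


def select_item(items, env=None):
    """Return the item this array task should process.

    Slurm sets SLURM_ARRAY_TASK_ID for each task of a job submitted
    with `sbatch --array=...`; here it is used as a 0-based list index.
    """
    env = os.environ if env is None else env
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(items):
        raise IndexError(f"task ID {task_id} is outside 0..{len(items) - 1}")
    return items[task_id]


if __name__ == "__main__":
    # Hypothetical subject list; in practice this might be read from a file.
    subjects = ["sub-01", "sub-02", "sub-03"]
    # Fall back to task 0 so the script also runs outside of Slurm.
    env = {"SLURM_ARRAY_TASK_ID": os.environ.get("SLURM_ARRAY_TASK_ID", "0")}
    print(select_item(subjects, env))
```

Submitted with `sbatch --array=0-2`, each task would process one of the three subjects.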
## List from last time
Things to do in the docathon:
- MC: Drop or archive Flywheel stuff
- MC: Folks have trouble setting up a fresh project user and setting up Miniconda environments.
- MC: Test out instructions in docs.
- AL: Datalad best practices.
- What does this entail?
- NP: Maybe https://handbook.datalad.org/en/latest/code_from_chapters/neurohackademy.html would help?
- TS: Is the Way still something we follow?
- MC: Kahini still follows this, but almost all of it can be dropped in favor of BABS and other new tools.
- JB: MSI documentation?
- TS: Can we put some documentation (e.g., MSI) behind a login?
- MC: No clue how to do that.
- JB: Maybe remove protected info (e.g., URLs)
- TaS: Making whole sites private seems to be possible: https://github.blog/changelog/2021-01-21-access-control-for-github-pages/
- Maybe we could have separate private docs.
- GS: Topics bleed over sections (e.g., info about PMACS appears in non-PMACS sections). Some info is redundant (which can lead to info drift).
- TaS: Instructions for publishing data to OpenNeuro.
- TaS: Project organization
- https://github.com/PennLINC/paper-template
- JB: Templates for submitting jobs
- Array jobs
- AL: Separate templates for Python, R, etc.
- PMACS vs. CUBIC (SLURM vs. LSF)
- MC: Do a burn-down first to remove lots of info.
- TS: Stay away from PMACS documentation mostly, since most folks don't use it.
- JB: Some bits are useful for CUBIC, so need to grab those pieces.
- TS: update static dataset lists with papers + website, etc.
- AL: Code review / how to do a good pull request
- Ariel Rokem has a good tutorial on this somewhere
- NP: Might be too simple, but this might help https://swcarpentry.github.io/git-novice/aio.html.
- JB: resources -- workshops (GAMs, DIPY), conferences, fellowships to apply for
- AK: sage HR resource docs in admin
- TaS: Replace tutorials with running notebooks.
- May need to be offloaded to a JupyterBook or BookDown repo and linked in the docs.
- E.g., look at https://github.com/NBCLab/nimare-paper or https://github.com/ME-ICA/multi-echo-data-analysis. We can add scheduled GitHub Actions to run the build steps regularly.
- MC: Diagnosing failed jobs on CUBIC.
- E.g., killed job --> not enough memory
- TaS: Switch from Miniconda to Docker/Singularity for final papers?
- TaS: Sharing outputs from projects/papers/posters:
- Adding statistical maps from papers to NeuroVault.
- Adding figures and posters to FigShare.
- Anything else?
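
For the "Diagnosing failed jobs on CUBIC" item above, here is a minimal sketch that maps a Slurm job state (as reported in the `State` column of `sacct`) to a likely cause. The state names are standard Slurm job states; the helper `diagnose_state` and the suggested fixes are illustrative assumptions, not CUBIC policy.

```python
HINTS = {
    "OUT_OF_MEMORY": "Job exceeded its memory request; try a larger --mem.",
    "TIMEOUT": "Job hit its wall-clock limit; try a longer --time.",
    "NODE_FAIL": "A compute node failed; resubmitting is usually enough.",
    "CANCELLED": "Job was cancelled by a user or an administrator.",
    "FAILED": "Job exited with a non-zero code; check the job's log files.",
    "COMPLETED": "Job finished with exit code 0.",
}


def diagnose_state(state):
    """Return a short hint for a Slurm job state string."""
    # sacct can append detail (e.g. "CANCELLED by 12345") or a trailing "+",
    # so keep only the first word and strip the "+".
    words = state.split()
    key = words[0].rstrip("+").upper() if words else ""
    return HINTS.get(key, f"No hint for state {state!r}; see the Slurm docs.")


if __name__ == "__main__":
    for example in ("OUT_OF_MEMORY", "CANCELLED by 12345", "PENDING"):
        print(f"{example}: {diagnose_state(example)}")
```

A table like `HINTS` could double as the troubleshooting section of the docs, with the "killed job --> not enough memory" case as its first row.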
### Computation basics
- Keep the Python section but move it to "CUBIC"
- Keep everything
- Add another subsection "Visualization Basics" (i.e., what to install and what to use it for)
### Documentation basics
- Taylor to take over documentation section and roll it into paper templates section
- People should review Chenying's docs (PRs) before deleting/adding information
- Kahini's https://github.com/PennLINC/PennLINC.github.io/pull/54
- https://github.com/PennLINC/PennLINC.github.io/pull/50
- https://github.com/PennLINC/PennLINC.github.io/pull/46
### Flywheel
- To be archived
### PMACS
- Rename to "HPC" - merge "PMACS" and "CUBIC" into the same section
- Erica uses it - so does David's lab.
- The first three points are useful, but we could move that to "Data Tasks"
- Archive the rest
### CUBIC
- Move to "HPC" and merge with PMACS section
- Audrey and Golia to focus on this section
- We should not recommend that people use X11 or a GUI, or even RStudio. The section on R is fairly mutable and definitely not best practice. Singularity can be used to run RStudio. Another possibility is working locally from an h5 file; this should be in the unarchived documentation. We can have tips on compressing and combining data for graphical work, which should ideally be done locally rather than on CUBIC. We can move this to "Computation Basics/Visualization Basics" if we add a list of things to install and what they're used for in visualization. We should move the current documentation to the archive and say it's not recommended but can be used in edge cases.
- "Using R/R-studio and Installation of R packages" - written by Tinashe, and it works (Audrey has tested it). We could keep this as an alternative, but not as best practice.
- There are many options for performing computations on CUBIC and for visualizing locally or on CUBIC when there's a lot of data (R, Python, notebooks, and maybe Singularity), so we should document the use cases for each method. All these options should be described and introduced in one place - maybe just have an "HPC" section?
- We could have documentation on using Jupyter Lab while connecting to CUBIC
### Tutorials
### Meta-Analysis
- This is fine
### Current Projects
- Ted to nicely remind people to keep this updated
### Current Studies
- Fine for now
### Using containers
- This should be overhauled and tested later
- It should be moved into the "HPC" section
### Completed Projects and Studies
- We should put Current Projects next to Completed Projects and Studies
### The Way
- Archive
### Datalad + BABS section + Data Tasks
- Best practices for using these in conjunction
### Data Tasks
- Rename as needed
- Should be merged into above section
- PMACS --> CUBIC --> Unzip workflow
- Add docs on data sharing at some point, PR created by Kahini
### just-the-docs
- This should go