---
tags: Resources
---
# Action list & Progress
## Metabolomics data generation SOPs
- Example data - repository?
    - [x] Scout for potential dataset -> CMSI QC (Rui's data as backup, see below)
    - [ ] Check "QC" data (Anton)
    - [ ] Curate the data into a small, manageable dataset suitable for training purposes (Anton)
    - [ ] Make unified script for pre-processing of "QC" data (Anton)
- Overview / Pipeline / Workflow / Vocabulary (metabolite, annotation/identification, etc.)
    - [ ] 1st version (Elise & Rui)
    - [ ] Check (separate into the SOP tasks)
- SOP 1st version
    - [x] xcms - PP (Elise)
    - [x] xcms - RetAlign (Cecilia)
    - [x] xcms - corr (Rui)
    - [x] xcms - fillPeaks (Rui)
    - [x] imputation (Rui)
    - [x] RAMClust (Olle)
    - [x] batchCorr (Anton)
    - [x] MetID (Anton)
    - [ ] Report template (Anton)
    - [ ] Visualizations for sanity check
- Check 1
    - [x] xcms - PP (Rui)
    - [x] xcms - RetAlign (Elise --> Olle, double checked)
    - [x] xcms - corr (Elise)
    - [x] xcms - fillPeaks (Olle)
    - [x] imputation (Olle)
    - [x] RAMClust (Elise)
    - [x] batchCorr (Elise)
    - [ ] MetID ()
    - [ ] Report template ()
- [ ] IPO vs XCMS3 translation table (Olle)
- [ ] Harmonization (Olle)
- Source code
    - [ ] Source code 1st version (Sergiu)
    - [ ] Source code checking (All) - Expand per SOP
- Resources
    - [x] Video links (Calle)
    - [x] R packages (Calle)
    - [x] Bianca 1st version (Eddie / Cecilia)
    - [x] Bianca checking (Rui)
- Git
    - [ ] Presentation (Yan)
    - [ ] Git-ing HackMD (Elise)
## Other efforts
- Data stewardship
    - [ ] Original storage location (Chalmers offline; Chalmers server; Bianca)
    - [ ] Data format for metadata SOP/info
    - [ ] Codebook SOP/info
    - [ ] Codebook example
    - [ ] Managing Bianca projects SOP
    - [ ] Appointing data steward
- QC package
- MSID package
## Harmonization
To improve throughput and reproducibility, we need to implement uniform naming strategies. Harmonization/standardization should be implemented for:
**Instrument file names in A_B_C_D_E_F format**
This naming strategy is important for CMSITools to correctly extract all relevant information from the file name.
- A is the date written as YYYY-MM-DD
- B is the batch number and the week of analysis, written as BZZWXX, where ZZ is the batch number and XX the week number (e.g. B02W43).
- C is for chromatography, either RP (Reversed Phase) or HILIC.
- D is for polarity, either POS or NEG.
- E is the sample identifier, marking the sample as an sQC, ltQC, blank, cond (conditioning plasma) or study sample (named so that it can be traced back to the sample it came from). Each identifier must be unique within a batch (e.g. sQC01 or 125c).
- F is the injection number, i.e. the order in which the samples in each batch were injected.
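As an illustration of what "properly extract all relevant information from the filename" involves, here is a minimal Python sketch that validates and splits an A_B_C_D_E_F name. It is not the CMSITools implementation; `parse_instrument_filename` and `InstrumentFile` are hypothetical names, and the sketch assumes two-digit ZZ and XX (as in B02W43), no underscores inside any field, an integer injection number and an optional file extension such as `.mzML`.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical pattern for A_B_C_D_E_F. Assumes no underscores inside any
# field and an optional file extension (e.g. ".mzML") after the injection number.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"        # A: YYYY-MM-DD
    r"_B(?P<batch>\d{2})W(?P<week>\d{2})"  # B: BZZWXX (assumed two digits each)
    r"_(?P<chrom>RP|HILIC)"                # C: chromatography
    r"_(?P<polarity>POS|NEG)"              # D: polarity
    r"_(?P<sample>[^_]+)"                  # E: sample identifier
    r"_(?P<injection>\d+)"                 # F: injection number
    r"(?:\.[A-Za-z0-9]+)?$"                # optional file extension
)


@dataclass(frozen=True)
class InstrumentFile:
    """Hypothetical container for the six fields of an instrument file name."""
    run_date: date
    batch: int
    week: int
    chromatography: str
    polarity: str
    sample_id: str
    injection: int


def parse_instrument_filename(name: str) -> InstrumentFile:
    """Split an A_B_C_D_E_F file name; raise ValueError if it does not conform."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not in A_B_C_D_E_F format: {name!r}")
    return InstrumentFile(
        run_date=date.fromisoformat(m["date"]),  # also rejects e.g. month 13
        batch=int(m["batch"]),
        week=int(m["week"]),
        chromatography=m["chrom"],
        polarity=m["polarity"],
        sample_id=m["sample"],
        injection=int(m["injection"]),
    )
```

For example, `parse_instrument_filename("2021-10-28_B02W43_RP_POS_sQC01_12.mzML")` yields batch 2, week 43, `RP`, `POS`, sample `sQC01` and injection 12, while a name with a missing or misspelled field raises an error instead of being silently misread.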
**Features in ABc@d format**
- A is either H or R for chromatography (**H**ILIC / **R**eversed phase)
- B is either P or N for polarity (**P**ositive / **N**egative)
- c is the recorded m/z
- @ is a separator between m/z and rt
- d is the recorded retention time (rt)
NB! To facilitate tracking of features between modules in the pipeline, *m/z and rt should be given at full resolution*, i.e. not truncated to a fixed number of decimals.
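One way to honour the full-resolution rule is sketched below in Python. The helper names `feature_name` and `parse_feature_name`, and the mapping from the file-name vocabulary (RP/HILIC, POS/NEG) to the one-letter codes, are assumptions for illustration. The sketch relies on Python's `repr()` of a float, which gives the shortest string that converts back to exactly the same double, so no decimals are lost.

```python
import re
from typing import Tuple

# Assumed mapping from the file-name fields C and D to the one-letter codes.
CHROM_CODE = {"HILIC": "H", "RP": "R"}
POL_CODE = {"POS": "P", "NEG": "N"}

# A = H/R, B = P/N, c = m/z, "@" separator, d = rt.
FEATURE_RE = re.compile(r"^([HR])([PN])([^@]+)@([^@]+)$")


def feature_name(chromatography: str, polarity: str, mz: float, rt: float) -> str:
    """Build a hypothetical ABc@d feature name without truncating m/z or rt."""
    # float() turns e.g. NumPy scalars into plain Python floats; repr() then
    # gives the shortest string that round-trips to the same float.
    return f"{CHROM_CODE[chromatography]}{POL_CODE[polarity]}{float(mz)!r}@{float(rt)!r}"


def parse_feature_name(name: str) -> Tuple[str, str, float, float]:
    """Split an ABc@d feature name into (A, B, m/z, rt)."""
    m = FEATURE_RE.match(name)
    if m is None:
        raise ValueError(f"not in ABc@d format: {name!r}")
    chrom, pol, mz, rt = m.groups()
    # float() raises ValueError if c or d is not numeric.
    return chrom, pol, float(mz), float(rt)
```

Because the name round-trips exactly, the same feature can be matched across modules by plain string comparison. If a value passes through a step that prints fewer digits (R, for instance, prints 7 significant digits by default), the names will no longer match, which is exactly what the NB above guards against.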
**RAMClusters in ABCn@d format**
- A is either H or R for chromatography (**H**ILIC / **R**eversed phase)
- B is either P or N for polarity (**P**ositive / **N**egative)
- C is the literal letter C (for **C**luster)
- n is the cluster number
- @ is a separator between the cluster number (n) and rt
- d is the recorded retention time (rt)
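A matching Python sketch for RAMCluster names follows. As before, `cluster_name`, `parse_cluster_name` and the code mapping are hypothetical; it also assumes that n is written without zero-padding and that rt is kept at full resolution, mirroring the feature-name rule (the convention above does not state either explicitly).

```python
import re
from typing import Tuple

# Assumed mapping from the file-name fields C and D to the one-letter codes.
CHROM_CODE = {"HILIC": "H", "RP": "R"}
POL_CODE = {"POS": "P", "NEG": "N"}

# A = H/R, B = P/N, literal "C", n = cluster number, "@" separator, d = rt.
CLUSTER_RE = re.compile(r"^([HR])([PN])C(\d+)@([^@]+)$")


def cluster_name(chromatography: str, polarity: str, n: int, rt: float) -> str:
    """Build a hypothetical ABCn@d name (n is not zero-padded in this sketch)."""
    return f"{CHROM_CODE[chromatography]}{POL_CODE[polarity]}C{int(n)}@{float(rt)!r}"


def parse_cluster_name(name: str) -> Tuple[str, str, int, float]:
    """Split an ABCn@d name into (A, B, cluster number, rt)."""
    m = CLUSTER_RE.match(name)
    if m is None:
        raise ValueError(f"not in ABCn@d format: {name!r}")
    chrom, pol, n, rt = m.groups()
    return chrom, pol, int(n), float(rt)
```

The literal C is what keeps the two conventions apart: a feature name such as `HN123.4@5.6` has a digit where a cluster name has `C`, so `parse_cluster_name` rejects it rather than mistaking an m/z for a cluster number.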
**Other things**
@Hzwwav8nTEqDzIBprJEyaw
@YhTVx2AwRfS78SzlzmHT2Q
Please expand here as you see fit!
## Example data
We will use the CMSI "QC" dataset for training and testing purposes.
As a backup, Rui suggests this dataset: https://www.ebi.ac.uk/metabolights/MTBLS1839/assays
## Specific to the overview
- [ ] Graphical overview of algorithms
- [ ] (Graphical) overview of file structure (SOPs, source code, example data, other resources)
- [ ] Pipeline
- [ ] Workflow
- [ ] Vocabulary
- [ ] Harmonization (see above)
- [ ] Align with report template
## General structure for SOPs
- Introduction: What is the purpose of this step in the pipeline?
- Input data description
    - Objects / variables / parameters / file formats
    - Where does it come from? (e.g. "converted from instrument raw data using ProteoWizard" or "Output from SOP X: Algorithm")
- Describe key elements (central functions) of the code / scripts
    - But not the actual script itself (reference the source code, SRC, with a link)
    - Describe key parameters and how they are obtained / optimized
- Output data description
## Source code
@tCElCkukTEqMlVK2VncDfQ Please write down what information each script needs and which example files it uses.
Example scripts that are more or less ready for pipeline integration:
- BWAlignment.R
- PeakPicking_preBW.R
## Members
### Present
- Anton
- Elise
- Rui
- Olle
- Calle
- Yan (data stewardship)
### Past
- Eddie
- Cecilia
- Sergiu