# Dataflow diagram
1. **Rat proteomics**
1. [Rodent stroke proteomics from paper supplement](https://pubs.acs.org/doi/suppl/10.1021/acs.jproteome.9b00220/suppl_file/pr9b00220_si_002.xlsx).
`data/rat.proteomics/ratbr-ais.xlsx`
2. Table for converting UniProt accessions to gene names, downloaded from the UniProt interface.
`annotations/uniprot-rat-entries-to-gns.txt`
4. Quantities normalized to sham.
`data/rat.proteomics/rat-proteomics-norm-by-sham.tsv` is created from 1.1 and 1.2 by `notebooks/rat-AIS-proteomics.ipynb`.
5. WGCNA `meig` (module eigengene) and `g2m` (gene-to-module) files.
`data/rat.proteomics/wgcna-pr-rats-meigs.csv` and `data/rat.proteomics/wgcna-pr-rats-g2m.csv` are generated from 1.4 by `annotations/wgcna-proteomics-rats.ipynb`.
6. GO enrichment analysis (GOEA) produces xlsx reports from the gene lists of every WGCNA module (`data/rat.proteomics/wgcna-pr-rats-g2m.csv`).
The reports are produced from 1.5 by `notebooks/wgcna-proteomics-rats.ipynb`.
7. *.. find better representation ..*
2. **Mouse proteomics**
1. Mouse stroke proteomics from the data deposit of the paper [(Gu, 2021)](https://doi.org/10.1021/acs.jproteome.1c00259).
`data/mouse.proteomics/protein-?.tsv`
`data/mouse.proteomics/proteins_combined.tsv`
2. Quantities normalized to sham.
The files from 2.1 are processed into `data/mouse.proteomics/mice-Gu-28days-normalized.tsv` by the notebook `notebooks/mouse-proteomics-preprocess.ipynb`.
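
The "normalized to sham" steps (1.4 and 2.2) can be illustrated with a minimal sketch. It is not taken from the notebooks. It assumes that normalization means dividing each protein's quantity by that protein's mean over the sham samples, and that accessions are first replaced by gene names using a lookup table like the one in 1.2. The function name, sample names, accessions, and values are all hypothetical, and the notebooks may use a different definition (for example, log ratios).

```python
from statistics import mean


def normalize_by_sham(quant, sham_samples):
    """Divide each protein's values by its mean over the sham samples.

    ASSUMPTION: "normalized to sham" is taken to mean a plain ratio to the
    sham mean; the actual notebooks may define it differently.
    """
    normalized = {}
    for protein, values in quant.items():
        baseline = mean(values[s] for s in sham_samples)
        if baseline == 0:
            continue  # no usable sham baseline for this protein
        normalized[protein] = {s: v / baseline for s, v in values.items()}
    return normalized


# Hypothetical accession-to-gene lookup (stands in for the table from 1.2).
acc_to_gene = {"ACC1": "GeneA", "ACC2": "GeneB"}

# Hypothetical quantities: protein accession -> {sample name: quantity}.
quant = {
    "ACC1": {"sham_1": 2.0, "sham_2": 4.0, "stroke_1": 6.0},
    "ACC2": {"sham_1": 1.0, "sham_2": 1.0, "stroke_1": 0.5},
}

# Replace accessions with gene names; unmapped accessions are kept as-is.
by_gene = {acc_to_gene.get(acc, acc): vals for acc, vals in quant.items()}

norm = normalize_by_sham(by_gene, ["sham_1", "sham_2"])
print(norm["GeneA"]["stroke_1"])  # 6.0 / mean(2.0, 4.0) = 2.0
```

With this definition, a value above 1 means the protein is more abundant than in the sham group and a value below 1 means it is less abundant.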