Minnow Simulating Process
===

#### The following code assumes that the files you provided on the server are in a folder named `data`, and that the **working directory is the parent directory of that `data` folder**.

## Data preparation

1. Create the working folders and prepare the FASTA file and the tx2gene file **in your terminal**.
```bash
mkdir tominnow
mkdir simulation
cp data/annotation.expanded.fa tominnow
# for every transcript whose ID contains "-U", append "-U" to the end of the line,
# i.e. to the gene ID in the second column, so that transcript maps to "<gene>-U"
awk -F "\t" '{if ($1 ~ /-U/) {print $0"-U"} else {print $0}}' data/annotation.expanded.tx2gene.tsv > tominnow/minnow.tx2gene.tsv
```
2. **In R**, split the count matrices by sample and write each sample's files in the input format minnow expects.
```r=
sim_dir = "tominnow"
data_dir = "data"
# gene IDs (rows) and cell barcodes (columns) of the count matrices
gnames = read.csv(file.path(data_dir,"gene_ids.txt"), header=FALSE)$V1
bcs = read.csv(file.path(data_dir,"cells_ids.txt"), header=FALSE)$V1
# read.csv() passes col.names through make.names(), so barcodes that start with the
# sample number get an "X" prefix (e.g. "1-ACGT" -> "X1.ACGT"); the loop below relies on this
s = read.csv(file.path(data_dir,"spliced.csv"), header=FALSE, col.names = bcs, row.names = gnames)
u = read.csv(file.path(data_dir,"unspliced.csv"), header=FALSE, col.names = bcs, row.names = gnames)
ss = list()
us = list()
for (samp in 1:4) {
  # columns of sample `samp` start with "X<samp>."
  pat = paste0("X", samp, ".")
  out_dir = file.path(sim_dir, paste0('sample',samp))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  sample_index = startsWith(colnames(s), pat)
  # keep this sample's cells and strip the prefix to recover the barcodes
  ss[[samp]] = s[, sample_index, drop = FALSE]
  colnames(ss[[samp]]) = sub(pat, "", colnames(ss[[samp]]), fixed = TRUE)
  us[[samp]] = u[, sample_index, drop = FALSE]
  colnames(us[[samp]]) = sub(pat, "", colnames(us[[samp]]), fixed = TRUE)
  # quants_mat.csv: spliced rows stacked on top of unspliced rows, one column per cell
  write.table(rbind(ss[[samp]], us[[samp]]), file = file.path(out_dir, "quants_mat.csv"), sep = ",", quote = FALSE, row.names = FALSE, col.names = FALSE)
  # quants_mat_rows.txt: gene IDs, with a "-U" suffix for the unspliced rows
  write.table(c(gnames, paste0(gnames,"-U")), file = file.path(out_dir, "quants_mat_rows.txt"), quote = FALSE, row.names = FALSE, col.names = FALSE)
  # quants_mat_cols.txt: cell barcodes of this sample
  write.table(colnames(ss[[samp]]), file = file.path(out_dir, "quants_mat_cols.txt"), quote = FALSE, row.names = FALSE, col.names = FALSE)
}
```

## Running Minnow

With the files in the `tominnow` folder
ready, we can now run minnow. The following commands are all run **in your terminal**.

1. Install minnow.
```bash
git clone --single-branch --branch minnow-velocity https://github.com/COMBINE-lab/minnow.git
cd minnow
mkdir build
cd build
cmake ..
make
cd ../..
```
2. Build the minnow index.
```bash
# `|&` pipes both stdout and stderr into `tr`, which turns carriage returns into newlines
# so that updates written with `\r` end up on separate lines of the log file
minnow/build/src/minnow index -r tominnow/annotation.expanded.fa -k 101 -f 20 --tmpdir simulation/tmp -p 20 -o simulation/minnow_ind |& stdbuf -oL tr '\r' '\n' > simulation/minnow_index.log
```
3. Go minnow! Run the simulation once for each sample.
```bash
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample1 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample1_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample1.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample2 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample2_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample2.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample3 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample3_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample3.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample4 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample4_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample4.log
```

If the repeated commands bother you, you can put them in a **bash script** instead:
```bash=
#!/bin/bash
samples=(1 2 3 4)
minnow="minnow/build/src/minnow"
minnow_ind="simulation/minnow_ind"
g2t="tominnow/minnow.tx2gene.tsv"
for sample in "${samples[@]}"
do
  # build the same command as above for this sample, print it, then run it
  cmd="$minnow simulate --splatter-mode --g2t $g2t --inputdir tominnow/sample$sample --PCR 6 -r $minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample${sample}_simulated_reads --dbg --gfa $minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample${sample}.log"
  echo "$cmd"
  eval "$cmd"
done
```

## Additional note

We found that if the input files mix Windows newline characters (`0xD 0xA`) with Linux newline characters (`0xA`), one of the external packages used by minnow stops working. If you switch back and forth between Windows and Unix, we therefore suggest checking which newline characters your files use. To avoid this pitfall entirely, run the following commands **in your terminal** before running `minnow index` and `minnow simulate`:
```bash
sed -i -e 's/\r//g' tominnow/annotation.expanded.fa
sed -i -e 's/\r//g' tominnow/minnow.tx2gene.tsv
```
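Before (or after) running the `sed` commands, it can help to see which files actually contain carriage returns. Below is a minimal sketch of such a check in bash. The helper names `has_cr` and `check_files` are hypothetical and not part of minnow; the check simply looks for any `\r` byte with `grep`, so it flags both pure-Windows files and files with mixed newlines.

```bash
#!/bin/bash
# Sketch: report which minnow input files still contain carriage returns (\r).
# has_cr and check_files are hypothetical helpers, not part of minnow.

# Succeeds (exit status 0) if the file contains at least one \r byte.
has_cr() {
  grep -q $'\r' "$1"
}

# Prints one status line per file; returns 1 if any file is missing or contains \r.
check_files() {
  local status=0
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      status=1
    elif has_cr "$f"; then
      echo "carriage return found: $f"
      status=1
    else
      echo "ok: $f"
    fi
  done
  return "$status"
}
```

For example, after sourcing these functions, `check_files tominnow/annotation.expanded.fa tominnow/minnow.tx2gene.tsv` prints one line per file and returns a non-zero status if any file is missing or still contains `\r`. The same call can be pointed at other inputs, such as the files in `tominnow/sample1`.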