---
title: 'Minnow Simulation Process'
disqus: hackmd
---
Minnow Simulation Process
===
#### The code below assumes that the files you provided on the server are in a folder named `data`, and that the **working directory is the parent directory of that `data` folder**.
## Data preparation
1. Create the required folders and prepare the FASTA file and the tx2gene file **in your terminal**.
```bash
mkdir tominnow
mkdir simulation
cp data/annotation.expanded.fa tominnow
awk -F "\t" '{if ($1 ~ /-U/) {print $0"-U"} else {print $0}}' data/annotation.expanded.tx2gene.tsv > tominnow/minnow.tx2gene.tsv
```
2. **In R**, split the count matrices by sample and write the files in the input format minnow expects.
```r=
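sim_dir = "tominnow"
data_dir = "data"
# Gene IDs label the rows and cell IDs label the columns of both matrices
gnames = read.csv(file.path(data_dir,"gene_ids.txt"), header=FALSE)$V1
bcs = read.csv(file.path(data_dir,"cells_ids.txt"), header=FALSE)$V1
# read.csv() sanitizes col.names (check.names = TRUE): an ID starting with a digit
# gains an "X" prefix and invalid characters become "."; the pattern below relies on that
s = read.csv(file.path(data_dir,"spliced.csv"), header=FALSE, col.names = bcs, row.names = gnames)
u = read.csv(file.path(data_dir,"unspliced.csv"), header=FALSE, col.names = bcs, row.names = gnames)
ss = list()
us = list()
for (samp in 1:4) {
  # Sanitized column-name prefix that marks the cells of this sample
  pat = paste0("X", samp, ".")
  out_dir = file.path(sim_dir, paste0('sample',samp))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  sample_index = startsWith(colnames(s), pat)
  # Keep this sample's cells and strip the prefix from their names;
  # fixed = TRUE matches the "." in pat literally instead of as a regex wildcard
  ss[[samp]] = s[,sample_index]
  colnames(ss[[samp]]) = sub(pat, "", colnames(ss[[samp]]), fixed = TRUE)
  us[[samp]] = u[,sample_index]
  colnames(us[[samp]]) = sub(pat, "", colnames(us[[samp]]), fixed = TRUE)
  # Spliced rows first, then unspliced rows, in the same order as quants_mat_rows.txt
  write.table(rbind(ss[[samp]], us[[samp]]), file = file.path(out_dir, "quants_mat.csv"), sep = ",", quote = FALSE, row.names = FALSE, col.names = FALSE)
  # Row labels: gene IDs, followed by the same IDs with a "-U" suffix for unspliced counts
  write.table(c(gnames, paste0(gnames,"-U")), file = file.path(out_dir, "quants_mat_rows.txt"), quote = FALSE, row.names = FALSE, col.names = FALSE)
  # Column labels: cell IDs without the sample prefix
  write.table(colnames(ss[[samp]]), file = file.path(out_dir, "quants_mat_cols.txt"), quote = FALSE, row.names = FALSE, col.names = FALSE)
}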
```
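Before moving on, it can help to confirm that each sample folder is internally consistent. The following is a minimal sketch and not part of minnow: `check_quants_dir` is a hypothetical helper that assumes the layout written by the R code above, that is, one matrix row per line of `quants_mat_rows.txt` and one matrix column per line of `quants_mat_cols.txt`.

```bash
# check_quants_dir is a hypothetical helper name, not a minnow command.
# It compares the shape of quants_mat.csv with the two label files.
check_quants_dir() {
    dir=$1
    # Number of lines in the matrix and in the row-label file
    n_rows=$(wc -l < "$dir/quants_mat.csv" | tr -d ' ')
    n_row_names=$(wc -l < "$dir/quants_mat_rows.txt" | tr -d ' ')
    # Number of comma-separated fields on the first matrix line
    n_cols=$(awk -F ',' 'NR == 1 {print NF; exit}' "$dir/quants_mat.csv")
    n_cols=${n_cols:-0}
    n_col_names=$(wc -l < "$dir/quants_mat_cols.txt" | tr -d ' ')
    if [ "$n_rows" -eq "$n_row_names" ] && [ "$n_cols" -eq "$n_col_names" ]; then
        echo "$dir: OK ($n_rows rows x $n_cols columns)"
    else
        echo "$dir: MISMATCH (matrix ${n_rows}x${n_cols}, labels ${n_row_names}x${n_col_names})"
        return 1
    fi
}

# Check every sample folder that exists under tominnow
for samp in 1 2 3 4; do
    if [ -d "tominnow/sample$samp" ]; then
        check_quants_dir "tominnow/sample$samp"
    fi
done
```

A `MISMATCH` line suggests that the split in R dropped or duplicated cells for that sample, so it is worth re-checking the sample prefix in the cell IDs.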
## Running Minnow
With the files in the `tominnow` folder ready, we can now run minnow. All of the following commands are run **in your terminal**.
1. Install minnow.
```bash
git clone --single-branch --branch minnow-velocity https://github.com/COMBINE-lab/minnow.git
cd minnow
mkdir build
cd build
cmake ..
make
cd ../..
```
2. Build the minnow index.
```bash
minnow/build/src/minnow index -r tominnow/annotation.expanded.fa -k 101 -f 20 --tmpdir simulation/tmp -p 20 -o simulation/minnow_ind |& stdbuf -oL tr '\r' '\n' > simulation/minnow_index.log
```
3. Go minnow!
```bash
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample1 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample1_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample1.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample2 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample2_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample2.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample3 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample3_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample3.log
minnow/build/src/minnow simulate --splatter-mode --g2t tominnow/minnow.tx2gene.tsv --inputdir tominnow/sample4 --PCR 6 -r simulation/minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample4_simulated_reads --dbg --gfa simulation/minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample4.log
```
If the repetition bothers you, you can put the commands into a **bash script**.
```bash=
#!/bin/bash
samples=(1 2 3 4)
minnow="minnow/build/src/minnow"
minnow_ind="simulation/minnow_ind"
g2t="tominnow/minnow.tx2gene.tsv"
for sample in "${samples[@]}"
do
    # Build the full command as a string so it can be printed before it runs
    cmd="$minnow simulate --splatter-mode --g2t $g2t --inputdir tominnow/sample$sample --PCR 6 -r $minnow_ind/ref_k101_fixed.fa -e 0.01 -p 20 -o simulation/sample${sample}_simulated_reads --dbg --gfa $minnow_ind/dbg.gfa -w minnow/data/737K-august-2016.txt --countProb minnow/data/hg/countProb_pbmc_4k.txt --custom |& stdbuf -oL tr '\r' '\n' > simulation/minnow_simulate_sample${sample}.log"
    # Quote the variable so the command is echoed and evaluated exactly as written
    echo "$cmd"
    eval "$cmd"
done
```
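Because each simulation can run for a while, a quick pre-flight check of the input paths may save time. The sketch below is an illustration and not part of minnow: `require_files` is a hypothetical helper, and the paths are simply the ones passed to `minnow simulate` in the commands above.

```bash
# require_files is a hypothetical helper name, not a minnow command.
# It prints every path that does not exist and returns 1 if any is missing.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths taken from the simulate commands in this document
if require_files \
    tominnow/minnow.tx2gene.tsv \
    simulation/minnow_ind/ref_k101_fixed.fa \
    simulation/minnow_ind/dbg.gfa \
    minnow/data/737K-august-2016.txt \
    minnow/data/hg/countProb_pbmc_4k.txt
then
    echo "all inputs found"
else
    echo "some inputs are missing; see the messages above"
fi
```

If the index files are reported missing, the log written by `minnow index` (`simulation/minnow_index.log`) is the first place to look.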
## Additional note
We found that if the provided files mix Windows line endings (`0xD 0xA`) with Linux line endings (`0xA`), one of the external packages used by minnow stops working. If you switch back and forth between Unix and Windows, we suggest checking which line endings your files use. To avoid this pitfall altogether, run the following commands **in your terminal** before you run `minnow index` and `minnow simulate`:
```bash
sed -i -e 's/\r//g' tominnow/annotation.expanded.fa
sed -i -e 's/\r//g' tominnow/minnow.tx2gene.tsv
```