# Mouse ONT assembly and pangenome
Assembly of one sample (BXD24). Read statistics for the input data:

| Reads | Yield (bp) | N50 (bp) | Coverage (x) | Max length (bp) | Mean length (bp) | Median length (bp) | Mean Q | Median Q |
|---|---|---|---|---|---|---|---|---|
| 4,032,662 | 81,375,782,055 | 36,289 | 30 | 669,638 | 20,179 | 13,362 | 13.34 | 14.40 |
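
As a quick sanity check on the table, mean read length and coverage can be recomputed from the read count and yield. The sketch below assumes a mouse genome size of about 2.7 Gb; that value is an assumption of this example and is not taken from the notes.

```shell
#!/bin/sh
set -eu

yield_bp=81375782055     # "Yield" column of the table
reads=4032662            # "Reads" column of the table
genome_bp=2700000000     # ASSUMED mouse genome size (~2.7 Gb), not from the notes

# mean length = yield / reads; coverage = yield / genome size
stats=$(awk -v y="$yield_bp" -v n="$reads" -v g="$genome_bp" \
  'BEGIN { printf "mean_length=%.0f coverage=%.1f", y / n, y / g }')
echo "$stats"
# prints: mean_length=20179 coverage=30.1
```

Both values agree with the table (mean length 20,179 bp, coverage about 30x) under that genome-size assumption.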
### 1. Generate the assembly
- I use wtdbg2 to assemble the long-read nanopore sequencing data (-x ont) from the FASTQ file produced by base calling.
- I use wtpoa-cns to build the consensus (polished) sequence from the assembly layout generated by wtdbg2.
- I use minimap2 to align the original nanopore reads to the consensus assembly from the previous step. The command uses the nanopore long-read preset (-ax map-ont) and sets the bandwidth for chaining and alignment to 2 kb (-r2k). The alignments are stored in a BAM file, from which I remove secondary and supplementary alignments (the "-F0x900" option).
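
The "-F0x900" filter is a bit mask over the SAM FLAG field. The small sketch below shows how the mask decomposes into the two flag bits defined in the SAM specification; the helper name `is_dropped` is hypothetical and only used for illustration.

```shell
#!/bin/sh
set -eu

secondary=$((0x100))       # 256: secondary alignment
supplementary=$((0x800))   # 2048: supplementary alignment
mask=$((secondary | supplementary))

printf 'mask=%d hex=0x%x\n' "$mask" "$mask"
# prints: mask=2304 hex=0x900

# -F drops a record when any bit of the mask is set in its FLAG.
# is_dropped is a hypothetical helper, not part of any tool.
is_dropped() { [ $(( $1 & mask )) -ne 0 ]; }

if is_dropped 2064; then echo "flag 2064 (supplementary, reverse strand): dropped"; fi
if ! is_dropped 16; then echo "flag 16 (primary, reverse strand): kept"; fi
```

Only primary alignments pass the filter, so each read contributes at most one alignment to the polishing step.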
```
sbatch -p workers -w octopus02 -c 48 --wrap 'cd /scratch && /lizardfs/flaviav/mouse_ont/assembly/as.sh'
Submitted batch job 124177
```
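
The contents of `as.sh` are not shown here. As a rough sketch only, a script for the steps listed in this section could look like the following, assuming it follows the usage documented in the wtdbg2 README. The read file name, the genome size passed to `-g`, and the final wtpoa-cns pass over the filtered alignments are assumptions; only the `mouse_reads.ont.wtdbg2.asm1` prefix and the `-x ont`, `-ax map-ont`, `-r2k`, and `-F0x900` options come from these notes. The function only prints the commands, so it can be inspected without running anything.

```shell
#!/bin/sh
set -eu

reads=${READS:-mouse_reads.ont.fq.gz}          # hypothetical input file name
prefix=${PREFIX:-mouse_reads.ont.wtdbg2.asm1}  # prefix matching the notes
threads=${THREADS:-48}
genome_size=${GENOME_SIZE:-2.7g}               # ASSUMED genome size

# Print the pipeline, one command per line (dry run).
assembly_commands() {
  cat <<EOF
wtdbg2 -x ont -g $genome_size -t $threads -i $reads -fo $prefix
wtpoa-cns -t $threads -i $prefix.ctg.lay.gz -fo $prefix.raw.fa
minimap2 -t $threads -ax map-ont -r2k $prefix.raw.fa $reads | samtools sort -o $prefix.bam -
samtools view -F0x900 $prefix.bam | wtpoa-cns -t $threads -d $prefix.raw.fa -i - -fo $prefix.cns.fa
EOF
}

assembly_commands
```

To actually run the steps, the printed commands can be piped to a shell, for example `assembly_commands | sh -e`, once the file names have been checked.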
### 1b. Statistics on the assembly (quast)
`quast.py mouse_reads.ont.wtdbg2.asm1.cns.fa.gz`
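
QUAST reports contiguity statistics such as N50. To make clear what that number means, here is a minimal sketch that computes N50 from a list of sequence lengths. The function name `n50` is hypothetical, and this is an illustration of the definition, not QUAST's own implementation.

```shell
#!/bin/sh
set -eu

# Read one sequence length per line on stdin and print the N50:
# the length L such that sequences of length >= L contain
# at least half of all bases.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        running += len[i]
        if (running >= total / 2) { print len[i]; exit }
      }
    }'
}

# Toy example: total = 54, half = 27, and 10 + 9 + 8 = 27.
printf '%s\n' 2 3 4 5 6 7 8 9 10 | n50
# prints: 8
```

Given a FASTA index made with `samtools faidx`, the contig lengths are in the second column of the `.fai` file, so something like `cut -f2 assembly.fa.fai | n50` would give the assembly N50 (the file name here is a placeholder).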

### 1c. Assembly with canu
```
sbatch -p workers -w octopus02 -c 48 --wrap 'cd /scratch && /lizardfs/flaviav/mouse_ont/canu.sh'
Submitted batch job 125166
```
### 2. Polishing the assembly with linked reads
```
bwa-0.7.17/bwa index mouse_reads.ont.wtdbg2.asm1.cns.fa
```
```
sbatch -p workers -w octopus05 -c 48 --wrap 'cd /scratch && /lizardfs/flaviav/mouse_ont/assembly/polish.sh'
Submitted batch job 125066
```
- Alignment: BWA-MEM aligns the linked reads to the ONT-based assembly of the mouse genome.
- Sorting: the SAM file from the alignment step is converted to a binary BAM file and sorted with Sambamba.
- Variant calling: Freebayes calls variants on each contig of the ONT-based assembly.
- Concatenation: BCFtools concatenates the variant calls from each contig into a single BCF file. In bcftools concat, the -n flag selects naive concatenation (no recompression) and the -f flag gives the file that lists the BCF files to concatenate; the reference FASTA file is supplied for the normalization process.
- Polishing: the BCF file from the concatenation step is applied to the ONT-based assembly to produce the polished sequence. The -i flag specifies the filter expression that selects which variant calls are applied; -H and -f, as in bcftools consensus, select the allele to apply and the reference FASTA file.

Note: the sorting step fails with an error in Sambamba. Pjotr is checking; for now I'll use GATK instead of Sambamba.
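
The contents of `polish.sh` are not shown here. As a hedged sketch of the variant calling, concatenation, and polishing steps described above, the commands could look roughly like the following. All file names except the assembly name are hypothetical, the filter expression and the `-H 1` choice are illustrative assumptions, and the helper functions only print commands so nothing is executed.

```shell
#!/bin/sh
set -eu

asm=${ASM:-mouse_reads.ont.wtdbg2.asm1.cns.fa}  # assembly name from the notes
bam=${BAM:-linked_reads.sorted.bam}             # hypothetical sorted BAM
out=${OUT:-polish}                              # hypothetical output prefix

# Read contig names on stdin; print one freebayes command per contig.
freebayes_jobs() {
  while read -r ctg; do
    echo "freebayes -f $asm -r $ctg $bam | bcftools view -Ob -o $out.$ctg.bcf -"
  done
}

# Read contig names on stdin; print the per-contig BCF names in the
# same order, for use as the bcftools concat file list.
concat_list() {
  while read -r ctg; do echo "$out.$ctg.bcf"; done
}

# Print the commands that merge the calls, normalize them, and apply them.
# The filter expression below is an ASSUMED example, not the real one.
merge_and_polish() {
  cat <<EOF
bcftools concat -n -f $out.list | bcftools view -Ou -e 'type="ref"' - | bcftools norm -Ob -f $asm -o $out.bcf -
bcftools index $out.bcf
bcftools consensus -i 'QUAL>1 && (GT="AA" || GT="Aa")' -H 1 -f $asm $out.bcf > $out.fa
EOF
}

# Dry run with two made-up contig names.
printf 'ctg1\nctg2\n' | freebayes_jobs
merge_and_polish
```

In a real run the contig names would come from the assembly index (for example the first column of the `.fai` file written by `samtools faidx`), and the output of `concat_list` would be saved as the list file read by `bcftools concat -f`.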