Secondary Analysis / BWA
===
###### tags: `Genomics/Secondary Analysis`
###### tags: `Bioinformatics`, `Genomics`, `Secondary Analysis`, `BWA`, `BWA-MEM`, `Burrows-Wheeler Aligner`, `GATK`, `Nvidia Clara Parabricks`, `Sequence Alignment`
<br>
[TOC]
<br>
<hr>
## [Burrows-Wheeler Aligner](http://bio-bwa.sourceforge.net/)
> BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms:
> - BWA-backtrack
> - BWA-SW
> - BWA-MEM.
- [Manual Reference Pages - bwa (1)](http://bio-bwa.sourceforge.net/bwa.shtml)
- SYNOPSIS
```bash
bwa index ref.fa
bwa mem ref.fa reads.fq > aln-se.sam
bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
```
<br>
## Commands
- ### [[Nvidia][Parabricks] COMPATIBLE CPU BASED BWA-MEM, GATK4 COMMANDS](https://docs.nvidia.com/clara/parabricks/v3.5/text/germline_pipeline.html#compatible-cpu-based-bwa-mem-gatk4-commands)
```bash=
# Run bwa-mem; SAM is written to stdout (the step that pipes it into a sorted BAM is not shown here)
bwa mem -t 32 -K 10000000 \
    -R '@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1' \
    Ref/Homo_sapiens_assembly38.fasta S1_1.fastq.gz S1_2.fastq.gz
```
- Version of bwa bundled with Parabricks
```
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <lh3@sanger.ac.uk>
```
- `-t INT`
number of threads [1]
- `-K INT`
process INT input bases in each batch regardless of nThreads (for reproducibility) []
- `-R STR`
read group header line such as `@RG\tID:foo\tSM:bar` [null]
- ### [Parallelizing Bwa On Multiple Cpus](https://www.biostars.org/p/16191/)
Q: Has anyone successfully parallelized BWA alignment on multiple CPUs?
A: Yes. You can split the reads into multiple FASTQ files, align each file separately, and then merge the results, because the reads are aligned independently of each other.
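The Parabricks-compatible `bwa mem` command above mentions piping the output to create a sorted BAM, but the pipe itself does not appear in this note. A minimal sketch of that step follows. It assumes `samtools` is installed and is an acceptable sorter, which this note does not specify. `make_read_group`, the output name `S1.sorted.bam`, and the `-@ 4` thread count are hypothetical choices for illustration. The alignment only runs when both tools and the input files are present.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the @RG line with literal "\t" separators,
# which bwa mem converts to real tabs in the SAM header.
make_read_group() {
    local id=$1 lib=$2 platform=$3 sample=$4
    printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tSM:%s\\tPU:%s\n' \
        "$id" "$lib" "$platform" "$sample" "$id"
}

REF=Ref/Homo_sapiens_assembly38.fasta
R1=S1_1.fastq.gz
R2=S1_2.fastq.gz
# "bar" is the placeholder platform value used in the command above.
RG=$(make_read_group sample_rg1 lib1 bar sample)
echo "read group: $RG"

# Assumption: samtools is the sorter; the output name is a placeholder.
if command -v bwa >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -f "$REF" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    bwa mem -t 32 -K 10000000 -R "$RG" "$REF" "$R1" "$R2" \
      | samtools sort -@ 4 -o S1.sorted.bam -
else
    echo "bwa, samtools, or input files not found; skipping alignment" >&2
fi
```

Sorting straight from the pipe avoids writing an intermediate SAM file, which for whole-genome data can be very large.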
<br>
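The split, align, and merge approach from the answer above can be sketched as follows. This is an illustration under stated assumptions, not a recommended pipeline: the chunk size, the `chunk_` prefix, the helper `split_fastq`, and the use of `samtools` for sorting and merging are not taken from the thread, and the input is assumed to be an uncompressed single-end FASTQ with exactly four lines per record. Since `bwa mem -t` already uses multiple threads on one machine, splitting is mainly useful for spreading work across separate jobs or hosts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split a FASTQ into chunks of N records each.
# Each record is 4 lines, so the line count per chunk is 4 * N.
split_fastq() {
    local fastq=$1 records_per_chunk=$2 prefix=$3
    split -l "$((records_per_chunk * 4))" "$fastq" "$prefix"
}

# Tiny demo input: 5 made-up records.
workdir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > "$workdir/reads.fq"

split_fastq "$workdir/reads.fq" 2 "$workdir/chunk_"
ls "$workdir"/chunk_*

# Align each chunk in the background, then merge. Runs only if the tools
# and an indexed reference (ref.fa, as in the SYNOPSIS) are available.
if command -v bwa >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -f ref.fa.bwt ]; then
    for chunk in "$workdir"/chunk_*; do
        bwa mem ref.fa "$chunk" | samtools sort -o "$chunk.bam" - &
    done
    wait
    samtools merge -f merged.bam "$workdir"/chunk_*.bam
fi
```

Splitting on a multiple of four lines keeps every FASTQ record intact within a single chunk.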
## Algorithms
- [[HackMD] Gene sequence alignment algorithms](https://hackmd.io/@UlvydjoQQKafa4iwBkJ9lg/BkGumVZtu)
- [Algorithms for gene sequence alignment: dynamic programming](http://ezphysics.nchu.edu.tw/15_course/Bioinformation/RF05.pdf)
- [Bioinformatics: algorithms for gene sequence alignment (dynamic programming)](https://scitechvista.nat.gov.tw/Article/c000003/detail?ID=9ecf6988-1c7a-4517-acca-477b97e119ad)
- [Sequence Alignment Analysis](http://bioinfo.nhri.org.tw/emboss/jemboss/alignment.htm#pair)
<br>
## Q&A
- [Alignment and mapping](https://www.biostars.org/p/180986/)
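- [[1.0] The difference between alignment and mapping](https://qinqianshan.com/bioinformatics/mapping/alignment-mapping/)
  - Map a read
    - Where does it come from?
    - The exact alignment between the read and its source is not a concern.
  - Align a read
    - We want not only the position(s) in the genome where it may occur,
    - but also which bases it matches.
  - Personal interpretation
    - Alignment focuses on the process and the details: exactly which bases match.
    - Mapping focuses on the result: whether the read is matched to a location at all.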
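- [Variant site detection: left alignment @ Yourgene (有勁的基因資訊)](https://yourgene.pixnet.net/blog/post/119576346)

  #alignment #left-alignment #right-alignment
  - > For SAM or BAM files that have already been mapped, we can apply left alignment directly. For this, the Genome Analysis Toolkit (GATK) provides a tool named "LeftAlignAndTrimVariants". This tool trims the shared bases for the user and shifts the variant to the left.
  - So the order is: map first, then sort, and finally left-align. Here "align" means the left-alignment (indel normalization) step, not the read alignment that `bwa mem` performs while mapping.
  - Note: in GATK4, `LeftAlignAndTrimVariants` takes a VCF as input; the BAM-level counterpart is `LeftAlignIndels`.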
- [Mapping/Alignment - National Center for High-performance Computing (NCHC)](http://humem.nchc.org.tw/NGS/webpages/Alignment_mapping.html)
- [Aligning sequencing data to the human reference genome with BWA (ALPS host)](http://humem.nchc.org.tw/NGS/webpages/Benchmark/BWA_benchmark.html)
- [RNA-Sick@Day13 > Nothing is true, everything is permitted | Estimating expression without alignment feat. kallisto](https://ithelp.ithome.com.tw/articles/10222244)
- POA (Partial Order Alignment)
```
>seq1
CCGCTTTTCCGC
>seq2
CCGCAAAACCGC
```

![](https://i.imgur.com/vOAtg2J.png)
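In SAM output, the map/align distinction from the Q&A above shows up in different columns: the reference name and position (columns 3 and 4) answer "where does the read come from?", while the CIGAR string (column 6) records how the bases line up. The sketch below illustrates this on a single made-up SAM record. The read name, coordinates, and sequence are invented for illustration, and `describe_read` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: summarize SAM records read from stdin,
# skipping header lines (which start with "@").
describe_read() {
    awk -F'\t' '!/^@/ {
        printf "mapping:   %s maps to %s at position %s (MAPQ %s)\n", $1, $3, $4, $5
        printf "alignment: CIGAR %s\n", $6
    }'
}

# Made-up record: 8 bases = 4 aligned (M), 1 inserted (I), 3 aligned (M).
printf 'read1\t0\tchr1\t1000\t60\t4M1I3M\t*\t0\t0\tACGTTACG\tIIIIIIII\n' \
    | describe_read
```

A tool that only needs the "mapping" line can ignore the CIGAR entirely, which is the idea behind alignment-free approaches such as the kallisto article linked above.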