Somatic Pipeline
===
###### tags: `Parabricks-v3.8`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Secondary Analysis`, `somatic`, `Pipeline`
<br>
[TOC]
<br>
## doc
### v3.5
- ### [SOMATIC PIPELINE](https://docs.nvidia.com/clara/parabricks/v3.5/text/somatic_pipeline.html)
- Prerequisites:
    - Reference FASTA
    - FASTQ files for the normal sample
    - FASTQ files for the tumor/cancer sample
### v3.7
- ### [pbrun somatic](https://docs.nvidia.com/clara/parabricks/3.7.0/Documentation/ToolDocs/man_somatic.html#man-somatic)
- ### [Whole-Genome Somatic Small Variant Calling](https://docs.nvidia.com/clara/parabricks/3.7.0/How-Tos/SomaticCalling.html#)
- Prerequisites:
    - Reference FASTA
    - FASTQ files for the normal sample
    - FASTQ files for the tumor/cancer sample
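
Because a full run takes more than a day, it can help to confirm that every input exists before launching. This is a minimal sketch: the paths are copied from the test script on this page (including the known-sites VCF that the script passes via `--knownSites`), and `missing_inputs` is a hypothetical helper, not part of Parabricks. It only checks that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Paths copied from the test script on this page; adjust to your own layout.
REQUIRED_INPUTS = {
    "reference FASTA": "parabricks_sample/Ref/Homo_sapiens_assembly38.fasta",
    "known sites VCF": "parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz",
    "tumor FASTQ R1": "datasets/somatic/SRR7890824_1.fastq.gz",
    "tumor FASTQ R2": "datasets/somatic/SRR7890824_2.fastq.gz",
    "normal FASTQ R1": "datasets/somatic/SRR7890827_1.fastq.gz",
    "normal FASTQ R2": "datasets/somatic/SRR7890827_2.fastq.gz",
}


def missing_inputs(inputs: dict[str, str], root: str = ".") -> list[str]:
    """Return 'label: path' for every input that is not an existing file under root."""
    base = Path(root)
    return [
        f"{label}: {path}"
        for label, path in inputs.items()
        if not (base / path).is_file()
    ]
```

Run from the notebook's working directory, `missing_inputs(REQUIRED_INPUTS)` returns an empty list when everything is in place.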
<br>
<hr>
<br>
## Tools
### NCBI SRA
> See: [[HackMD] NCBI SRA](/XHcj7Iy_Rr2JouDENfga1g)
<br>
<hr>
<br>
## Test
### script
```bash=
%%time
!pbrun somatic \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--in-tumor-fq \
datasets/somatic/SRR7890824_1.fastq.gz \
datasets/somatic/SRR7890824_2.fastq.gz \
"@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
--out-vcf output/output.vcf \
--out-tumor-bam output/tumor.bam \
--out-tumor-recal-file output/tumor-recal.txt \
--in-normal-fq \
datasets/somatic/SRR7890827_1.fastq.gz \
datasets/somatic/SRR7890827_2.fastq.gz \
"@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
--out-normal-bam output/normal.bam \
--out-normal-recal-file output/normal-recal.txt
```
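
Outside a notebook (no `%%time` or `!` magics), the same call can be assembled as an argument list and run with `subprocess`, which avoids shell quoting entirely. A minimal sketch, assuming `pbrun` is on `PATH` (for example inside the Parabricks container); `somatic_argv` and `run_somatic` are hypothetical helpers, and the flags are only those used in the script above.

```python
import shlex
import subprocess

# Read-group strings copied from the test script. Raw strings keep the
# backslash-t sequences as literal text, exactly as typed in the script.
TUMOR_RG = r"@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1"
NORMAL_RG = r"@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1"


def somatic_argv(*, ref, known_sites, tumor_fq, tumor_rg, normal_fq, normal_rg, out_dir="output"):
    """Return the argument list of the test script's `pbrun somatic` call."""
    return [
        "pbrun", "somatic",
        "--ref", ref,
        "--knownSites", known_sites,
        # Tumor sample: R1, R2, then its read group.
        "--in-tumor-fq", *tumor_fq, tumor_rg,
        "--out-vcf", f"{out_dir}/output.vcf",
        "--out-tumor-bam", f"{out_dir}/tumor.bam",
        "--out-tumor-recal-file", f"{out_dir}/tumor-recal.txt",
        # Normal sample: R1, R2, then its read group.
        "--in-normal-fq", *normal_fq, normal_rg,
        "--out-normal-bam", f"{out_dir}/normal.bam",
        "--out-normal-recal-file", f"{out_dir}/normal-recal.txt",
    ]


def run_somatic(argv):
    """Echo the command, then run it; raises CalledProcessError on failure."""
    print(shlex.join(argv), flush=True)
    subprocess.run(argv, check=True)
```

Passing the script's reference, known-sites VCF, FASTQ pairs, and `TUMOR_RG`/`NORMAL_RG` to `somatic_argv`, then handing the result to `run_somatic`, reproduces the cell.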
### Q & A
- @RG: [Read Groups](https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups)
> [[HackMD] VCF-Body](https://hackmd.io/6rATKTvURVSKia8K_9kBeQ#VCF-Body)
- `ID` = Read group identifier
- `PU` = Platform Unit
- `SM` = Sample
- `PL` = Platform/technology used to produce the read
- `LB` = DNA preparation library identifier
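
The read-group argument is just these fields joined after `@RG`. A minimal sketch of building it in the same field order as the test script (ID, LB, PL, SM, PU), with the separators written as the literal backslash-`t` text that appears in the script; `read_group` is a hypothetical helper. `PL:bar` is kept only because the script uses it; the SAM specification expects a platform name such as `ILLUMINA` there.

```python
def read_group(*, id_: str, sm: str, lb: str, pl: str, pu: str) -> str:
    """Build an @RG string in the same form as the test script.

    Separators are the two characters backslash + 't', exactly as typed in
    the script, not real tab characters.
    """
    fields = [("ID", id_), ("LB", lb), ("PL", pl), ("SM", sm), ("PU", pu)]
    for key, value in fields:
        # Empty values, whitespace, or backslashes would corrupt the RG line.
        if not value or any(c.isspace() or c == "\\" for c in value):
            raise ValueError(f"invalid {key} value: {value!r}")
    return "@RG" + "".join(f"\\t{key}:{value}" for key, value in fields)


TUMOR_RG = read_group(id_="sm_tumor_rg1", sm="sm_tumor", lb="lib1", pl="bar", pu="sm_tumor_rg1")
NORMAL_RG = read_group(id_="sm_normal_rg1", sm="sm_normal", lb="lib1", pl="bar", pu="sm_normal_rg1")
```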
<br>
<hr>
<br>
## 3 Errors
- ### an illegal memory access was encountered
> Encountered only once.
> Not sure whether it came from exceeding the container's resource limit (with 128 GB that seems unlikely).
```
[PB Warning 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/mem_chain2aln_kernel.cu/4993: an illegal memory access was encountered
[PB Warning 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/mem_chain2aln_kernel.cu/4993: an illegal memory access was encountered
[PB Error 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:44] No GPUs active, shutting down due to previous error., exiting.
For technical support visit https://docs.nvidia.com/clara/parabricks/3.7.0/index.html#how-to-get-help
Exiting...
```
![](https://i.imgur.com/x6mtnrN.png)
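
When a run dies after hours, pulling only the Parabricks warning and error lines out of a long log makes the first failure easier to find. This is an assumption-based sketch: the `[PB <level> <timestamp>][<source>:<line>] <message>` layout is inferred from the log excerpts on this page, not from a documented format, and `pb_messages` is a hypothetical helper. Lines with a different prefix, such as `[Parabricks Options Error]: ...`, are ignored.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass
class PbMessage:
    level: str      # "Warning" or "Error"
    time: datetime
    source: str     # e.g. "ParaBricks/src/check_error.cu:41"
    message: str


# Layout inferred from logs such as:
# [PB Error 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:44] No GPUs active, ...
_PB_LINE = re.compile(
    r"^\[PB (?P<level>Warning|Error) "
    r"(?P<ts>\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})\]"
    r"\[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def pb_messages(lines: Iterable[str]) -> Iterator[PbMessage]:
    """Yield the PB Warning/Error lines of a log, parsed into fields."""
    for line in lines:
        m = _PB_LINE.match(line.rstrip("\n"))
        if m:
            yield PbMessage(
                level=m["level"],
                time=datetime.strptime(m["ts"], "%Y-%b-%d %H:%M:%S"),
                source=m["src"],
                message=m["msg"],
            )
```

Iterating `pb_messages(f)` over an open log file (for example a hypothetical `pbrun.log` captured from the cell) and printing `level`, `time`, and `source` gives a compact failure summary.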
- ### paired reads have different names: "SRR7890824.1.2", "SRR7890824.1.1"
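> - Read names use the `accession.spot.readid` layout (what `fastq-dump --readids` writes), so the two mates of spot 1 are named `SRR7890824.1.1` and `SRR7890824.1.2` and no longer match.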
> - [Nvidia Parabricks forum](https://forums.developer.nvidia.com/t/troubleshooting-download-example-fastq-files/211668)
```
[PB Error 2022-Apr-13 11:18:37][ParaBricks/src/CReadWrite.cpp:379] paired reads have different names: "SRR7890824.1.2", "SRR7890824.1.1" in file /workspace/datasets/somatic/SRR7890824_1.fastq.gz and /workspace/datasets/somatic/SRR7890824_2.fastq.gz
, exiting.
```
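
If the FASTQ files are already on disk, one possible workaround is to drop the trailing `.readid` from every header so both mates share the name `SRR7890824.1`. This is an assumption: the page does not record how the files under `datasets/tmp` were produced, and re-dumping the run without the read-id suffix is simpler when possible. `strip_readid` and `fix_fastq` are hypothetical helpers; the sketch assumes plain 4-line FASTQ records and reduces each `+` separator line to a bare `+`, which is still valid FASTQ.

```python
import gzip
import re

# "@SRR7890824.1.2" -> name "@SRR7890824.1" plus mate suffix ".2"
_READID_SUFFIX = re.compile(r"^(@\S+\.\d+)\.[12]$")


def strip_readid(header: str) -> str:
    """Remove a trailing .1/.2 read id from the first token of a FASTQ header."""
    name, sep, rest = header.partition(" ")
    m = _READID_SUFFIX.match(name)
    return (m.group(1) if m else name) + sep + rest


def fix_fastq(src: str, dst: str) -> None:
    """Copy a gzipped 4-line FASTQ, rewriting headers and '+' lines."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for i, line in enumerate(fin):
            line = line.rstrip("\n")
            if i % 4 == 0:        # header line
                line = strip_readid(line)
            elif i % 4 == 2:      # separator line, may repeat the old name
                line = "+"
            fout.write(line + "\n")
```

Both `SRR7890824_1.fastq.gz` and `SRR7890824_2.fastq.gz` need the same treatment, written to a separate output directory so the originals stay untouched.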

- ### With `--knownSites` option, recalibration file will be generated. Please specify output
> - [Nvidia Parabricks forum](https://forums.developer.nvidia.com/t/troubleshooting-download-example-fastq-files/211670)
```
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Error]: With --knownSites option, recalibration file will be generated. Please specify output
recalibration file.
[Parabricks Options Error]: Run with -h to see help
Could not run fq2bam as part of normal sample processing for somatic pipeline
Exiting pbrun ...
```
![](https://i.imgur.com/MfSHLNR.png)
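
This error comes from the argument check: with `--knownSites` a recalibration file is generated, so an output path is required, and in the somatic pipeline the message above is raised during normal-sample processing. A small pre-launch check can encode that rule. This is a hypothetical helper based only on the error message, not Parabricks' own validation; it takes the command text as pasted from the cell, joins backslash-newline continuations, and tokenizes it with `shlex` (a leading `!` on `pbrun` does not affect the flag check).

```python
import shlex

# Output flags that must accompany --knownSites in `pbrun somatic`,
# per the error message on this page.
RECAL_FLAGS = ("--out-tumor-recal-file", "--out-normal-recal-file")


def missing_recal_outputs(command: str) -> list[str]:
    """Return the recal-file flags missing from a command that uses --knownSites."""
    # Join shell line continuations, then tokenize like a POSIX shell.
    argv = shlex.split(command.replace("\\\n", " "))
    if "--knownSites" not in argv:
        return []
    return [flag for flag in RECAL_FLAGS if flag not in argv]
```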
After the fix:
![](https://i.imgur.com/H85raxP.png)
```python
%%time
!date +"%D-%T"
!pbrun somatic \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--in-tumor-fq \
datasets/tmp/SRR7890824_1.fastq.gz \
datasets/tmp/SRR7890824_2.fastq.gz \
"@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
--out-vcf output/output.vcf \
--out-tumor-bam output/tumor.bam \
--out-tumor-recal-file output/tumor-recal.txt \
--in-normal-fq \
datasets/tmp/SRR7890827_1.fastq.gz \
datasets/tmp/SRR7890827_2.fastq.gz \
"@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
--out-normal-bam output/normal.bam \
--out-normal-recal-file output/normal-recal.txt
!date +"%D-%T"
```
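
The two `date +"%D-%T"` lines record the start and end time of the run. A minimal sketch for turning them into a duration in the same `25h 49m 02s` style as the benchmark table; `elapsed` and `to_hms` are hypothetical helpers, and the sketch ignores time zones and daylight-saving changes.

```python
from datetime import datetime, timedelta

# `date +"%D-%T"` prints %m/%d/%y-%H:%M:%S, e.g. 04/20/22-03:25:00.
DATE_FORMAT = "%m/%d/%y-%H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two `date +"%D-%T"` outputs."""
    t0 = datetime.strptime(start.strip(), DATE_FORMAT)
    t1 = datetime.strptime(end.strip(), DATE_FORMAT)
    if t1 < t0:
        raise ValueError("end is earlier than start")
    return t1 - t0


def to_hms(delta: timedelta) -> str:
    """Format like the benchmark table, e.g. '25h 49m 02s'."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"
```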
<br>
<hr>
<br>
## benchmark
![](https://i.imgur.com/VVBhYLr.png)
([Image source](https://docs.nvidia.com/clara/parabricks/v3.5/text/somatic_pipeline.html))
<br>
### Hardware spec: c8m128gt2
| Programs | Round-1 | Round-2 | Round-3 |
| -------------------------------------| ---------:| ---------:| ---------:|
| [tumor] GPU-BWA mem, Sorting Phase-I | 9h 18m 24s | 9h 25m 37s | 9h 32m 32s |
| [tumor] Sorting Phase-II | 15m 42s | 12m 32s | 16m 52s |
| [tumor] Marking Duplicates, BQSR | 34m 37s | 35m 20s | 33m 19s |
| [normal] GPU-BWA mem, Sorting Phase-I | 9h 28m 38s | 9h 28m 33s |9h 23m 33s |
| [normal] Sorting Phase-II | 13m 31s | 12m 51s | 16m 21s |
| [normal] Marking Duplicates, BQSR | 35m 23s | 34m 17s | 35m 30s |
| GPU-GATK4 mutect | 5h 19m 02s | 5h 23m 41s | 5h 19m 36s |
| **Total** | 25h 45m 17s | 25h 52m 51s | 25h 57m 43s |
| **Actual** | **25h 49m 02s** | **25h 56m 14s** | **26h 01m 34s** |
- `output.vcf` is about 74.9 MiB.
- round3: `2022/04/20 03:25`
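
The **Total** row of the benchmark table should equal the sum of the seven stage times in each column, which is easy to recompute. A small sketch; `to_seconds` and `to_hms` are hypothetical helpers and the stage times are copied from the table. The sums come out to 25h 45m 17s, 25h 52m 51s, and 25h 57m 43s for Round-1 to Round-3, and each **Actual** figure is roughly 3 to 4 minutes above its stage sum, time not attributed to any listed stage.

```python
import re

# Matches the "9h", "18m", "02s" components of a duration string.
_COMPONENT = re.compile(r"(\d+)\s*([hms])")
_SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}

# Stage times from the benchmark table, in row order.
ROUNDS = {
    "Round-1": ["9h 18m 24s", "15m 42s", "34m 37s", "9h 28m 38s", "13m 31s", "35m 23s", "5h 19m 02s"],
    "Round-2": ["9h 25m 37s", "12m 32s", "35m 20s", "9h 28m 33s", "12m 51s", "34m 17s", "5h 23m 41s"],
    "Round-3": ["9h 32m 32s", "16m 52s", "33m 19s", "9h 23m 33s", "16m 21s", "35m 30s", "5h 19m 36s"],
}


def to_seconds(text: str) -> int:
    """Parse '9h 18m 24s' (any subset of h/m/s) into seconds."""
    components = _COMPONENT.findall(text)
    if not components:
        raise ValueError(f"not a duration: {text!r}")
    return sum(int(n) * _SECONDS_PER_UNIT[unit] for n, unit in components)


def to_hms(seconds: int) -> str:
    """Format seconds like the table, e.g. '25h 45m 17s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


for name, stages in ROUNDS.items():
    print(name, to_hms(sum(to_seconds(s) for s in stages)))
```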