Somatic Pipeline
===
###### tags: `Parabricks-v3.8`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Secondary Analysis`, `somatic`, `Pipeline`

<br>

[TOC]

<br>

## doc
### v3.5
- ### [SOMATIC PIPELINE](https://docs.nvidia.com/clara/parabricks/v3.5/text/somatic_pipeline.html)
    - Required inputs
        - fasta
        - fastq of the normal sample
        - fastq of the tumor/cancer sample

### v3.7
- ### [pbrun somatic](https://docs.nvidia.com/clara/parabricks/3.7.0/Documentation/ToolDocs/man_somatic.html#man-somatic)
- ### [Whole-Genome Somatic Small Variant Calling](https://docs.nvidia.com/clara/parabricks/3.7.0/How-Tos/SomaticCalling.html#)
    - Required inputs
        - fasta
        - fastq of the normal sample
        - fastq of the tumor/cancer sample

<br>

<hr>

<br>

## Tools
### NCBI SRA
> See: [[HackMD] NCBI SRA](/XHcj7Iy_Rr2JouDENfga1g)

<br>

<hr>

<br>

## Test
### script
```bash=
%%time
!pbrun somatic \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --in-tumor-fq \
        datasets/somatic/SRR7890824_1.fastq.gz \
        datasets/somatic/SRR7890824_2.fastq.gz \
        "@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
    --out-vcf output/output.vcf \
    --out-tumor-bam output/tumor.bam \
    --out-tumor-recal-file output/tumor-recal.txt \
    --in-normal-fq \
        datasets/somatic/SRR7890827_1.fastq.gz \
        datasets/somatic/SRR7890827_2.fastq.gz \
        "@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
    --out-normal-bam output/normal.bam \
    --out-normal-recal-file output/normal-recal.txt
```

### Q & A
- @RG: [Read Groups](https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups)
    > [[HackMD] VCF-Body](https://hackmd.io/6rATKTvURVSKia8K_9kBeQ#VCF-Body)
    - `ID` = Read group identifier
    - `PU` = Platform Unit
    - `SM` = Sample
    - `PL` = Platform/technology used to produce the read
    - `LB` = DNA preparation library identifier

<br>

<hr>

<br>

## 3 Errors
- ### an illegal memory access was encountered
    > Encountered only once
    > Not sure whether the access exceeded the container's resource limit (with 128 GB that should not be possible)?
```
[PB Warning 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/mem_chain2aln_kernel.cu/4993: an illegal memory access was encountered
[PB Warning 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/mem_chain2aln_kernel.cu/4993: an illegal memory access was encountered
[PB Error 2022-Apr-15 17:47:50][ParaBricks/src/check_error.cu:44] No GPUs active, shutting down due to previous error., exiting.
For technical support visit https://docs.nvidia.com/clara/parabricks/3.7.0/index.html#how-to-get-help
Exiting...
```
[![](https://i.imgur.com/x6mtnrN.png)](https://i.imgur.com/x6mtnrN.png)

- ### paired reads have different names: "SRR7890824.1.2", "SRR7890824.1.1"
    > - `accession.spot.readid`
    > - [Nvidia Parabricks forum](https://forums.developer.nvidia.com/t/troubleshooting-download-example-fastq-files/211668)

```
[PB Error 2022-Apr-13 11:18:37][ParaBricks/src/CReadWrite.cpp:379] paired reads have different names: "SRR7890824.1.2", "SRR7890824.1.1" in file /workspace/datasets/somatic/SRR7890824_1.fastq.gz and /workspace/datasets/somatic/SRR7890824_2.fastq.gz , exiting.
```
![](https://i.imgur.com/WPAF5rz.png)

- ### With `--knownSites` option, recalibration file will be generated. Please specify output
    > - [Nvidia Parabricks forum](https://forums.developer.nvidia.com/t/troubleshooting-download-example-fastq-files/211670)

```
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Error]: With --knownSites option, recalibration file will be generated. Please specify output recalibration file.
[Parabricks Options Error]: Run with -h to see help
Could not run fq2bam as part of normal sample processing for somatic pipeline
Exiting pbrun ...
```
[![](https://i.imgur.com/MfSHLNR.png)](https://i.imgur.com/MfSHLNR.png)

After the fix:
[![](https://i.imgur.com/H85raxP.png)](https://i.imgur.com/H85raxP.png)

```python
%%time
!date +"%D-%T"
!pbrun somatic \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --in-tumor-fq \
        datasets/tmp/SRR7890824_1.fastq.gz \
        datasets/tmp/SRR7890824_2.fastq.gz \
        "@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
    --out-vcf output/output.vcf \
    --out-tumor-bam output/tumor.bam \
    --out-tumor-recal-file output/tumor-recal.txt \
    --in-normal-fq \
        datasets/tmp/SRR7890827_1.fastq.gz \
        datasets/tmp/SRR7890827_2.fastq.gz \
        "@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
    --out-normal-bam output/normal.bam \
    --out-normal-recal-file output/normal-recal.txt
!date +"%D-%T"
```

<br>

<hr>

<br>

## benchmark
[![](https://i.imgur.com/VVBhYLr.png)](https://i.imgur.com/VVBhYLr.png)
([image source](https://docs.nvidia.com/clara/parabricks/v3.5/text/somatic_pipeline.html))

<br>

### Hardware spec: c8m128gt2

| Programs                              | Round-1         | Round-2         | Round-3         |
| ------------------------------------- | ---------------:| ---------------:| ---------------:|
| [tumor] GPU-BWA mem, Sorting Phase-I  | 9h 18m 24s      | 9h 25m 37s      | 9h 32m 32s      |
| [tumor] Sorting Phase-II              | 15m 42s         | 12m 32s         | 16m 52s         |
| [tumor] Marking Duplicates, BQSR      | 34m 37s         | 35m 20s         | 33m 19s         |
| [normal] GPU-BWA mem, Sorting Phase-I | 9h 28m 38s      | 9h 28m 33s      | 9h 23m 33s      |
| [normal] Sorting Phase-II             | 13m 31s         | 12m 51s         | 16m 21s         |
| [normal] Marking Duplicates, BQSR     | 35m 23s         | 34m 17s         | 35m 30s         |
| GPU-GATK4 mutect                      | 5h 19m 02s      | 5h 23m 41s      | 5h 19m 36s      |
| **Total** (sum of the stages above)   | 25h 45m 17s     | 25h 52m 51s     | 25h 57m 43s     |
| **Actual** (wall-clock)               | **25h 49m 02s** | **25h 56m 14s** | **26h 01m 34s** |

- `output.vcf` is about 74.9 MiB
- round3: `2022/04/20 03:25`
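
The **Total** row is the sum of the seven per-stage timings, so it can be re-derived from them and compared with the **Actual** wall-clock time. The following is a minimal sketch of that check. The helper names `parse_duration`, `format_duration`, and `summarize` are hypothetical and not part of Parabricks; the durations are copied by hand from the benchmark table.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a string such as '9h 18m 24s' or '15m 42s' into a timedelta."""
    parts = {unit: int(value) for value, unit in re.findall(r"(\d+)\s*([hms])", text)}
    return timedelta(
        hours=parts.get("h", 0),
        minutes=parts.get("m", 0),
        seconds=parts.get("s", 0),
    )


def format_duration(delta: timedelta) -> str:
    """Format a timedelta in the same 'Hh MMm SSs' style used in the table."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


# Stage order: tumor BWA, tumor sort II, tumor markdup/BQSR,
# normal BWA, normal sort II, normal markdup/BQSR, mutect.
ROUNDS = {
    "Round-1": {
        "stages": ["9h 18m 24s", "15m 42s", "34m 37s",
                   "9h 28m 38s", "13m 31s", "35m 23s", "5h 19m 02s"],
        "actual": "25h 49m 02s",
    },
    "Round-2": {
        "stages": ["9h 25m 37s", "12m 32s", "35m 20s",
                   "9h 28m 33s", "12m 51s", "34m 17s", "5h 23m 41s"],
        "actual": "25h 56m 14s",
    },
    "Round-3": {
        "stages": ["9h 32m 32s", "16m 52s", "33m 19s",
                   "9h 23m 33s", "16m 21s", "35m 30s", "5h 19m 36s"],
        "actual": "26h 01m 34s",
    },
}


def summarize(rounds: dict) -> dict:
    """Return {round: (sum of stages, actual minus sum)} as formatted strings."""
    result = {}
    for name, data in rounds.items():
        stage_sum = sum((parse_duration(s) for s in data["stages"]), timedelta())
        gap = parse_duration(data["actual"]) - stage_sum
        result[name] = (format_duration(stage_sum), format_duration(gap))
    return result


for name, (stage_sum, gap) in summarize(ROUNDS).items():
    print(f"{name}: stage sum = {stage_sum}, actual - sum = {gap}")
```

In all three rounds the stage sum comes out a few minutes below the actual wall-clock time. What that remaining time consists of is not recorded in these notes.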