[log] Example run (pbrun & container)
===
###### tags: `Parabricks-v3.5`
###### tags: `genomics`, `NVIDIA`, `Clara`, `Parabricks`, `secondary analysis`
<br>
[TOC]
<br>
<hr>
<br>
## [Official sample dataset](https://docs.nvidia.com/clara/parabricks/v3.5/text/getting_started.html#step-4-example-run)
### Download
```bash=
wget -O parabricks_sample.tar.gz \
"https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"
```
### Extract
```bash=
tar -xvzf parabricks_sample.tar.gz
```
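A quick way to confirm that the archive extracted correctly is to check for the four input files that the commands in this note use. The sketch below is a hypothetical helper (`check_sample_layout` is not part of Parabricks), and its file list comes only from the paths used in this note, not from an official manifest of the archive.

```bash=
# Hypothetical helper: check that the input files referenced in this note
# exist and are non-empty. Returns 1 and lists what is missing otherwise.
check_sample_layout() {
  local dir="${1:-parabricks_sample}" missing=0 f
  for f in \
    Ref/Homo_sapiens_assembly38.fasta \
    Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    Data/sample_1.fq.gz \
    Data/sample_2.fq.gz
  do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_sample_layout parabricks_sample && echo OK`.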
<br>
<hr>
<br>
## 執行 `pbrun fq2bam`
> https://docs.nvidia.com/clara/parabricks/v3.5/text/fastq_and_bam_processing.html
![](https://i.imgur.com/cJH6AD6.png)
Minimal version ([command source: STEP 4: Example run](https://docs.nvidia.com/clara/parabricks/v3.5/text/getting_started.html#step-4-example-run))
```bash=
pbrun fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz \
parabricks_sample/Data/sample_2.fq.gz \
--out-bam output.bam \
--x3
```
- ### Parameter notes
- `--x3`: for debugging
- `--knownSites` and `--out-recal-file` are omitted, so no BQSR report is written
### Equivalent docker run command
> (taken from the log)
```bash=
docker run \
--gpus all \
-u=1000:1000 \
--rm \
-w=/mnt/parabricks \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v /mnt/parabricks/FJZPBJPY:/mnt/parabricks/FJZPBJPY \
-v /mnt/parabricks:/mnt/parabricks \
-v /mnt/parabricks/parabricks_sample/Ref:/mnt/parabricks/parabricks_sample/Ref \
-v /mnt/parabricks/parabricks_sample/Data:/mnt/parabricks/parabricks_sample/Data \
parabricks/release:v3.5.0 fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--out-bam output.bam \
--x3 \
--tmp-dir /mnt/parabricks/FJZPBJPY
```
- `--tmp-dir` is required; without it the run fails with this error:
```
Traceback (most recent call last):
  File "/parabricks/run_pipeline.py", line 7, in <module>
    sys.exit(PB.pb_main())
  File "PB.pyx", line 1411, in PB.pb_main
  File "PB.pyx", line 300, in PB.runfq2bam
  File "PB.pyx", line 304, in PB.runfq2bam
ValueError: '--tmp-dir' is not in list
```
- Run time: 52 s (about the same as running it through pbrun)
- ### Warning printed during the run
:::warning
:warning: **WARNING**
The system has 12 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.
:::
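The warning can be anticipated before a run by comparing the CPU thread count with the number of GPUs. In this sketch `preflight_threads` is a hypothetical helper, and its default of 12 threads per GPU is an assumption extrapolated from the single data point in the message above (24 threads for 2 GPUs); the real recommendation may not scale linearly, so the ratio is a parameter.

```bash=
# Hypothetical pre-flight check mirroring the WARNING above.
# ASSUMPTION: 12 CPU threads per GPU (from "2 GPU is 24"); override with $3.
preflight_threads() {
  local have="$1" gpus="$2" per_gpu="${3:-12}"
  local want=$(( gpus * per_gpu ))
  if [ "$have" -lt "$want" ]; then
    echo "WARNING: $have CPU threads for $gpus GPU(s); about $want recommended" >&2
    return 1
  fi
  echo "OK: $have CPU threads for $gpus GPU(s)"
}
```

Usage on the host: `preflight_threads "$(nproc)" "$(nvidia-smi -L | wc -l)"` (`nvidia-smi -L` prints one line per GPU).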
<br>
### Run output (run time: 5m 56s)
:::spoiler Details (NC12s-v2 log)
```
$ pbrun fq2bam \
> --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
> --in-fq parabricks_sample/Data/sample_1.fq.gz \
> parabricks_sample/Data/sample_2.fq.gz \
> --out-bam output.bam \
> --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
docker run --gpus all -u=1000:1000 --rm -w=/mnt/parabricks --net=host -v /opt/parabricks:/INSTALL/ -v /mnt/parabricks/FJZPBJPY:/mnt/parabricks/FJZPBJPY -v /mnt/parabricks:/mnt/parabricks -v /mnt/parabricks/parabricks_sample/Ref:/mnt/parabricks/parabricks_sample/Ref -v /mnt/parabricks/parabricks_sample/Data:/mnt/parabricks/parabricks_sample/Data parabricks/release:v3.5.0 fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam --x3 --tmp-dir /mnt/parabricks/FJZPBJPY
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mnt/parabricks/parabricks_sample/Data/sample_1.fq.gz and
/mnt/parabricks/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
g 2 b 0 B 2 P 4 s 1 r 0 o 2 m 1 z 4 f 2 v 0 M 2 name /mnt/parabricks/output.bam
/usr/local/cuda/.pb/binaries//bin/bwa mem /mnt/parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/parabricks/parabricks_sample/Data/sample_1.fq.gz /mnt/parabricks/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 -Z ./pbOpts.txt
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| GPU-BWA mem, Sorting Phase-I ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
WARNING
The system has 12 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.
GPU-BWA mem
ProgressMeter Reads Base Pairs Aligned
[07:58:42] 5043564 580000000
[07:59:08] 10087128 1170000000
[07:59:34] 15130692 1750000000
[08:00:00] 20174256 2330000000
[08:00:26] 25217820 2890000000
[08:00:52] 30261384 3480000000
[08:01:18] 35304948 4060000000
[08:01:44] 40348512 4630000000
[08:02:10] 45392076 5220000000
[08:02:36] 50435640 5800000000
GPU-BWA Mem time: 286.371637 seconds
GPU-BWA Mem is finished.
GPU Sorting, Marking Dups, BQSR
ProgressMeter SAM Entries Completed
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 286.372717 seconds
[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /mnt/parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/parabricks/parabricks_sample/Data/sample_1.fq.gz /mnt/parabricks/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 290.629 sec; CPU: 3379.259 sec
------------------------------------------------------------------------------
|| Program: GPU-BWA mem, Sorting Phase-I ||
|| Version: v3.5.0 ||
|| Start Time: Tue Jun 8 07:58:02 2021 ||
|| End Time: Tue Jun 8 08:02:57 2021 ||
|| Total Time: 4 minutes 55 seconds ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/sort -sort_unmapped -ft 10 -gb 110
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| Sorting Phase-II ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[08:02:58] 0.0 0.00 GB
Sorting and Marking: 10.000 seconds
------------------------------------------------------------------------------
|| Program: Sorting Phase-II ||
|| Version: v3.5.0 ||
|| Start Time: Tue Jun 8 08:02:58 2021 ||
|| End Time: Tue Jun 8 08:03:08 2021 ||
|| Total Time: 10 seconds ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/postsort /mnt/parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta -o /mnt/parabricks/output.bam -sort_unmapped -ft 4 -wt 2 -zt 3 -bq 2 -gb 110
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| Marking Duplicates, BQSR ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[08:03:19] 20.2 15.08 GB
[08:03:29] 43.4 10.73 GB
[08:03:39] 64.4 6.36 GB
[08:03:49] 82.8 2.06 GB
[08:03:59] 100.0 0.00 GB
BQSR and writing final BAM: 50.035 seconds
------------------------------------------------------------------------------
|| Program: Marking Duplicates, BQSR ||
|| Version: v3.5.0 ||
|| Start Time: Tue Jun 8 08:03:08 2021 ||
|| End Time: Tue Jun 8 08:03:59 2021 ||
|| Total Time: 51 seconds ||
------------------------------------------------------------------------------
```
:::
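The 5m 56s in the heading is the sum of the three `Total Time:` lines printed at the end of each stage (4m 55s + 10s + 51s). The small awk sketch below adds them up from a saved log; `sum_stage_times` is a hypothetical name, and it assumes the durations only use minutes and seconds, as in the logs in this note.

```bash=
# Hypothetical helper: sum the "Total Time:" lines of a Parabricks log
# (file arguments or stdin) and print the result as "Xm Ys".
sum_stage_times() {
  awk '
    /Total Time:/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^minute/) total += $(i - 1) * 60
        else if ($i ~ /^second/) total += $(i - 1)
      }
    }
    END { printf "%dm %ds\n", int(total / 60), total % 60 }
  ' "$@"
}
```

For example, save a run with `pbrun ... 2>&1 | tee fq2bam.log` and then call `sum_stage_times fq2bam.log`; for the log above this gives `5m 56s`, and for the `pbrun germline` log further down it gives `11m 22s`.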
<br>
### Simplified docker run command, verified to work (2021/06/10 18:22)
```bash=
# cd into the directory that contains the parabricks_sample subdirectory
rm -rf tmp-dir
mkdir tmp-dir
WORKSPACE="$(pwd)"
# Run fq2bam (about 52 s)
docker run \
--gpus all \
-u=1000:1000 \
--rm \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v "$WORKSPACE:$WORKSPACE" \
-w="$WORKSPACE" \
parabricks/release:v3.5.0 fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz \
parabricks_sample/Data/sample_2.fq.gz \
--out-bam output.bam \
--tmp-dir tmp-dir \
--x3
```
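After the run, a cheap check that `output.bam` is a BAM file at all, rather than an empty or non-BAM output, can be done with standard tools only. This is a sketch (`looks_like_bam` is a hypothetical helper): it relies on BAM being BGZF-compressed, which `gzip` can decompress, and on the decompressed data starting with the magic `BAM\1`. It does not detect truncation; `samtools quickcheck` is the more thorough option when samtools is installed.

```bash=
# Hypothetical sanity check: non-empty file whose decompressed stream
# starts with "BAM" (the SAM/BAM spec magic is "BAM\1").
looks_like_bam() {
  local f="$1"
  [ -s "$f" ] || return 1
  [ "$(gzip -dc "$f" 2>/dev/null | head -c 3)" = "BAM" ]
}
```

Usage: `looks_like_bam output.bam && echo "output.bam looks like a BAM file"`.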
<br>
<hr>
<br>
## 執行 `pbrun germline`
> https://docs.nvidia.com/clara/parabricks/v3.5/text/germline_pipeline.html
![](https://i.imgur.com/m39DxAz.png)
```bash=
pbrun germline \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
--knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--out-bam output.bam \
--out-variants output.vcf \
--out-recal-file report.txt \
--x3
```
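- ### Parameter notes
- `--x3`: for debugging
- Unlike the `fq2bam` example, `--knownSites` and `--out-recal-file` are passed, so BQSR runs and writes `report.txt`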
<br>
### Equivalent docker run commands
> (taken from the log; two commands)
:::warning
:warning: **Note**: pbrun runs two separate docker run commands, `fq2bam` followed by `haplotypecaller`
:::
- ### First run: `fq2bam`
```bash=
docker run \
--gpus all \
-u=1000:1000 \
--rm \
-w=/uploads/workspace \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V \
-v /uploads/workspace:/uploads/workspace \
-v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref \
-v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data \
parabricks/release:v3.5.0 fq2bam \
--ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz \
/uploads/workspace/parabricks_sample/Data/sample_2.fq.gz \
"@RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1" \
--knownSites /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--out-bam /uploads/workspace/output.bam \
--out-recal-file /uploads/workspace/report.txt \
--memory-limit 110 \
--num-cpu-threads 0 \
--tmp-dir /uploads/workspace/WODDX80V \
--num-gpus 2 \
--x3
```
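- **Note**:
`@RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1`
is the read group that pbrun generates automatically from the input FASTQ files and passes as the optional third value of `--in-fq`. When typed into a shell it must be quoted; otherwise bash turns each `\t` into a plain `t`.
How much it matters is still unclear. It is recorded in the BAM header, and its `SM` field probably becomes the sample name in the VCF.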
- ### Second run: `haplotypecaller`
```bash=
docker run \
--gpus all \
-u=1000:1000 \
--rm \
-w=/uploads/workspace \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V \
-v /uploads/workspace:/uploads/workspace \
-v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref \
-v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data \
parabricks/release:v3.5.0 haplotypecaller \
--ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-bam /uploads/workspace/output.bam \
--out-variants /uploads/workspace/output.vcf \
--ploidy 2 \
--num-htvc-threads 5 \
--in-recal-file /uploads/workspace/report.txt \
--tmp-dir /uploads/workspace/WODDX80V \
--num-gpus 2 \
--x3
```
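To build the read group for other input files without pbrun, it can be derived from the first FASTQ header. This is only a guess at what pbrun does: `make_rg` is hypothetical, it assumes Illumina (Casava 1.8+) read names of the form `@instrument:run:flowcell:lane:tile:x:y`, and it takes `ID`/`PU` to be `flowcell.lane`, which matches `HK3TJBCX2.1` in the log but has not been confirmed against Parabricks. `LB`, `PL` and `SM` default to the values seen in the log.

```bash=
# Hypothetical re-creation of the auto-generated read group.
# ASSUMPTION: Illumina header @instrument:run:flowcell:lane:..., ID = PU = flowcell.lane
make_rg() {
  local header="$1" sm="${2:-sample}" lb="${3:-lib1}" pl="${4:-bar}"
  local name flowcell lane
  name="${header%% *}"   # drop the comment after the first space
  name="${name#@}"       # drop the leading '@'
  flowcell=$(printf '%s' "$name" | cut -d: -f3)
  lane=$(printf '%s' "$name" | cut -d: -f4)
  # Emit literal "\t" (backslash + t), the same form passed after --in-fq.
  printf '@RG\\tID:%s.%s\\tLB:%s\\tPL:%s\\tSM:%s\\tPU:%s.%s\n' \
    "$flowcell" "$lane" "$lb" "$pl" "$sm" "$flowcell" "$lane"
}
```

Usage: `RG="$(make_rg "$(gzip -dc parabricks_sample/Data/sample_1.fq.gz | head -n 1)")"`, then pass `"$RG"` after the two FASTQ paths of `--in-fq`.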
<br>
### Run output (run time: 11m 22s = 6m 3s + 5m 19s)
:::spoiler Details (NC12s-v2 log)
```
$ pbrun germline \
> --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
> --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
> --knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
> --out-bam output.bam \
> --out-variants output.vcf \
> --out-recal-file report.txt \
> --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz and
/uploads/workspace/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
docker run --gpus all -u=1000:1000 --rm -w=/uploads/workspace --net=host -v /opt/parabricks:/INSTALL/ -v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V -v /uploads/workspace:/uploads/workspace -v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref -v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data parabricks/release:v3.5.0 fq2bam --ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 --knownSites /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz --out-bam /uploads/workspace/output.bam --out-recal-file /uploads/workspace/report.txt --memory-limit 110 --num-cpu-threads 0 --tmp-dir /uploads/workspace/WODDX80V --num-gpus 2 --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Read group created for /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz and
/uploads/workspace/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
g 2 b 0 B 2 P 4 s 1 r 0 o 2 m 1 z 4 f 2 v 0 M 2 name /uploads/workspace/output.bam report /uploads/workspace/report.txt K /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
/usr/local/cuda/.pb/binaries//bin/bwa mem /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 -Z ./pbOpts.txt
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| GPU-BWA mem, Sorting Phase-I ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
GPU-BWA mem
ProgressMeter Reads Base Pairs Aligned
WARNING
The system has 12 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.
[09:00:49] 5043564 590000000
[09:01:15] 10087128 1160000000
[09:01:41] 15130692 1730000000
[09:02:07] 20174256 2310000000
[09:02:33] 25217820 2900000000
[09:02:59] 30261384 3490000000
[09:03:25] 35304948 4060000000
[09:03:51] 40348512 4640000000
[09:04:18] 45392076 5220000000
[09:04:44] 50435640 5800000000
GPU-BWA Mem time: 287.420745 seconds
GPU-BWA Mem is finished.
GPU Sorting, Marking Dups, BQSR
ProgressMeter SAM Entries Completed
Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 287.421802 seconds
[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/parabricks_sample/Data/sample_1.fq.gz /uploads/workspace/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 291.557 sec; CPU: 3389.752 sec
------------------------------------------------------------------------------
|| Program: GPU-BWA mem, Sorting Phase-I ||
|| Version: v3.5.0 ||
|| Start Time: Thu Jun 10 09:00:09 2021 ||
|| End Time: Thu Jun 10 09:05:05 2021 ||
|| Total Time: 4 minutes 56 seconds ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/sort -sort_unmapped -ft 10 -gb 110
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| Sorting Phase-II ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[09:05:06] 0.0 0.00 GB
Sorting and Marking: 10.000 seconds
------------------------------------------------------------------------------
|| Program: Sorting Phase-II ||
|| Version: v3.5.0 ||
|| Start Time: Thu Jun 10 09:05:06 2021 ||
|| End Time: Thu Jun 10 09:05:16 2021 ||
|| Total Time: 10 seconds ||
------------------------------------------------------------------------------
/usr/local/cuda/.pb/binaries//bin/postsort /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta -o /uploads/workspace/output.bam -sort_unmapped -ft 4 -wt 2 -zt 3 -bq 2 -gb 110 -a /uploads/workspace/report.txt /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| Marking Duplicates, BQSR ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
progressMeter - Percentage
[09:05:27] 0.0 19.33 GB
[09:05:37] 0.3 19.23 GB
[09:05:47] 43.7 10.67 GB
[09:05:57] 79.4 3.01 GB
[09:06:07] 100.0 0.00 GB
BQSR and writing final BAM: 55.401 seconds
------------------------------------------------------------------------------
|| Program: Marking Duplicates, BQSR ||
|| Version: v3.5.0 ||
|| Start Time: Thu Jun 10 09:05:16 2021 ||
|| End Time: Thu Jun 10 09:06:13 2021 ||
|| Total Time: 57 seconds ||
------------------------------------------------------------------------------
docker run --gpus all -u=1000:1000 --rm -w=/uploads/workspace --net=host -v /opt/parabricks:/INSTALL/ -v /uploads/workspace/WODDX80V:/uploads/workspace/WODDX80V -v /uploads/workspace:/uploads/workspace -v /uploads/workspace/parabricks_sample/Ref:/uploads/workspace/parabricks_sample/Ref -v /uploads/workspace/parabricks_sample/Data:/uploads/workspace/parabricks_sample/Data parabricks/release:v3.5.0 haplotypecaller --ref /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-bam /uploads/workspace/output.bam --out-variants /uploads/workspace/output.vcf --ploidy 2 --num-htvc-threads 5 --in-recal-file /uploads/workspace/report.txt --tmp-dir /uploads/workspace/WODDX80V --num-gpus 2 --x3
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
/usr/local/cuda/.pb/binaries//bin/htvc /uploads/workspace/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /uploads/workspace/output.bam 2 -o /uploads/workspace/output.vcf -nt 5 -a /uploads/workspace/report.txt
------------------------------------------------------------------------------
|| Parabricks accelerated Genomics Pipeline ||
|| Version v3.5.0 ||
|| GPU-GATK4 HaplotypeCaller ||
|| Contact: Parabricks-Support@nvidia.com ||
------------------------------------------------------------------------------
ProgressMeter - Current-Locus Elapsed-Minutes Regions-Processed Regions/Minute
0 /uploads/workspace/output.bam /uploads/workspace/output.vcf
[09:06:45] chr1:69736213 0.2 295788 1774728
[09:06:55] chr1:172127728 0.3 638210 1914630
[09:07:05] chr2:24575905 0.5 1059308 2118616
[09:07:15] chr2:118607996 0.7 1438509 2157763
[09:07:25] chr2:210110165 0.8 1822251 2186701
[09:07:35] chr3:53063751 1.0 2183068 2183068
[09:07:45] chr3:143860641 1.2 2555289 2190247
[09:07:55] chr4:62855995 1.3 3034038 2275528
[09:08:05] chr4:171853801 1.5 3487445 2324963
[09:08:15] chr5:84331180 1.7 3892371 2335422
[09:08:25] chr5:173260116 1.8 4264213 2325934
[09:08:35] chr6:74481799 2.0 4586215 2293107
[09:08:45] chr7:11246134 2.2 5032603 2322739
[09:08:55] chr7:130997412 2.3 5521721 2366451
[09:09:05] chr8:61243172 2.5 5878954 2351581
[09:09:15] chr9:20644571 2.7 6315026 2368134
[09:09:25] chr10:3110359 2.8 6733447 2376510
[09:09:35] chr10:117102296 3.0 7207833 2402611
[09:09:45] chr11:73843030 3.2 7574909 2392076
[09:09:55] chr12:26831401 3.3 7945179 2383553
[09:10:05] chr13:28406382 3.5 8431610 2409031
[09:10:15] chr14:34871914 3.7 8861037 2416646
[09:10:25] chr15:57763138 3.8 9295695 2424963
[09:10:35] chr16:64607746 4.0 9700290 2425072
[09:10:45] chr17:68337341 4.2 10069197 2416607
[09:10:55] chr18:68500686 4.3 10399080 2399787
[09:11:05] chr20:38097406 4.5 10836101 2408022
[09:11:15] chr22:44141670 4.7 11223280 2404988
[09:11:25] chr17_GL000258v2_alt:1521348 4.8 11959301 2474338
Total time taken: 304.593
------------------------------------------------------------------------------
|| Program: GPU-GATK4 HaplotypeCaller ||
|| Version: v3.5.0 ||
|| Start Time: Thu Jun 10 09:06:17 2021 ||
|| End Time: Thu Jun 10 09:11:36 2021 ||
|| Total Time: 5 minutes 19 seconds ||
------------------------------------------------------------------------------
```
:::
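One visible effect of the read group's `SM` field is the sample column of the output VCF. Assuming HaplotypeCaller names the sample after `SM`, as GATK does, the column in `output.vcf` should read `sample`. The sketch below (`vcf_sample_names` is a hypothetical helper) prints the sample names from the `#CHROM` header line of a plain-text VCF.

```bash=
# Print the sample column name(s): fields 10 and onward of the "#CHROM" line.
vcf_sample_names() {
  awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}
```

Usage: `vcf_sample_names output.vcf`.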
<br>
### Performance comparison across VMs
| Program | NC12s-v2 | NC24s-v3 | AWS(T4x8) |
| -------- | -------- |-------- | --------- |
| GPU-BWA mem, Sorting Phase-I | 04m 56s | 02m 57s | 02m 22s |
| Sorting Phase-II | 00m 10s | 00m 10s | 00m 10s |
| Marking Duplicates, BQSR | 00m 57s | 01m 51s | 01m 31s |
| GPU-GATK4 HaplotypeCaller | 05m 19s | 02m 44s | 02m 28s |
| Total | 11m 22s | 07m 42s | 06m 31s |
- ### AWS(T4x8) (2021/07/14)
- **GPU**
Tesla T4 (15109 MiB) x 8
- **CPU**
1 socket x 77 cores/socket x 1 thread/core = 77 hardware threads
(Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz)
- **RAM**
320GB
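In each column of the table, the Total row is the sum of the four stage rows (682 s, 462 s and 391 s). Relative to NC12s-v2, the other two VMs are therefore about 1.48x and 1.74x faster overall. The sketch below reproduces that arithmetic; `to_seconds` and `speedup` are hypothetical helpers that expect the `MMm SSs` format used in the table.

```bash=
# "04m 56s" -> 296 (awk's numeric conversion ignores the leading zeros)
to_seconds() {
  printf '%s\n' "$1" | awk '{ gsub(/[ms]/, " "); split($0, p, " "); print p[1] * 60 + p[2] }'
}

# speedup BASE OTHER -> BASE/OTHER with two decimals
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.2f\n", a / b }'
}
```

Usage: `speedup '11m 22s' '06m 31s'` compares the NC12s-v2 and AWS(T4x8) totals.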
<br>
### Simplified docker run commands, verified to work (2021/06/10 19:55)
```bash=
# cd into the directory that contains the parabricks_sample subdirectory
rm -rf tmp-dir
mkdir tmp-dir
WORKSPACE="$(pwd)"
# Run fq2bam (about 6m 3s)
docker run \
--gpus all \
-u=1000:1000 \
--rm \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v "$WORKSPACE:$WORKSPACE" \
-w="$WORKSPACE" \
parabricks/release:v3.5.0 fq2bam \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq parabricks_sample/Data/sample_1.fq.gz \
parabricks_sample/Data/sample_2.fq.gz \
"@RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1" \
--knownSites parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--out-bam output.bam \
--out-recal-file report.txt \
--memory-limit 110 \
--num-cpu-threads 0 \
--num-gpus 2 \
--tmp-dir tmp-dir \
--x3
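# Parameters added relative to pbrun fq2bam:
# --in-fq third value "@RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1" (read group)
# --memory-limit 110 (not mentioned in the docs)
# --num-cpu-threads 0 (not mentioned in the docs)
# --num-gpus 2 (the docs do not state the default)
# Run haplotypecaller (about 4m 58s)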
docker run \
--gpus all \
-u=1000:1000 \
--rm \
--net=host \
-v /opt/parabricks:/INSTALL/ \
-v "$WORKSPACE:$WORKSPACE" \
-w="$WORKSPACE" \
parabricks/release:v3.5.0 haplotypecaller \
--ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-bam output.bam \
--in-recal-file report.txt \
--out-variants output.vcf \
--ploidy 2 \
--num-htvc-threads 5 \
--num-gpus 2 \
--tmp-dir tmp-dir \
--x3
# Parameters added relative to pbrun haplotypecaller:
# --ploidy 2
# --num-htvc-threads 5 (not mentioned in the docs)
# --num-gpus 2 (the docs do not state the default)
# --tmp-dir tmp-dir
```
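The two commands above repeat the same docker options. A hypothetical wrapper (`pb_docker` is not a Parabricks tool) can hold them in one place and also append `--tmp-dir tmp-dir` when the caller omits it, since the container fails without `--tmp-dir`. It keeps the hard-coded `-u=1000:1000` and image tag from the commands above; setting `PB_DRY_RUN=1` prints the command instead of running it.

```bash=
# Hypothetical wrapper around the shared docker run options used above.
pb_docker() {
  local tool="$1" ws has_tmp=0 arg
  shift
  ws="$(pwd)"
  for arg in "$@"; do
    if [ "$arg" = "--tmp-dir" ]; then has_tmp=1; fi
  done
  # Add a default temp dir when none was given (the container needs one).
  if [ "$has_tmp" -eq 0 ]; then
    set -- "$@" --tmp-dir tmp-dir
  fi
  set -- docker run --gpus all -u=1000:1000 --rm --net=host \
    -v /opt/parabricks:/INSTALL/ -v "$ws:$ws" -w="$ws" \
    "${PB_IMAGE:-parabricks/release:v3.5.0}" "$tool" "$@"
  if [ "${PB_DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
    return 0
  fi
  if [ "$has_tmp" -eq 0 ]; then mkdir -p tmp-dir; fi
  "$@"
}
```

For example, `pb_docker fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta ...` followed by `pb_docker haplotypecaller --ref ... --in-bam output.bam ...`, with the remaining arguments as in the commands above.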