OneAI
===
###### tags: `Parabricks-v3.8`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Secondary analysis`
<br>
[TOC]
<br>
## Network transfer status
### 10.78.26.241 (ESC4000)
- Cannot connect to ctnservice.oneai.twcc.ai
```
$ ssh -p 30148 root@ctnservice.oneai.twcc.ai
```
### 10.78.154.119 (desktop PC)
[](https://i.imgur.com/lJiQs6m.png)
- ### D15780_S13_L001
- `D15780_S13_L001_R1.fastq.gz`
60.86M 100% 11.51MB/s 0:00:05 (xfr#1, to-chk=1/2)
(md5: `26d4d67f0eabf912128cdea3a5761eea`)
- `D15780_S13_L001_R2.fastq.gz`
66.89M 100% 10.96MB/s 0:00:05 (xfr#2, to-chk=0/2)
(md5: `b200c31a05040bad0588379f585c6fcc`)
- ### WES-EDA-013A
- `WES-EDA-013A_R1.fastq.gz`
2.66G 100% 11.00MB/s 0:03:50 (xfr#1, to-chk=1/2)
- `WES-EDA-013A_R2.fastq.gz`
2.88G 100% 9.57MB/s 0:04:47 (xfr#2, to-chk=0/2)
- ### WGS-LIS-AI018A
```
$ rsync -e 'ssh -p 30148' \
--progress --partial --append -zh \
WGS-LIS-AI018A_R* \
root@ctnservice.oneai.twcc.ai:/workspace/datasets
```
- `WGS-LIS-AI018A_R1.fastq.gz`
35.31G 100% 10.56MB/s 0:53:08 (xfr#1, to-chk=1/2)
- `WGS-LIS-AI018A_R2.fastq.gz`
36.82G 100% 10.17MB/s 0:57:32 (xfr#2, to-chk=0/2)
<br>
<hr>
<br>
## Storage I/O performance
| Round | READ IOPS | READ BW (MiB/s) | WRITE IOPS | WRITE BW (MiB/s) | Date |
|-----:|-----|-----|-----|-----|-----|
| 1 | 11.3k | 44.1 | 3772 | 14.7 | |
| 2 | 7821 | 30.6 | 2611 | 10.2 | |
| 3 | 3784 | 14.8 | 1263 | 4.94 | |
| 4 | 6061 | 23.7 | 2024 | 7.91 | |
| 5 | 2733 | 10.7 | 912 | 3.57 | 2022/03/21 11:35AM |
|||
| **Mean** | 6339.8 | 24.8 | 2116.4 | 8.3 | |
| **Std. dev.** | 3406.5 | 13.3 | 1137.2 | 4.4 | |
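
The mean and standard deviation rows can be reproduced from the five rounds. The sketch below assumes that `11.3k` means 11300 and that the standard deviation is the sample form (n - 1 in the denominator), which is the form that matches the figures in the table. The dictionary keys are illustrative names, not fio fields.

```python
import statistics

# Figures copied from the table above ("11.3k" is taken as 11300).
rounds = {
    "read_iops": [11300, 7821, 3784, 6061, 2733],
    "read_bw_mib_s": [44.1, 30.6, 14.8, 23.7, 10.7],
    "write_iops": [3772, 2611, 1263, 2024, 912],
    "write_bw_mib_s": [14.7, 10.2, 4.94, 7.91, 3.57],
}

def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # n - 1 in the denominator
    return mean, stdev, stdev / mean

summary = {name: summarize(values) for name, values in rounds.items()}

for name, (mean, stdev, cv) in summary.items():
    print(f"{name}: mean={mean:.1f} stdev={stdev:.1f} cv={cv:.0%}")
```

The coefficient of variation comes out at roughly half the mean for every column, which is one way to quantify how unstable the storage throughput was across rounds.
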
<br>
<hr>
<br>
## Hardware specifications
### oai.comp.c8m128gt2
> GPU: T4x2, CPU: 8, RAM: 128G
- ### GPU
```
!nvidia-smi
Fri Mar 4 09:08:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Tesla T4 Off | 00000000:00:06.0 Off | 0 |
| N/A 31C P8 15W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA Tesla T4 Off | 00000000:00:07.0 Off | 0 |
| N/A 31C P8 14W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
```
- CPU (measured: 64 vCPUs)
```
$ cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
2
$ cat /proc/cpuinfo | grep "processor" | wc -l
64
$ cat /proc/cpuinfo | grep "cores" | uniq
cpu cores : 32
```
```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel Xeon Processor (Skylake, IBRS)
Stepping: 4
CPU MHz: 2500.000
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni md_clear
```
- Measured later
| | CPU MHz | BogoMIPS |
| ----- | ------- | -------- |
| Earlier | 2500 | 5000 |
| Container A | 2300 | 4600 |
| Container B | 2300 | 4600 |
- RAM (measured: 720G?) => this is the size of the host, not of the container
```
$ cat /proc/meminfo
MemTotal: 742678696 kB
MemFree: 43068168 kB
MemAvailable: 678995656 kB
Buffers: 131528 kB
Cached: 591782720 kB
SwapCached: 0 kB
Active: 142303872 kB
Inactive: 478568928 kB
Active(anon): 27467248 kB
Inactive(anon): 8427240 kB
Active(file): 114836624 kB
Inactive(file): 470141688 kB
Unevictable: 18584 kB
Mlocked: 18584 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 1544 kB
Writeback: 0 kB
AnonPages: 28975724 kB
Mapped: 10100004 kB
Shmem: 8481696 kB
KReclaimable: 57027136 kB
Slab: 70229024 kB
SReclaimable: 57027136 kB
SUnreclaim: 13201888 kB
KernelStack: 64288 kB
PageTables: 156644 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 371339348 kB
Committed_AS: 258035340 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 283032 kB
VmallocChunk: 0 kB
Percpu: 3644928 kB
HardwareCorrupted: 0 kB
AnonHugePages: 106496 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 23209824 kB
DirectMap2M: 416143360 kB
DirectMap1G: 317718528 kB
```
```
$ lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x00000000bfffffff 3G online yes 0-2
0x0000000100000000-0x000000b43fffffff 717G online yes 4-720
Memory block size: 1G
Total online memory: 720G
Total offline memory: 0B
```
- Storage
```
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75
fiotest: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
fiotest: Laying out IO file (1 file / 8192MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 1 (f=1): [m(1)][100.0%][r=36.6MiB/s,w=12.5MiB/s][r=9367,w=3196 IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=1): err= 0: pid=241403: Fri Mar 4 16:20:55 2022
read: IOPS=9494, BW=37.1MiB/s (38.9MB/s)(6141MiB/165581msec)
bw ( KiB/s): min=13421, max=45880, per=99.97%, avg=37965.58, stdev=3089.20, samples=331
iops : min= 3355, max=11470, avg=9491.34, stdev=772.29, samples=331
write: IOPS=3170, BW=12.4MiB/s (12.0MB/s)(2051MiB/165581msec)
bw ( KiB/s): min= 4207, max=15808, per=99.97%, avg=12677.71, stdev=1018.78, samples=331
iops : min= 1051, max= 3952, avg=3169.38, stdev=254.71, samples=331
cpu : usr=2.54%, sys=16.98%, ctx=1304064, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwt: total=1572145,525007,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=37.1MiB/s (38.9MB/s), 37.1MiB/s-37.1MiB/s (38.9MB/s-38.9MB/s), io=6141MiB (6440MB), run=165581-165581msec
WRITE: bw=12.4MiB/s (12.0MB/s), 12.4MiB/s-12.4MiB/s (12.0MB/s-12.0MB/s), io=2051MiB (2150MB), run=165581-165581msec
```
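
`/proc/meminfo` and `lsmem` above report the host's 720G rather than the 128G assigned to the container. On Linux the container's own cap is normally exposed through the cgroup filesystem, so it can be compared against `MemTotal`. The sketch below is a minimal illustration, assuming the standard cgroup v2 (`memory.max`) or cgroup v1 (`memory.limit_in_bytes`) paths are mounted inside the container; whether OneAI exposes them was not checked in these notes. The function names are illustrative.

```python
from pathlib import Path

# Standard cgroup mount points; which one exists depends on the host setup.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

def read_cgroup_memory_limit(candidates=(CGROUP_V2_LIMIT, CGROUP_V1_LIMIT)):
    """Return the container memory limit in bytes, or None if no limit is visible."""
    for path in candidates:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next candidate
        if raw == "max":  # cgroup v2 spelling of "unlimited"
            return None
        limit = int(raw)
        # cgroup v1 reports a huge page-aligned number when unlimited.
        if limit >= 1 << 60:
            return None
        return limit
    return None

def read_meminfo_total(path="/proc/meminfo"):
    """Return MemTotal from /proc/meminfo in bytes (the host-wide figure)."""
    for line in Path(path).read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # value is given in kB
    raise ValueError("MemTotal not found")

if __name__ == "__main__":
    limit = read_cgroup_memory_limit()
    print("cgroup limit:", "none visible" if limit is None else f"{limit / 2**30:.1f} GiB")
```

If the cgroup limit is visible and much smaller than `MemTotal`, any tool that sizes its buffers from `/proc/meminfo` is likely to overshoot the container's allowance.
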
<br>
<hr>
<br>
## Run assessment
> - **[GT](https://forums.guru3d.com/threads/what-does-gt-gts-gtx-mean.249891/)**:
> GT stands for gran turismo, which is *spanish* (I think) for 'big car'... GT is often appended to the name of production cars to indicate a sports model, i.e. GT means fast.
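- pbrun germline
- **Q1**: Inside the container, the memory size reported to the program is that of the host itself.
After subtracting the kernel and other running programs,
how much memory should actually be allotted to Parabricks?
- **Q2**: If resources are too scarce, can it run at all?
Roughly what is the minimum?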
### c8 m128 gt2
> - **Signal 11**:
> - SIGSEGV (Linux segmentation violation)
> (segmentation fault)
> The data is presumably larger than the available memory, causing an invalid memory access (a guess)
> - log: `Received signal: 11`
> - **Signal 9**:
> - exit code: 128 + 9 = 137
> - log: `Killed`
> - oom-killer (out of memory)
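
The `128 + 9 = 137` rule above is the usual shell convention for a process terminated by a signal. A minimal sketch of decoding it, assuming a Linux host (the function name is illustrative):

```python
import signal

def describe_exit_code(code):
    """Map a shell-style exit code to a signal name when it encodes one."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a valid signal number on this platform
    return None

for code in (137, 139, 255):
    print(code, describe_exit_code(code))
```

Some failures in the tables below are recorded with exit code 255 even though the log shows `Received signal: 11` or `Killed`, so the exit code alone does not always identify the signal; the log line is the more reliable indicator.
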
| Memory limit | Total run time | <b style="color: red;">Failure reason</b> |
| -------- | --------: | ------- |
| `--memory-limit 1` | 40m 24s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11 |
| `--memory-limit 2` | 3h 30m 34s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11 |
| `--memory-limit 3` | 5h 37m 48s | |
| `--memory-limit 4` | 3h 32m 21s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11 |
| `--memory-limit 4` | 5h 42m 58s | PASS |
| `--memory-limit 5` | 5h 39m 04s | |
| `--memory-limit 10` | 5h 26m 47s | |
| `--memory-limit 20` | 5h 26m 53s | |
| `--memory-limit 30` | 5h 27m 07s | |
| `--memory-limit 40` | 5h 27m 22s | |
| `--memory-limit 50` | 5h 27m 48s | |
| `--memory-limit 70` | 5h 27m 48s | |
| `--memory-limit 88` | 5h 24m 55s | |
| `--memory-limit 118` | 5h 25m 51s | |
| `--memory-limit 120` | 5h 41m 40s | |
| `--memory-limit 122` | 5h 40m 11s | |
| `--memory-limit 123` | 3h 50m 09s | Killed @ Marking Duplicates 65.4% |
| `--memory-limit 127` | 3h 46m 44s | Killed @ Marking Duplicates 39.9% |
| `--memory-limit 128` | 3h 46m 53s | Killed @ Marking Duplicates 41.5% |
| `--memory-limit 178` | 3h 45m 41s | Killed @ Marking Duplicates 39.3% |
| `--memory-limit 228` | 3h 46m 20s | Killed @ Marking Duplicates 37.7% |
| `--memory-limit 278` | 3h 46m 31s | Killed @ Marking Duplicates 42.1% |
| `--memory-limit 328` | 3h 46m 27s | Killed @ Marking Duplicates 39.7% |
### c8 m64 gt1
| Memory limit | Total run time | <b style="color: red;">Failure reason</b> |
| -------- | --------: | ------- |
| `--memory-limit 20` | 7h 44m 33s | |
| `--memory-limit 30` | 7h 45m 57s | |
| `--memory-limit 40` | 7h 45m 57s | |
| `--memory-limit 50` | 5h 28m 43s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11<br>exit code: 255 |
| `--memory-limit 55` | - | Killed @ Marking Duplicates 80.6%<br>exit code: 255<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 5h 29m 14s<br> - Sorting Phase-II: 8m 50s |
| `--memory-limit 55` | 5h 59m 38s | Killed @ Marking Duplicates 72.9%<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 5h 29m 34s<br> - Sorting Phase-II: 9m 01s |
| `--memory-limit 60` | 5h 47m 51s | Killed @ Marking Duplicates 23.2%<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 5h 27m 49s<br> - Sorting Phase-II: 10m 11s |
| `--memory-limit 62` | 5h 40m 57s | Killed @ Marking Duplicates 0.1%<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 5h 26m 48s<br> - Sorting Phase-II: 10m 41s |
### c4 m64 gt1
<!-- http://ctnservice.oneai.twcc.ai:31835/lab? -->
| Memory limit | Total run time | <b style="color: red;">Failure reason</b> |
| -------- | --------: | ------- |
| `--memory-limit 3` | 14h 14m 08s | |
| `--memory-limit 4` | 14h 12m 37s | |
| `--memory-limit 5` | 14h 12m 06s | |
| `--memory-limit 10` | 14h 14m 24s | |
| `--memory-limit 20` | 14h 12m 30s | |
| `--memory-limit 30` | 14h 12m 22s | |
| `--memory-limit 40` | 14h 18m 44s | |
| `--memory-limit 50` | 14h 26m 40s | |
| `--memory-limit 60` | 8h 35m 39s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11 |
| `--memory-limit 60` | 9h 09m 27s | GPU-BWA mem, Sorting Phase-I<br><b style="color: red;">Error</b>: Received signal: 11<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 8h 46m 17s<br> - Sorting Phase-II: 22m 2s<br>no CUDA-capable device |
| `--memory-limit 62` | 9h 16m 17s | Killed @ Marking Duplicates 0%<br>**Note:**<br> - GPU-BWA mem, Sorting Phase-I: 8h 48m 17s<br> - Sorting Phase-II: 22m 31s |
### CPU count and run time
| Hardware flavor | Memory limit | Total run time |
| ------- | --------: | ------- |
| c4 m64 gt1 | `--memory-limit 40` | 14h 18m 44s |
| c8 m64 gt1 | `--memory-limit 40` | 7h 45m 57s |
| c8 m128 gt2 | `--memory-limit 40` | 5h 27m 22s |
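
To put the three totals on a common scale, the durations can be converted to seconds and expressed as a speedup over the slowest flavor. This is only arithmetic on the figures in the table; the parser and variable names are illustrative.

```python
import re

def parse_duration(text):
    """Convert a duration such as '14h 18m 44s' or '5h27m22s' to seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

# Totals copied from the table above (all runs used --memory-limit 40).
runs = {
    "c4 m64 gt1": "14h 18m 44s",
    "c8 m64 gt1": "7h 45m 57s",
    "c8 m128 gt2": "5h 27m 22s",
}
seconds = {flavor: parse_duration(t) for flavor, t in runs.items()}
baseline = seconds["c4 m64 gt1"]
speedup = {flavor: baseline / s for flavor, s in seconds.items()}

for flavor, factor in speedup.items():
    print(f"{flavor}: {seconds[flavor]} s, {factor:.2f}x vs c4 m64 gt1")
```

Going from 4 to 8 vCPUs with the same single T4 gives about 1.84x, close to linear in the vCPU count; the c8 m128 gt2 flavor, which also adds a second T4 and more memory, reaches about 2.62x over the c4 baseline.
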
<br>
### c8m128gt2 parameter comparison
> T4 x 2, 8 vCPU, 128GB
- params
- **ng**: `--num-gpus`
- **ml**: `--memory-limit`
- **nct**: `--num-cpu-threads`
> :warning: **`--num-cpu-threads`**
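> In experiments on AWS the run times differed very little; the option appears to have no effect, and its actual purpose is unclear.
>
> :bulb: It probably does not limit the number of CPUs; rather, it sets the number of threads allocated per vCPU.
> - [See this image and note `-nt`](https://i.imgur.com/iBoPk1W.png)
> - Typically one vCPU is given at most 2 threads.
> - Allocating too many incurs context-switch overhead, which may bring no benefit and can even slow the run down.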
- P: Program
- **P1**: GPU-BWA mem, Sorting Phase-I
- **P2**: Sorting Phase-II
- **P3**: Marking Duplicates, BQSR
- **P4**: GPU-GATK4 HaplotypeCaller
| **#** | ng | ml | nct | P1 | P2 | P3 | P4 | Total|
|---|----|----|-----|---:|---:|---:|---:|-----:|
| **1** | - | - | - | 3h28m26s | 8m41s | 14m07s | 1h32m34s | 5h25m16s |
| **2** | - | - | - | 3h27m28s | 8m40s | 13m56s | 1h32m33s | 5h24m01s |
| **3** | - | - | - | 3h28m35s | 8m31s | <span style="color: red;">killed</span> | | |
| **4** | - | 88 | - | 3h25m24s | 7m51s | 15m03s | 1h35m19s | 5h24m55s |
| **A.1** | 1 | 62 | 2 | 5h54m59s | 15m32s | 49m24s | 2h35m51s | 9h38m17s |
| **A.2** | 1 | 62 | 4 | 6h02m42s| 13m11s | 45m05s | 2h36m45s | 9h39m37s |
| **B.1** | 1 | 62 | 8 | 7h10m40s | 13m02s | 44m24s | 2h35m32s | 10h46m04s |
| **B.2** | 1 | 62 | - | 5h44m34s | 15m52s | 43m44s | 2h34m22s | 9h20m50s |
| **A.3** | 2 | 62 | 2 | 4h28m44s | 15m42s | 34m16s | 2h49m18s | 8h09m56s |
| **A.4** | 2 | 62 | 4 | 5h22m50s| 15m13s | 31m06s | 2h53m50s | 9h04m44s |
| **B.3** | 2 | 62 | 8 | 6h31m01s | 17m02s | 32m46s | 2h50m08s | 10h12m56s |
| **B.5** | 1 | 50 | 4 | 7h03m15s | 15m11s | 45m05s | 2h39m49s | 10h45m06s |
| **A.5** | 1 | 50 | 8 | 9h25m13s | 13m32s | 40m13s | 2h41m41s | 13h02m29s |
| **B.4** | 1 | 50 | 8 | 7h52m48s | 16m02s | 38m03s | 2h40m56s | 11h29m48s |
| **A.6** | 2 | - | - | 4h06m40s | 14m21s | 32m57s | 2h44m20s | 7h40m28s|
| **A.7** | 1 | - | - | 6h06m48s | 16m12s | <span style="color: red;">killed</span> | | |
| **A.8** | - | - | - | 4h31m38s| 15m12s | 32m06s | 2h44m10s | 8h05m10s |
| **B.6** | 2 | - | - | 4h12m00s | 15m41s | 28m08s | 2h29m24s | 7h26m45s |
| **B.7** | 1 | - | - | 6h00m59s | 15m33s | <span style="color: red;">killed</span> | | |
| **B.8** | - | - | - | 4h42m33s | 12m41s | 26m34s| 2h17m01s | 7h40m17s |
A: http://ctnservice.oneai.twcc.ai:30993/lab/tree/c8m128gt2-memory_limit-num_cpu_thread.ipynb
B: http://ctnservice.oneai.twcc.ai:32280/lab/tree/c8m128gt2-memory_limit-num_cpu_thread.ipynb
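
For reference, a row of the table above maps onto a `pbrun germline` invocation roughly as sketched below. Only `--num-gpus`, `--memory-limit`, and `--num-cpu-threads` come from these notes; `--ref`, `--in-fq`, `--out-bam`, and `--out-variants` are recalled from the Parabricks v3.x germline documentation and should be checked against `pbrun germline --help`. All file paths and the helper name are placeholders, since the exact commands are in the notebooks rather than in this page.

```python
import shlex

def build_germline_cmd(ref, fastq_r1, fastq_r2, out_bam, out_vcf,
                       num_gpus=None, memory_limit=None, num_cpu_threads=None):
    """Assemble a `pbrun germline` argument list; None leaves an option unset ('-' in the table)."""
    cmd = ["pbrun", "germline",
           "--ref", ref,
           "--in-fq", fastq_r1, fastq_r2,
           "--out-bam", out_bam,
           "--out-variants", out_vcf]
    for flag, value in (("--num-gpus", num_gpus),
                        ("--memory-limit", memory_limit),
                        ("--num-cpu-threads", num_cpu_threads)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

# Row A.3 of the table: ng=2, ml=62, nct=2. All file paths are placeholders.
cmd = build_germline_cmd(
    ref="/workspace/ref/reference.fasta",
    fastq_r1="/workspace/datasets/sample_R1.fastq.gz",
    fastq_r2="/workspace/datasets/sample_R2.fastq.gz",
    out_bam="/workspace/output/sample.bam",
    out_vcf="/workspace/output/sample.vcf",
    num_gpus=2,
    memory_limit=62,
    num_cpu_threads=2,
)
print(shlex.join(cmd))
```

Building the argument list in one place makes it easier to sweep one option at a time while keeping the inputs fixed, which is how the rows above differ from each other.
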
<br>
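## Summary
- ### Points to note when running Parabricks in a OneAI container:
1. **Use the `--memory-limit` option to cap memory usage**
The "total memory" and "available memory" that Parabricks obtains from the system are those of the host,
which does not mean the container can use that much.
The container has its own hardware resource limits; exceeding them triggers the OOM killer and the process is killed.
2. **Storage I/O performance varies widely**
The figures below were measured at 5 different points in time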
| Round | READ IOPS | READ BW (MiB/s) | WRITE IOPS | WRITE BW (MiB/s) |
|-----:|-----|-----|-----|-----|
| 1 | 11.3k | 44.1 | 3772 | 14.7 |
| 2 | 7821 | 30.6 | 2611 | 10.2 |
| 3 | 3784 | 14.8 | 1263 | 4.94 |
| 4 | 6061 | 23.7 | 2024 | 7.91 |
| 5 | 2733 | 10.7 | 912 | 3.57 |
|||
| **Mean** | 6339.8 | 24.8 | 2116.4 | 8.3 |
| **Std. dev.** | 3406.5 | 13.3 | 1137.2 | 4.4 |
- ### Points to note when running pbrun germline in a OneAI container:
1. **How high `--memory-limit` can be set is currently determined empirically:**
| Hardware flavor | Container memory | `--memory-limit` |
| ------- | ------------- | ---------------- |
| c4 m64 gt1 | 64GB | `--memory-limit 50` |
| c8 m64 gt1 | 64GB | `--memory-limit 40` |
| c8 m128 gt2 | 128GB | `--memory-limit 122` |
- For a container with 128GB of memory,
runs without `--memory-limit` mostly succeed (5/8) but sometimes fail (3/8)
- `--memory-limit` needs to be at least 3GB, which may depend on the size of the input data
- Setting `--memory-limit` to 10GB, 20GB, 30GB, ... has little effect on total run time
2. **Best and worst results measured so far for each hardware flavor:**
| Hardware flavor | ng | ml | Total | P1 | P2 | P3 | P4 |
|---------|----|----|-----:|---:|---:|---:|---:|
| c4m64gt1 | - | 5 | **14h12m06s** | 8h34m57s | 21m01s | 47m33s | 4h27m13s |
| c4m64gt1 | - | 50 | **14h26m40s** | 8h38m37s | 22m51s | 49m42s | 4h33m36s |
| c8m64gt1 | - | 20 | **7h44m33s** | 5h28m48s | 9m10s | 21m52s | 1h43m13s |
| c8m64gt1 | - | 40 | **7h45m57s** | 5h29m49s | 9m10s | 22m01s | 1h43m24s |
| c8m128gt2 | - | - | **5h24m01s** | 3h27m28s | 8m40s | 13m56s | 1h32m33s |
| c8m128gt2<sup><b style="color: red;">*</b></sup> | 2 | - | **7h40m28s** | 4h06m40s | 14m21s | 32m57s | 2h44m20s |
| c8m128gt2<sup><b style="color: red;">*</b></sup> | - | - | **8h05m10s** | 4h31m38s | 15m12s | 32m06s | 2h44m10s |
- params
- **ng**: `--num-gpus`
- **ml**: `--memory-limit`
- P: Program
- **P1**: GPU-BWA mem, Sorting Phase-I
- **P2**: Sorting Phase-II
- **P3**: Marking Duplicates, BQSR
- **P4**: GPU-GATK4 HaplotypeCaller
- <b style="color: red;">*</b> : At that time the host's resource usage was close to its limit, and runs were generally 2h to 2.5h slower
3. **Comparison with AWS: `g4dn.2xlarge` (T4x1, 8vCPU, 32GB)**
> The most similar OneAI flavor, `c8m64gt1` (T4x1, 8vCPU, 64GB), is used for the comparison
| Hardware flavor | ml | Total | P1 | P2 | P3 | P4 |
|---------|----|-----:|---:|---:|---:|---:|
| **OneAI: c8m64gt1** | 20 | **7h44m33s** | 5h28m48s | 9m10s| 21m52s | 1h43m13s |
| **AWS: g4dn.2xlarge** | - | **7h58m13s** | 5h28m34s | 6m50s | 52m41s | 1h30m13s |
| **AWS: g4dn.2xlarge** | 16 | **8h01m50s** | 5h34m11s | 6m41s | 52m02s | 1h28m51s |
- params
- **ml**: `--memory-limit`
4. **NVIDIA recommends dedicating the entire host to Parabricks**
> [](https://i.imgur.com/aXq0iDU.png)
> [Could not run fq2bam as part of germline pipeline](https://forums.developer.nvidia.com/t/could-not-run-fq2bam-as-part-of-germline-pipeline/205484)