Parabricks Errors
===
###### tags: `Parabricks-v3.8`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Benchmark`
<br>
[TOC]
<br>
## @NCHI-production
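### 2022/05/25 - Error when running the WGS germline pipeline
- Situation
    - The notebook service had been started
    - Several earlier germline runs (executed back to back) had all completed normally
    - The notebook had been in use for about 8 hours
- Cause of the error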
> [Parabricks Options Error]: Could not find accessible GPUs. Please check the output of nvidia-smi -L
:::spoiler
```
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation
[Parabricks Options Error]: Could not find accessible GPUs. Please check the output of nvidia-smi -L
[Parabricks Options Error]: Run with -h to see help
Traceback (most recent call last):
  File "/usr/local/parabricks/pbutils.py", line 89, in GetNumGPUs
    output = subprocess.check_output(["nvidia-smi", "-L"], universal_newlines=True)
  File "/root/miniconda3/envs/parabricks/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/root/miniconda3/envs/parabricks/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nvidia-smi', '-L']' returned non-zero exit status 255.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/parabricks/pbrun", line 10, in <module>
    runArgs = pbargs.getArgs()
  File "/usr/local/parabricks/pbargs.py", line 2907, in getArgs
    return PBRun(sys.argv)
  File "/usr/local/parabricks/pbargs.py", line 1065, in __init__
    self.runArgs = getattr(self, args.command)(argList)
  File "/usr/local/parabricks/pbargs.py", line 2157, in germline
    self.addToParser(germline_parser_sysgroup, sysOptionGenerator().allOptions)
  File "/usr/local/parabricks/pbargs.py", line 114, in __init__
    PBOption(category="sysOption", name="--num-gpus", default=GetNumGPUs(), typeName=int, helpStr="Number of GPUs to use for a run."),
  File "/usr/local/parabricks/pbutils.py", line 92, in GetNumGPUs
    OptError("Could not find accessible GPUs. Please check the output of nvidia-smi -L")
  File "/usr/local/parabricks/pbutils.py", line 61, in OptError
    deleteTmpDir()
  File "/usr/local/parabricks/pbutils.py", line 23, in deleteTmpDir
    if os.path.exists(runTempDir):
  File "/root/miniconda3/envs/parabricks/lib/python3.7/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```

```
(parabricks) root@tj-parabricks-6cbd58cdf5-lfcgw:/workspace# nvidia-smi
Failed to initialize NVML: Unknown Error
(parabricks) root@tj-parabricks-6cbd58cdf5-lfcgw:/workspace# nvidia-smi -L
Failed to initialize NVML: Unknown Error
```

:::
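
Since the failure only appeared after several successful runs, it may help to probe the GPUs before each `pbrun` call so that a batch stops early instead of ending in the traceback above. The following is a minimal sketch, assuming Python 3.7+ inside the container. `list_gpus` and `report` are hypothetical helper names and are not part of Parabricks; the only external command used is the `nvidia-smi -L` call that the error message itself points to.

```python
import shutil
import subprocess
import sys


def list_gpus(cmd=("nvidia-smi", "-L"), timeout=30):
    """Run the GPU listing command and return (ok, lines).

    ok is False when the command is missing, times out, exits non-zero
    (e.g. "Failed to initialize NVML"), or lists no GPU at all.
    """
    if shutil.which(cmd[0]) is None:
        return False, [f"{cmd[0]} not found on PATH"]
    try:
        proc = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, [f"{' '.join(cmd)} timed out after {timeout}s"]
    lines = [ln for ln in (proc.stdout + proc.stderr).splitlines() if ln.strip()]
    if proc.returncode != 0:
        return False, lines
    # `nvidia-smi -L` prints one "GPU <index>: <name> (UUID: ...)" line per device.
    gpus = [ln for ln in lines if ln.startswith("GPU ")]
    return bool(gpus), gpus or lines


def report(ok, lines, stream=sys.stderr):
    """Print a short summary of the check and hand back ok unchanged."""
    print("GPU check: " + ("OK" if ok else "FAILED"), file=stream)
    for line in lines:
        print("  " + line, file=stream)
    return ok
```

A batch driver could call `report(*list_gpus())` before each sample and stop as soon as it returns `False`, which leaves the NVML message in the log next to the sample that was about to run.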
<br>
<hr>
<br>
## Issues
- ### [[KubeVirt] Create a VM on k8s for Parabricks](http://10.78.26.44:30000/UXQ/ai-tainan-gov/-/issues/372#note_53346)
- ### [When a container queries available memory it gets the host system's figure, which makes later configuration settings wrong](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/230)
- ### [[Container Service] GPU is unloaded for an unknown reason and can no longer be accessed](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/234)
    - [[K8s] Update nvidia-device-plugin](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/223)
    - [[Container Service] With a notebook on a GPU flavor, the GPU stops working after more than 5 days of use](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/182)
- ### [[Notebook] Add a Parabricks image](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/268)
- ### [[Nvidia][Forums] Could not run fq2bam as part of germline pipeline](https://forums.developer.nvidia.com/t/could-not-run-fq2bam-as-part-of-germline-pipeline/205484)
- ### [[Nvidia][Forums] Planning to set up the Parabricks environment](https://forums.developer.nvidia.com/t/planning-to-set-up-the-parabricks-environment/210543)
- ### [[Nvidia][Forums] The repository ‘https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release’ is not signed](https://forums.developer.nvidia.com/t/the-repository-https-developer-download-nvidia-com-compute-cuda-repos-ubuntu1804-x86-64-release-is-not-signed/193764/8?u=tj_tsai)
Workaround for now: remove the corresponding `.list` file from `/etc/apt/sources.list.d/`.
```
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
```
```
$ cat /etc/apt/sources.list.d/cuda.list
deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
$ cat /etc/apt/sources.list.d/nvidia-ml.list
deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
```
- https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64
- ### [[Nvidia][Forums] [Parabricks3.7][A100] cudaSafeCall() failed at ParaBricks/src/mem_chain_kernel.cu/136: invalid device symbol](https://forums.developer.nvidia.com/t/213345)
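
For the unsigned-repository workaround above, it can be useful to list which `.list` files actually reference the NVIDIA download host before removing anything. The following is a rough sketch that only reads files and deletes nothing. `find_repo_lists` is a hypothetical helper name, and the default directory is the `/etc/apt/sources.list.d/` path mentioned in the workaround.

```python
from pathlib import Path


def find_repo_lists(host, sources_dir="/etc/apt/sources.list.d"):
    """Return the .list files that contain an active `deb` entry mentioning host.

    Commented-out entries (lines starting with '#') are ignored.
    """
    matches = []
    for path in sorted(Path(sources_dir).glob("*.list")):
        for line in path.read_text(errors="replace").splitlines():
            entry = line.strip()
            if entry.startswith("deb") and host in entry:
                matches.append(path)
                break  # one matching entry is enough to flag the file
    return matches
```

Calling `find_repo_lists("developer.download.nvidia.com")` in the container shown above would be expected to return the paths of `cuda.list` and `nvidia-ml.list`, given the `cat` output recorded in that entry. Removing or keeping each file remains a manual decision.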