Parabricks Errors
===
###### tags: `Parabricks-v3.8`
###### tags: `Genomics`, `NVIDIA`, `Clara`, `Parabricks`, `Benchmark`

<br>

[TOC]

<br>

## @NCHI-production
### 2022/05/25
- Error while running the WGS germline pipeline
    - Situation
        - The notebook service was in use
        - Several earlier germline runs (executed back to back) had completed normally
        - The service had been in use for about 8 hours
    - Error message
        > [Parabricks Options Error]: Could not find accessible GPUs. Please check the output of nvidia-smi -L

:::spoiler
```
Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

[Parabricks Options Error]: Could not find accessible GPUs. Please check the output of nvidia-smi -L
[Parabricks Options Error]: Run with -h to see help
Traceback (most recent call last):
  File "/usr/local/parabricks/pbutils.py", line 89, in GetNumGPUs
    output = subprocess.check_output(["nvidia-smi", "-L"], universal_newlines=True)
  File "/root/miniconda3/envs/parabricks/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/root/miniconda3/envs/parabricks/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nvidia-smi', '-L']' returned non-zero exit status 255.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/parabricks/pbrun", line 10, in <module>
    runArgs = pbargs.getArgs()
  File "/usr/local/parabricks/pbargs.py", line 2907, in getArgs
    return PBRun(sys.argv)
  File "/usr/local/parabricks/pbargs.py", line 1065, in __init__
    self.runArgs = getattr(self, args.command)(argList)
  File "/usr/local/parabricks/pbargs.py", line 2157, in germline
    self.addToParser(germline_parser_sysgroup, sysOptionGenerator().allOptions)
  File "/usr/local/parabricks/pbargs.py", line 114, in __init__
    PBOption(category="sysOption", name="--num-gpus", default=GetNumGPUs(), typeName=int, helpStr="Number of GPUs to use for a run."),
  File "/usr/local/parabricks/pbutils.py", line 92, in GetNumGPUs
    OptError("Could not find accessible GPUs. Please check the output of nvidia-smi -L")
  File "/usr/local/parabricks/pbutils.py", line 61, in OptError
    deleteTmpDir()
  File "/usr/local/parabricks/pbutils.py", line 23, in deleteTmpDir
    if os.path.exists(runTempDir):
  File "/root/miniconda3/envs/parabricks/lib/python3.7/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```
![](https://i.imgur.com/kv6NdIZ.png)

```
(parabricks) root@tj-parabricks-6cbd58cdf5-lfcgw:/workspace# nvidia-smi
Failed to initialize NVML: Unknown Error
(parabricks) root@tj-parabricks-6cbd58cdf5-lfcgw:/workspace# nvidia-smi -L
Failed to initialize NVML: Unknown Error
```
![](https://i.imgur.com/1iq7BIW.png)
:::

<br>

<hr>

<br>

## Issues
- ### [[KubeVirt] Create a VM on k8s for Parabricks](http://10.78.26.44:30000/UXQ/ai-tainan-gov/-/issues/372#note_53346)
- ### [When a container queries available memory, it gets the host system's figure, which leads to incorrect downstream configuration](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/230)
- ### [[Container Service] GPU is unloaded for an unknown reason, so the container can no longer access it](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/234)
    - http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/223
      [K8s] Update nvidia-device-plugin
    - http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/182
      [Container Service] A notebook created with a GPU flavor loses its GPU after more than 5 days of use
- ### [[Notebook] Add a Parabricks image](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/268)
- ### [[Nvidia][Forums] Could not run fq2bam as part of germline pipeline](https://forums.developer.nvidia.com/t/could-not-run-fq2bam-as-part-of-germline-pipeline/205484)
- ### [[Nvidia][Forums] Planning to set up the Parabricks environment](https://forums.developer.nvidia.com/t/planning-to-set-up-the-parabricks-environment/210543)
- ### [[Nvidia][Forums] The repository ‘https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release’ is not signed](https://forums.developer.nvidia.com/t/the-repository-https-developer-download-nvidia-com-compute-cuda-repos-ubuntu1804-x86-64-release-is-not-signed/193764/8?u=tj_tsai)
  For now, you can remove the appropriate `.list` file in `/etc/apt/sources.list.d/`.
  ```
  W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
  E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
  ```
  ```
  $ cat /etc/apt/sources.list.d/cuda.list
  deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /

  $ cat /etc/apt/sources.list.d/nvidia-ml.list
  deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
  ```
    - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64
- [[Nvidia][Forums] [Parabricks3.7][A100] cudaSafeCall() failed at ParaBricks/src/mem_chain_kernel.cu/136: invalid device symbol](https://forums.developer.nvidia.com/t/213345)
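
In the 2022/05/25 incident above, `pbrun` failed only after `nvidia-smi -L` returned exit status 255, and the real cause was then partly hidden behind a second `TypeError`. A minimal sketch of a pre-flight check that could be run before each `pbrun` call in a long-lived notebook is shown below. The helper names `list_gpus` and `require_gpus` are hypothetical and not part of Parabricks. The sketch assumes that each GPU line printed by `nvidia-smi -L` starts with `GPU `.

```python
import subprocess


def list_gpus(cmd=("nvidia-smi", "-L")):
    """Return the GPU lines printed by `cmd`, or an empty list if it fails.

    `cmd` is a parameter only so the check can be exercised without a GPU.
    """
    try:
        out = subprocess.check_output(
            list(cmd), universal_newlines=True, stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: the command is missing or cannot be executed.
        # CalledProcessError: non-zero exit status, as with
        # "Failed to initialize NVML: Unknown Error" in the log above.
        return []
    # Assumption: each device is reported on a line starting with "GPU ".
    return [line for line in out.splitlines() if line.startswith("GPU ")]


def require_gpus(cmd=("nvidia-smi", "-L")):
    """Raise a clear error if no GPU is visible; otherwise return the list."""
    gpus = list_gpus(cmd)
    if not gpus:
        raise RuntimeError(
            "nvidia-smi -L reported no accessible GPUs in this container"
        )
    return gpus
```

Calling `require_gpus()` at the top of a notebook cell that launches the germline pipeline would surface a lost GPU as a single, readable error instead of the chained traceback. It does not fix the underlying problem of the GPU being detached from the container, which is tracked in the issues listed above.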