# Installing Docker on Windows with WSL2, with CUDA and GPU support
# It turns out the steps below are all it takes
Enable the WSL2 feature on Windows 11.
Install Docker Desktop (that is basically the whole setup).
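On a current Windows 11 build, enabling WSL2 is typically a single command in an elevated PowerShell (a sketch; the exact behavior depends on your Windows build — see Microsoft's WSL documentation):

```shell
# Enables the WSL/Virtual Machine Platform features and installs the
# default distribution; a reboot may be required afterwards.
wsl --install
```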
```
PS C:\Users\rchiu> wsl --list
Windows 子系統 Linux 版發佈:
docker-desktop (預設)
docker-desktop-data
Ubuntu-20.04
```
As long as `docker-desktop` shows up (as the default), Docker works normally, GPU included.
Installing Ubuntu-20.04 is essentially optional.
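To confirm that GPU passthrough actually works, a quick sanity check is to run a CUDA sample container. The command below is a sketch using the `nvcr.io/nvidia/k8s/cuda-sample:nbody` image that appears in the image list further down; it requires Docker Desktop with the WSL2 backend and a working NVIDIA driver on the Windows host:

```shell
# Run NVIDIA's n-body CUDA sample in benchmark mode; it should report
# the GPU name and a GFLOP/s figure if passthrough is working.
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```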
```
PS C:\Users\rchiu> docker images
REPOSITORY                       TAG                             IMAGE ID       CREATED         SIZE
zi2zi_tensorflow                 latest                          0b9538155fcc   3 hours ago     4.44GB
pytorch/pytorch                  2.1.1-cuda12.1-cudnn8-runtime   22150ae8096c   2 days ago      7.21GB
nvcr.io/nvidia/k8s/cuda-sample   nbody                           06d607b1fa6f   14 months ago   321MB
tensorflow/tensorflow            1.14.0-gpu-py3                  a7a1861d2150   4 years ago     3.51GB
PS C:\Users\rchiu> docker ps -al
CONTAINER ID   IMAGE    COMMAND       CREATED       STATUS          PORTS   NAMES
5f0a46bdfd50   0b9538   "/bin/bash"   3 hours ago   Up 31 minutes           zi2zi_tesorflow
```
Testing the GPU
```
PS C:\Users\rchiu> docker exec -ti zi2zi_tesorflow /bin/bash
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
root@5f0a46bdfd50:/#
root@5f0a46bdfd50:/# python
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
... (omitted)
2023-11-18 23:20:29.976006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 4090 major: 8 minor: 9 memoryClockRate(GHz): 2.595
pciBusID: 0000:01:00.0
... (omitted)
```
```
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2023-11-18 23:24:20.284424: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:24:20.284485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 4090 major: 8 minor: 9 memoryClockRate(GHz): 2.595
pciBusID: 0000:01:00.0
2023-11-18 23:24:20.284536: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2023-11-18 23:24:20.284563: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2023-11-18 23:24:20.284572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2023-11-18 23:24:20.284579: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2023-11-18 23:24:20.284601: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2023-11-18 23:24:20.284610: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2023-11-18 23:24:20.284617: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2023-11-18 23:24:20.285073: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:24:20.285841: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:24:20.285957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2023-11-18 23:24:20.286044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-11-18 23:24:20.286059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2023-11-18 23:24:20.286065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2023-11-18 23:24:20.286503: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:24:20.286532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1409] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-18 23:24:20.286950: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:24:20.286996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 20404 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2902699518032585644
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14833551317102300526
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16321590660873785508
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 21395249562
locality {
bus_id: 1
links {
}
}
incarnation: 10867835616343311674
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9"
]
>>>
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
2023-11-18 23:25:47.534386: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:25:47.534466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 4090 major: 8 minor: 9 memoryClockRate(GHz): 2.595
pciBusID: 0000:01:00.0
2023-11-18 23:25:47.534501: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2023-11-18 23:25:47.534532: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2023-11-18 23:25:47.534541: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2023-11-18 23:25:47.534549: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2023-11-18 23:25:47.534557: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2023-11-18 23:25:47.534564: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2023-11-18 23:25:47.534572: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2023-11-18 23:25:47.534885: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:25:47.535458: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:25:47.535504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2023-11-18 23:25:47.535532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-11-18 23:25:47.535554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2023-11-18 23:25:47.535562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2023-11-18 23:25:47.536209: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:25:47.536249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1409] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-11-18 23:25:47.536596: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-11-18 23:25:47.536659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 20404 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9)
True
>>>
```
The flood of log messages above comes down to one fact: Python 3.6 (~3.8) with TensorFlow 1.14 (or 1.15) cannot actually run on an RTX 4090. I tried every route — directly on the host, in conda, and in Docker — and installation always succeeds, but execution fails: TensorFlow cannot correctly load data onto the GPU for computation.
By contrast, here is a program (and version combination) that does run on the RTX 4090:
```
root@84388d86dc13:/workspace/zi2zi-pytorch# python infer.py --experiment_dir experiment --batch_size 200 --src_font charset/cjk/GenYoGothicTW-EL-01.ttf --from_txt --src_txt experiment/713050b7.txt --resume 5700
Namespace(experiment_dir='experiment', start_from=0, gpu_ids=[0], image_size=256, L1_penalty=100, Lconst_penalty=15, Lcategory_penalty=1.0, embedding_num=40, embedding_dim=128, batch_size=200, lr=0.001, resume=5700, obj_path='./experiment/data/val/val.obj', input_nc=3, from_txt=True, src_txt='experiment/713050b7.txt', canvas_size=256, char_size=256, run_all_label=False, label=0, src_font='charset/cjk/GenYoGothicTW-EL-01.ttf', type_file='type/tw.txt')
initialize network with normal
initialize network with normal
load model 5700
load model resume from 5700
src: 另依臨床試驗資料分析鄉鄒鄔鄕鄗鄘鄙鄞鄢鄧鄭鄯鄰鄱鄴鄹鄺酈酉酊酋酌配酒酗酣酥酩酪酬酴酵酷酸醃醇醉醋醒醜醞醣醫醬醮醱醴醹醺釀釁釅釆采釉釋里重野量釐金釗釘釙釜針釣釦釧釩釭釮釵鈁鈇鈉鈍鈐鈑鈔鈕鈜鈞鈣鈴鈷鈸鈹鈺鈽 鈾鈿鉀鉄鉅鉉鉋鉍鉑鉗鉚鉛鉤鉦鉫鉸鉻鉼銀銅銍銑銓銖銘銜銥銧銨銬銲銳銷銹銻銼鋁鋅鋆鋇鋉鋌鋐鋒鋕鋤鋪鋭鋮鋰鋸鋼錄錐錕錘錙錚錞錠錡錢錤錦錨錩錫錬錮錯錳錶鍀鍇鍈鍊鍋鍍鍔鍚鍛鍞鍠鍥鍬鍰鍵鍹鍾鎂鎊鎌鎔鎖鎗鎘鎚鎢鎤鎧鎪鎬鎭鎮 鎰鎳鎵鏃鏇鏈鏍鏑鏖鏗鏘鏜鏝鏞鏟鏡鏢鏤鏨鏮鏵鏽鐀鐃鐋鐘鐙鐛鐠鐡鐫鐮鐱鐲鐳鐵鐸鐺鐽鐿鑄鑌鑑鑒鑕鑛鑠鑣鑤鑩鑪鑫鑰鑲鑷鑼鑽鑾鑿長門閂閃閉開閎閏閑閒間閔閘閙閡関閣閤閥閨閩閭閱閻闆闈闊闋闌闐闔闕闖關闞闡闢阜阡阪阬阮阱防阻 阿陀陂附陋陌降限陘陛陜陝陞陡院陣除陪陬陰陲陳陴陵陶陷陸陽隄隅隆隊隋隍階隔隕隘隙際障隣隧隨險隱隴隸隹隻雀雁雄雅集雇雉雊雋雌雍雒雕雖雙雛雜雞離難雨雩雪雯雰雲零雷雹電需霄霆震霈霉霍霎霏霑霓霖霙霜霞霤霧霪霰露霸霹霽霾 靂靄靈青靖静靚靛靜非靠靡面靦靨革靴靶靼鞅鞋鞍鞏鞘鞠鞣鞦鞭鞽鞿韁韃韆韉韋韌韓韜韡韭音韵韶韹韻響頁頂頃項順須頊頌頎預頑頒頓頗領頜頡頣頤頥頫頭頰頴頷頸頹頻頼顆題額顎顏顓顔顗願顛類顥顧顫顯顰顱風颯颱颲颳颶颺颼飄飛食飢 飧飩飪飭飯飲飴飼飽飾餃餅餉養餌餐餒餓餘餚餛餞餡館餮餵餽餾餿饅饉饑饒饕饗饜饞首香馡馥馨馬馭馮馱馳馴馷駁駐駑駒駕駙駛駝駟駢駭駱駿騁騎騏騖騙騫騰騵騷騾驀驃驅驊驍驕驗驚驛驟驢驥驪骨骯骰骷骸骼髁髏髑髒髓體髖高髡髦髭髮髯 髻鬃鬆鬍鬚鬢鬣鬥鬧鬨鬮鬱鬲鬼魁魂魄魅魍魎魏魑魔魘魚魯魷鮑鮪鮫鮭鮮鮯鯀鯈鯉鯊鯓鯖鯛鯤鯧鯨鯽鰍鰓鰥鰭鰱鰲鰹鰻鰾鱉鱔鱖鱗鱟鱷鱸鲲鳥鳦鳩鳭鳯鳳鳴鳶鴃鴆鴉鴌鴒鴕鴛鴣鴦鴨鴻鴿鵑鵝鵠鵡鵪鵬鵲鶉鶯鶴鶸鷂鷄鷓鷗鷥鷹鷺鷿鸚鸛鸝鸞 鹵鹹鹼鹽鹿麂麃麋麒麓麗麝麟麥麩麴麵麻麼麾黃黌黍黎黏黑黔默黛黜黝點黠黨黯黴黷鼇鼎鼓鼕鼙鼠鼬鼯鼴鼶鼷鼸鼻鼾齊齋齒齜齟齡齣齥齦齧齪齬齮齲齷龍龐龔龜龟弄凉﨑,:;?{}阿斯持捷利康C0VD*9疫苗接種須知51G1阿斯特接利康COVID-19疫苗(ChAdOx1-s)阿斯特捷利康(AstraZeneca)COVID-19疫苗是含有SARS-CoV2病毒棘蛋白(Sprotein)之非複製型腺病毒載體疫苗,用於預防COVD19。本疫苗已通過WHO、歐盟等先進國家及我國緊急授權使用,適用18歲以上,採2劑肌肉 注射,並於臨床試驗中位數80天的追蹤期間證實可預防61%有症狀感染之風險'。另依臨床試驗資料分析,當接種間隔12週以上且完成2劑接種,保護力可達81%'。基此,我國衛生福利部傳染病防治諮詢會預防接種組{A\P)建議兩劑間隔至少8週,而間隔1C-12週,疫苗接種效益更佳。】疫苗接種禁忌與接種前注意事項今接種禁忌:對於疫苗成分有嚴重過敏反應史、先前接種本項疫苗劑次曾發生嚴重過敏反應或血栓合併.:fgl?2!f9eze血小板低下症候群者,以及過去曾發生微血管滲漏症候群(Capillaryleaksyndrome,Cls).之病人,不予接種。令注意事項:1.阿斯特捷利豪(AstraZeneca)COV[D-19疫苗與注射後非常罕見的血栓併血小板低下症候群可能有關聯,接種前請與醫師討論評估相關風險後再接種2.過去曾發生血栓合併血小板低下症候群,或肝素引起之血小板低下症者,應避免接種。3.本疫苗不得與其他廠牌交替使用。若不慎接種了兩劑不同廠牌COVD-19疫苗時,不建議再接種任何一種產品。4目前尚無資料顯示與其他疫苗同時接種對免疫原 性與安全性的影譽~COVD19疫苗與其他疫苗的接種間隔,建議間隔至少7天。如小於上述間隔,則各該疫苗亦無需再補種。5.發燒或正患有急性中重度疾病者,宜待病情穩定後再接種。6免疫功能低下者,包括接受免疫抑制劑治療的人,對疫苗的免疫反應可能減弱。(尚無免疫低下者或正在接受免疫抑制治療者的數據)7目前缺乏孕婦接種COVD-19疫苗之臨床試驗及安全性資料,而臨床觀察性研究顯示孕婦感染SARSCoV2病毒可能較一般人容易併發重症。孕婦若為COVD19之高職業 暴露風險者或具慢性疾病而易導致重症者,可與醫師討論接種疫苗之效益與風險後,評估是否接種。8.若哺乳中的婦女為建議接種之風險對象(如醫事人員),應完成接種。目前對哺乳中的婦女接種COVID-19疫苗的安全性、疫苗對母乳或受 哺嬰兒之影響尚未完全得到評估,但一般認為並不會造成相關風險。接種COV[D-19疫苗後,仍可持繽哺乳。
cold start time: 17.06, hot start time 16.53
root@84388d86dc13:/workspace/zi2zi-pytorch# python train.py --experiment_dir experiment --batch_size=168 --lr=0.01 --epoch=500 --sample_steps=330 --schedule=5 --L1_penalty=1 --Lconst_penalty=1
args: Namespace(experiment_dir='experiment', gpu_ids=['cuda:0'], image_size=256, L1_penalty=1, Lconst_penalty=1, Lcategory_penalty=0.1, embedding_num=40, embedding_dim=128, epoch=500, batch_size=168, lr=0.01, schedule=5, freeze_encoder=False, fine_tune=None, inst_norm=False, sample_steps=330, checkpoint_steps=100, flip_labels=False, random_seed=555, resume=None, input_nc=3)
initialize network with normal
initialize network with normal
unpickled total 87 examples
unpickled total 898 examples
Epoch: [ 0], [ 0/ 6] time: 2.70, d_loss: 1.66920, g_loss: 26.40519, category_loss: 0.16161, cheat_loss: 0.36592, const_loss: 0.19804, l1_loss: 23.24381
current lr: 0.01 0.01
Checkpoint: save checkpoint step 0
Sample: sample step 0
unpickled total 898 examples
Epoch: [ 1], [ 0/ 6] time: 11.00, d_loss: 51.28028, g_loss: 14.05730, category_loss: 0.03875, cheat_loss: 0.00461, const_loss: 0.46742, l1_loss: 7.23550
current lr: 0.01 0.01
unpickled total 898 examples
Epoch: [ 2], [ 0/ 6] time: 18.03, d_loss: 2.73312, g_loss: 11.38073, category_loss: 0.00008, cheat_loss: 0.95536, const_loss: 0.07953, l1_loss: 10.34584
current lr: 0.01 0.01
unpickled total 898 examples
Epoch: [ 3], [ 0/ 6] time: 25.21, d_loss: 1.97000, g_loss: 16.05286, category_loss: 0.00000, cheat_loss: 6.16309, const_loss: 0.13572, l1_loss: 9.75405
current lr: 0.01 0.01
unpickled total 898 examples
Epoch: [ 4], [ 0/ 6] time: 32.27, d_loss: 1.83300, g_loss: 9.91126, category_loss: 0.00143, cheat_loss: 1.26681, const_loss: 0.04670, l1_loss: 8.56654
current lr: 0.01 0.01
```
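Before launching a long training run like the one above, a minimal smoke test can confirm that the container's PyTorch build actually sees the GPU. This is a sketch assuming the `pytorch/pytorch:2.1.1-cuda12.1-cudnn8-runtime` image from the list earlier; the guard around the import is only so the script also runs (and says so) on machines without PyTorch:

```python
# Quick CUDA smoke test for a PyTorch container.
try:
    import torch
except ImportError:
    torch = None  # degrade gracefully where PyTorch is absent

if torch is None:
    print("CUDA available: torch not installed")
elif torch.cuda.is_available():
    print(f"CUDA available: True ({torch.cuda.get_device_name(0)})")
    x = torch.rand(1024, 1024, device="cuda")
    y = x @ x  # force an actual kernel launch, not just device discovery
    print(f"matmul ok, result sum = {y.sum().item():.2f}")
else:
    print("CUDA available: False")
```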
# 2023/11/18: all of the attempts below failed; the actual behavior (and architecture) turned out different from what I imagined
Notes on what I ran into while following [Win11+WSL2+Ubuntu+Docker-Desktop 支持GPU的深度学习环境搭建](https://zhuanlan.zhihu.com/p/555151725):
1. Install the Windows 11 GPU driver; a recent version is fine.
[CUDA on WSL User Guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)

2. Check the driver version with `nvidia-smi`.
3. Install CUDA on Windows 11 (details omitted, but its version must be newer than the one you will install inside Ubuntu), because of step 5.

4. (The earlier steps are skipped here; see the original article.)
5. Install Ubuntu. This step deserves attention (it decides whether you have to type sudo all the time).
```
wsl --list --online
wsl --install Ubuntu-20.04
wsl --set-version Ubuntu-20.04 2
wsl --set-default-version 2
```
Install first, then run the commands below to switch to root.
When the Ubuntu installation finishes it asks you to set an account password; anything will do, since the default user is changed to root afterwards.
Run PowerShell as Administrator and change Ubuntu's default user to root:
```
ubuntu2004.exe config --default-user root
```
Still in PowerShell, set WSL2's default distribution to Ubuntu-20.04:
```
wslconfig /setdefault Ubuntu-20.04
```
6. Install the CUDA toolkit, then add its paths to your shell profile:
```
sudo vim ~/.bashrc
```
NOTE: you also need to install cuDNN!
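The `~/.bashrc` edit above usually amounts to the two export lines below (a sketch; the exact path depends on the CUDA version you installed, e.g. `/usr/local/cuda-12.1`):

```shell
# Hypothetical ~/.bashrc additions; adjust /usr/local/cuda to your install path.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```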



7. On Windows 11, install Docker Desktop; just download the latest version from the official Docker website.
8. Pull the PyTorch image:

```
sudo docker pull pytorch/pytorch:2.1.1-cuda12.1-cudnn8-runtime
```