# Containerized GPU training on Windows Server 2019

:::info
**GPU acceleration in Windows containers**: the container host must run Windows Server 2019, or Windows 10 version 1809 or newer.
:::

## Windows Server 2019 editions

![](https://i.imgur.com/ItMRnzD.png)

Both Windows Server Standard and Essentials offer a 180-day evaluation:
https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2019-essentials

* Standard: download the official ISO and install it; the trial starts immediately.
* Essentials: Microsoft provides a trial product key:
> NJ3X8-YTJRF-3R9J9-D78MF-4YBP4

To run Docker, the Containers feature must be enabled on Windows.

:::danger
***The Containers feature cannot be enabled on the Essentials edition.***
:::

![](https://i.imgur.com/ysn1rTW.png)

## Install Docker

There are two ways to install Docker on Windows:

1. Docker Desktop for Windows ***- both Linux and Windows containers on Windows***
   The Docker Desktop installation includes Docker Engine, Docker CLI client, Docker Compose, Notary, Kubernetes, and Credential Helper.
2. Docker on Windows ***- Windows containers only*** with a common API and command-line interface (CLI)

> #### Two kinds of containers on Windows
> 1. Linux Container
> 2. Windows Container
> :::warning
> Windows containers are only supported on specific OS versions, and the Containers feature must be enabled.
> Linux containers only require Hyper-V to be enabled.
> Docker Desktop can switch between the two container types; the default is Linux containers.
> Docker on Windows supports Windows containers only and cannot switch.
> This is why Windows Server 2019 Essentials cannot install or run Docker on Windows,
> yet it can still install Docker Desktop for Windows.
> :::

---

### Docker Desktop for Windows

![](https://i.imgur.com/WOQEtJg.png)

Just install it. It ships with a settings panel, status monitoring, a taskbar UI, and a one-click option to enable Kubernetes.

![](https://i.imgur.com/0IIVqJ0.png)
![](https://i.imgur.com/oHB5QOt.png)

You can switch between Linux containers and Windows containers.

![](https://i.imgur.com/QQ5aPAi.png)

(The Linux-container and Windows-container modes each manage only their own containers.)

---

### Docker on Windows

https://github.com/OneGet/MicrosoftDockerProvider
https://docs.microsoft.com/zh-tw/virtualization/windowscontainers/deploy-containers/deploy-containers-on-server

![](https://i.imgur.com/qbGVoxA.png)

**Install Docker with the OneGet provider PowerShell module**

#### Install the OneGet PowerShell module

```shell=
Install-Module -Name DockerMsftProvider -Repository `
PSGallery -Force
```

#### Install the OneGet docker provider

```shell=
Import-Module -Name DockerMsftProvider -Force
Import-PackageProvider -Name DockerMsftProvider -Force
```

#### Install Docker

Upgrade to the latest version of Docker:

```shell=
Install-Package -Name docker -ProviderName DockerMsftProvider -Verbose -Update
```

:::info
GPU acceleration in Windows containers: the container host must run Docker Engine 19.03 or newer.
:::

---

## Windows base image for containers

https://hub.docker.com/_/microsoft-windows

![](https://i.imgur.com/jf0fk7w.png)

:::info
GPU acceleration in Windows containers: the container base image must be mcr.microsoft.com/windows:1809 or newer.
:::

The official ISO for the Windows Server 2019 Standard evaluation ships as ***OS Build 17763.737***. Since we want to use the 1809 Windows image, at least 10.0.17763.1397 is required; in this test, upgrading the host to ***OS Build 17763.1369*** was enough to run it normally.

* *(All tests below use windows:1809.)*

```shell=
docker pull mcr.microsoft.com/windows:1809
```

> There are also three other base images:
> [windows/iotcore](https://hub.docker.com/_/microsoft-windows-iotcore): Windows IoT Core base OS container image
> [windows/nanoserver](https://hub.docker.com/_/microsoft-windows-nanoserver): Nano Server base OS container image
> [windows/servercore](https://hub.docker.com/_/microsoft-windows-servercore): Windows Server Core base OS container image

Dockerfiles for Windows containers support only these four base images; Linux-based base images cannot be used.

:::danger
GPU acceleration in Windows containers does not support the **Windows Server Core** and **Nano Server** container images.
:::

---

## GPU Training Samples

### DirectX Container Sample

> https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/master/windows-container-samples/directx

:::info
GPU acceleration in Windows containers: DirectX (and every framework built on top of it) is the only API that can be GPU-accelerated. Third-party frameworks are not supported.
:::

This sample container runs the WinMLRunner executable in its performance-benchmarking mode: it evaluates an ML model 100 times on fake data, first on the CPU and then on the GPU, and finally prints a report with performance metrics.
https://github.com/Microsoft/Windows-Machine-Learning/tree/master/Tools/WinMLRunner

:::success
Writing a Dockerfile on Windows:
#### Create the Dockerfile
The Dockerfile must have no file extension. On Windows, create the file with an editor of your choice (I used Notepad++ in my tests) and save it simply as ***Dockerfile***.
:::

```
FROM \
mcr.microsoft.com/windows:1809

WORKDIR C:/App

# Download and extract the ONNX model to be used for evaluation.
RUN curl.exe -o tiny_yolov2.tar.gz https://onnxzoo.blob.core.windows.net/models/opset_7/tiny_yolov2/tiny_yolov2.tar.gz && \
    tar.exe -xf tiny_yolov2.tar.gz && \
    del tiny_yolov2.tar.gz

# Download and extract the CLI tool for evaluating the .onnx model with WinML.
RUN curl.exe -L -o WinMLRunner_x64_Release.zip https://github.com/microsoft/Windows-Machine-Learning/releases/download/1.2.1.1/WinMLRunner.v1.2.1.1.zip && \
    tar.exe -xf C:/App/WinMLRunner_x64_Release.zip && \
    del WinMLRunner_x64_Release.zip

# Run the model evaluation when the container starts.
ENTRYPOINT ["C:/App/WinMLRunner v1.2.1.1/x64/WinMLRunner.exe", "-model", "C:/App/tiny_yolov2/model.onnx", "-terse", "-iterations", "100", "-perf"]
```

Back in cmd, `cd` into the directory that contains the Dockerfile and build it:

```
docker build . -t winml-runner
```

If the build finishes without errors, run it:

```
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner
```

sample output:
:::spoiler
```
.\WinMLRunner.exe -model SqueezeNet.onnx
WinML Runner
GPU: NVIDIA Tesla P4
Loading model (path = SqueezeNet.onnx)...
=================================================================
Name: squeezenet_old
Author: onnx-caffe2
Version: 9223372036854775807
Domain:
Description:
Path: SqueezeNet.onnx
Support FP16: false

Input Feature Info:
Name: data_0
Feature Kind: Float

Output Feature Info:
Name: softmaxout_1
Feature Kind: Float
=================================================================
Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
Outputting results..
Feature Name: softmaxout_1
resultVector[818] has the maximal value of 1
Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor)...[SUCCESS]
Outputting results..
Feature Name: softmaxout_1
resultVector[818] has the maximal value of 1
```
:::

---

### Tensorflow Directml Sample

> https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows

:::warning
TensorFlow supports only 64-bit Python 3.5 - 3.7, and it needs the msvcp140.dll component. The fix is to install the Microsoft Visual C++ 2015 Redistributable Update 3.
The Python script used in this sample sits in the same directory as the Dockerfile.
:::

:::success
Writing a Dockerfile on Windows:
#### PowerShell cmdlets
PowerShell cmdlets can be executed from a RUN instruction in the Dockerfile:
```shell=
RUN powershell.exe -Command
```
#### Escape character
The default Dockerfile escape character is the backslash \. Because the backslash is also the file-path separator on Windows, using it to span multiple lines can cause problems. On Windows, lines can therefore be continued in two ways: \ and `
:::

Dockerfile:

```
FROM mcr.microsoft.com/windows:1809

# assign work directory
WORKDIR /python

# move all files to the work directory, including test.py
COPY . \
/python

# Silent-install Microsoft Visual C++ 2015 Redistributable Update 3
RUN powershell.exe -Command \
    wget https://download.microsoft.com/download/9/3/F/93FCF1E7-E6A4-478B-96E7-D4B285925B00/vc_redist.x64.exe -OutFile vc_redist.x64.exe ; \
    Start-Process vc_redist.x64.exe -ArgumentList '/q /norestart' -Wait ; \
    Remove-Item vc_redist.x64.exe -Force

# Silent-install 64-bit Python 3.6.1
RUN powershell.exe -Command \
    $ErrorActionPreference = 'Stop'; \
    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; \
    wget https://www.python.org/ftp/python/3.6.1/python-3.6.1-amd64.exe -OutFile python-3.6.1-amd64.exe ; \
    Start-Process python-3.6.1-amd64.exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1' -Wait ; \
    Remove-Item python-3.6.1-amd64.exe -Force

RUN pip install tensorflow-directml

# -u to ensure python print output is not buffered
CMD ["py", "-u", "test.py"]
```

> **Install Python via command line/PowerShell without UI (quiet/silent install)**
> Pick a version from https://www.python.org/ftp/python/ and run:
>```
>[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
>wget https://www.python.org/ftp/python/[version].exe -OutFile c:\[version].exe
>Start-Process c:\[version].exe -ArgumentList '/quiet InstallAllUsers=1 PrependPath=1'
>```

The content of test.py only checks that TensorFlow was installed successfully:

```python=
import tensorflow.compat.v1 as tf

tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
print(tf.add([1.0, 2.0], [3.0, 4.0]))
```

```
docker build -t tensorflow-directml .
```

```
docker run -it tensorflow-directml
```

result:

```python
2020-07-23 20:06:09.756930: I tensorflow/core/common_runtime/dml/dml_device_factory.cc:45] DirectML device enumeration: found 1 compatible adapters.
2020-07-23 20:06:09.917532: I tensorflow/core/common_runtime/dml/dml_device_factory.cc:32] DirectML: creating device on adapter 0 (Microsoft Basic Render Driver)
2020-07-23 20:06:09.433379: I tensorflow/stream_executor/platform/default/dso_loader.cc:60] Successfully opened dynamic library DirectMLba106a7c621ea741d2159d8708ee581c11918380.dll
2020-07-23 20:06:09.558039: I tensorflow/core/common_runtime/eager/execute.cc:571] Executing op Add in device /job:localhost/replica:0/task:0/device:DML:0
tf.Tensor([4. 6.], shape=(2,), dtype=float32)
```

The image is already packaged and pushed to Docker Hub:
https://hub.docker.com/r/msxlol/tensorflow-directml-sample

*How to Use*

```
docker run msxlol/tensorflow-directml
```

:::danger
While running, the container does detect a GPU, but it always picks up the **Microsoft Basic Render Driver** instead of the **NVIDIA Tesla P4** we actually want to use.
:::

The eventual solution was [DDA (Discrete Device Assignment)](https://docs.microsoft.com/zh-tw/windows-server/virtualization/hyper-v/deploy/deploying-graphics-devices-using-dda): pass the whole PCIe device through to a VM, then install CentOS 7 and the Tesla P4 driver inside the VM. The correct Tesla P4 is then detected instead of the Microsoft Basic Render Driver.

![](https://i.imgur.com/IWaxfAg.png)

https://docs.microsoft.com/zh-tw/windows-server/virtualization/hyper-v/deploy/deploying-graphics-devices-using-dda
https://docs.microsoft.com/zh-tw/windows-server/virtualization/hyper-v/plan/plan-for-gpu-acceleration-in-windows-server

> **Discrete Device Assignment (DDA)**
> Discrete Device Assignment (DDA), also known as GPU passthrough, dedicates one or more physical GPUs to a virtual machine. In a DDA deployment, virtualized workloads run on the native driver and typically have full access to the GPU's features. DDA offers the highest level of application compatibility and potential performance.
>
> Hardware requirements:
> - PCI Express Native Power Management
> - SR-IOV enabled

![](https://i.imgur.com/Vc8SyMW.png)

Confirm the GPU can be mounted:

![](https://i.imgur.com/u4hmQyd.png)

The device location path can also be obtained from Device Manager:

![](https://i.imgur.com/z9BZLfV.png)

Create a VM named TestGPU with Hyper-V and apply the following settings. Once done, the GPU is owned exclusively by TestGPU.

```sh=
Set-VM -Name TestGPU -AutomaticStopAction TurnOff
Set-VM -GuestControlledCacheTypes $true -VMName TestGPU
Set-VM -LowMemoryMappedIoSpace 3Gb -VMName TestGPU
Set-VM -HighMemoryMappedIoSpace 33280Mb -VMName TestGPU
Dismount-VMHostAssignableDevice -LocationPath `
"PCIROOT(0)#PCI(0200)#PCI(0000)"
Add-VMAssignableDevice -LocationPath "PCIROOT(0)#PCI(0200)#PCI(0000)" -VMName TestGPU
```

Next, just as when using a GPU on Linux, the drivers have to be set up:

```sh=
lshw -C display  # check GPU

# install nvidia cuda
yum -y install gcc kernel-devel kernel-headers pkgconfig
yum -y upgrade kernel
wget http://us.download.nvidia.com/tesla/410.129/NVIDIA-Linux-x86_64-410.129-diagnostic.run
modprobe -b -r nouveau  # disable nouveau for gpu
./NVIDIA-Linux-x86_64-410.129-diagnostic.run -no-x-check -no-opengl-files

# install nvidia cudnn
wget http://developer.download.nvidia.com/compute/redist/cudnn/v7.6.5/cudnn-10.0-linux-x64-v7.6.5.32.tgz
# echo "28355e395f0b2b93ac2c83b61360b35ba6cd0377e44e78be197b6b61b4b492ba cudnn-10.0-linux-x64-v7.6.5.32.tgz" | sha256sum -c -
tar -zxf cudnn-10.0-linux-x64-v7.6.5.32.tgz
tar --no-same-owner -xzf cudnn-10.0-linux-x64-v7.6.5.32.tgz -C /usr/local --wildcards 'cuda/lib64/libcudnn.so.*'
ldconfig

yum -y install gcc-c++ python3-devel
pip3 install tensorflow-gpu
python3 -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
```