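---
tags: tensorflow, jupyter, docker
author: N0-Ball
topic: tensorflow + jupyter, docker
GA: UA-208228992-1
---

# Introduction

I'm not sure whether I'll keep working with TensorFlow, but the lab needed it, so I set up JupyterHub + TensorFlow in Docker and wrote down my notes along the way.

## Prerequisites

- Some familiarity with docker, Dockerfile and docker compose
- A GitHub account (Google login is a hassle, TBD)

## Notes

- OS: Ubuntu 20.04
- GPU: NVIDIA GeForce RTX 3080
- Last updated: 2021/09/23

# Installing Docker

```bash
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release \
    gcc \
    make
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
```

This is basically copied straight from the [official docs](https://docs.docker.com/engine/install/ubuntu/). `gcc` and `make` are extras: the NVIDIA installer and the CUDA samples later on need them.

## Verify

```bash
sudo docker run hello-world
```

![](https://i.imgur.com/hYjaYwK.png)

If you see output like the screenshot above, it worked.

## Alternative

Or use Docker's convenience script:

```bash=
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```

## Uninstalling if something goes wrong

```bash=
sudo apt-get purge docker-ce docker-ce-cli containerd.io
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
```

## Tired of typing sudo every time?

Create a `docker` group (it may already exist) and add yourself to it. Log out and back in (or run `newgrp docker`) for the change to take effect.

```bash=
sudo groupadd docker
sudo usermod -aG docker $USER
```

# The biggest pitfall: the GPU driver

:::warning
My advice: if you get any single step wrong, reinstalling the whole machine is usually faster. Still, I've tried to list at least the problems I ran into, along with how I fixed them.
:::

## Find your GPU

```bash
lspci -v -s $(lspci | grep ' VGA ' | cut -d" " -f 1)
```

If nothing shows up... it may be a hardware problem.

## Installing the driver

Download it from the [official site](https://developer.nvidia.com/cuda-downloads). I recommend the prebuilt runfile, which installs CUDA together with the driver.

![](https://i.imgur.com/WpS9LCr.png)

- Just accept the license and select everything; that should be fine.

:::danger
If the installer says something like it strongly recommends not continuing, something is already wrong. Go through the [pitfalls](#pitfalls) section below first.
:::

If the installation fails, check `/var/log/cuda-installer.log`. If the log doesn't tell you much but contains lines like these, the driver component is what failed:

```
[INFO]: Finished with code: 3840 (xxx)
[ERROR]: Install of driver component failed.
```

In that case, look at `/var/log/nvidia-installer.log`. If it mentions `nouveau`, congratulations, this post can [fix that](#nouveau).

### Test whether CUDA installed correctly

The path below assumes CUDA 11.4; adjust it to the version you installed.

```bash
cd /usr/local/cuda-11.4/samples/4_Finance/BlackScholes
make BlackScholes
./BlackScholes
```

![](https://i.imgur.com/YV399o9.png)

If you see `Test passed`, it most likely worked.

## Installing the driver for Docker

First, check your distribution:

```bash
. /etc/os-release; echo $ID$VERSION_ID
```

Then check the [official guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) carefully to see whether your distribution is listed. If it isn't, either downgrade or gamble on backward compatibility.

:::warning
Ubuntu 21.04 is not listed. If you upgraded by accident, you can force the distribution to ubuntu20.04 (see the Ubuntu 21.04 section below).
:::

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
   && sudo apt-get update -y \
   && sudo apt-get install nvidia-container-toolkit \
   && sudo systemctl restart docker
```

### Ubuntu 21.04

```bash
distribution=ubuntu20.04 \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
   && sudo apt-get update -y \
   && sudo apt-get install nvidia-container-toolkit \
   && sudo systemctl restart docker
```

### Verify the Docker GPU setup

```bash
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

If there's no error, you're good.

![](https://i.imgur.com/8cdapNq.png)

## Pitfalls

If you didn't start from a clean machine with this post, leftovers from earlier installs will almost certainly conflict with these steps. My recommendation is to reinstall and follow this post from the start, but here are brief notes on the pitfalls I hit.

:::info
The general loop: hit an error -> uninstall -> figure out what extra step is needed -> redo this post.
:::

### Uninstalling the driver

Basically, find every NVIDIA package and remove it:

```bash
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get autoremove -y
sudo apt-get autoclean
```

If that still doesn't work, remove everything with nvidia in its name:

```bash
sudo apt-get remove --purge '.*nvidia.*'
```

Reportedly, on Ubuntu Desktop you then need to reinstall the desktop:

```bash
sudo apt-get install ubuntu-desktop
```

Note that a driver installed from the runfile is not an apt package; the runfile ships its own uninstallers (`sudo /usr/bin/nvidia-uninstall` for the driver, `cuda-uninstaller` under `/usr/local/cuda-<version>/bin` for CUDA).

### nouveau

nouveau is the open-source NVIDIA driver that Ubuntu loads by default, and it conflicts with installing NVIDIA's own driver, so it has to be disabled.

<!--
```bash
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
sudo reboot
```
-->

The usual approach is to blacklist the module (a file in `/etc/modprobe.d/` containing `blacklist nouveau` and `options nouveau modeset=0`), run `sudo update-initramfs -u`, and reboot; [this answer](https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver) walks through it.

# GitHub OAuth

Create an OAuth app following the [official guide](https://docs.github.com/en/developers/apps/building-oauth-apps/creating-an-oauth-app).

![](https://i.imgur.com/yvqxLjS.png)

- Homepage URL: `http://<ip>`
- Authorization callback URL: `http://<ip>/hub/oauth_callback`

Replace `<ip>` with your server's IP or domain name, e.g. http://172.217.24.3 or http://google.com.tw.

You then get a Client ID and a client secret (if there is no secret yet, click "Generate a new client secret").

![](https://i.imgur.com/jS56By8.png)

Keep both of these; you'll need them later.

# JupyterHub + TensorFlow

Preparation:

```bash
mkdir jupyter-docker && \
cd jupyter-docker && \
git clone https://github.com/jupyter/docker-stacks.git && \
touch docker-compose.yml && \
mkdir jupyterhub && \
cd jupyterhub && \
touch Dockerfile && touch jupyterhub_config.py
```

Afterwards your directory should look like this:

```bash
.
├── docker-compose.yml
├── docker-stacks
│   ├── tensorflow-notebook
│   .
│   .
│   .
│   └── xxx
└── jupyterhub
    ├── Dockerfile
    └── jupyterhub_config.py
```

Nothing in the files below builds from `docker-stacks`: DockerSpawner uses the `jupyter/tensorflow-notebook` image from Docker Hub, so the clone mainly serves as a reference.

## Dockerfile

```dockerfile=
FROM jupyterhub/jupyterhub
RUN pip install dockerspawner oauthenticator
COPY jupyterhub_config.py .
```

`dockerspawner` starts each user's notebook server as its own container. I use OAuth to log in with GitHub accounts, so `oauthenticator` is installed as well.

## jupyterhub_config.py

```python=
import os

# Spawn each user's notebook server as its own Docker container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = os.environ["DOCKER_JUPYTER_IMAGE"]
c.DockerSpawner.network_name = os.environ["DOCKER_NETWORK_NAME"]
# The hub's address on the compose network (the service name), used by user containers
c.JupyterHub.hub_ip = os.environ["HUB_IP"]

c.Authenticator.admin_users = {'<admin user>'}

# Log in with GitHub accounts
from oauthenticator.github import GitHubOAuthenticator
c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = \
    'http://<ip>/hub/oauth_callback'
c.GitHubOAuthenticator.client_id = os.environ["GITHUB_CLIENT_ID"]
c.GitHubOAuthenticator.client_secret = os.environ["GITHUB_CLIENT_SECRET"]

notebook_dir = os.environ.get('DOCKER_NOTEBOOK_DIR') or '/home/jovyan/work'
c.DockerSpawner.notebook_dir = notebook_dir
# Mount the real user's Docker volume on the host to the notebook user's
# notebook directory in the container
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': notebook_dir,
    'jupyterhub-shared': '/home/jovyan/work/shared',
    'jupyterhub-data': '/home/jovyan/work/data'
}
# Delete containers when servers stop; only the volumes above persist
c.DockerSpawner.remove_containers = True
c.Spawner.default_url = '/'
```

What you need to change here:
- `<admin user>`: the admin's GitHub username
- `<ip>`: your server's IP or domain name

As written, only admins are listed; depending on your oauthenticator version, any GitHub user who can reach the server may be able to log in, so consider restricting this with `c.Authenticator.allowed_users = {...}`.

## docker-compose

```yaml=
version: '3'

services:
  jupyterhub:
    build: ./jupyterhub
    image: jupyterhub
    ports:
      - "80:8000"
    container_name: jupyterhub-container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - jupyterhub_data:/srv/jupyterhub
    environment:
      DOCKER_JUPYTER_CONTAINER: jupyter-notebook
      DOCKER_JUPYTER_IMAGE: jupyter/tensorflow-notebook
      DOCKER_NETWORK_NAME: jupyter-docker_default
      GITHUB_CLIENT_ID: "${GITHUB_CLIENT_ID}"
      GITHUB_CLIENT_SECRET: "${GITHUB_CLIENT_SECRET}"
      HUB_IP: jupyterhub

volumes:
  jupyterhub_data:
```

- `jupyter-docker_default` is the network docker-compose creates for a project in a directory named `jupyter-docker`; change it if your directory has a different name.
- Because `jupyterhub_data` is mounted over `/srv/jupyterhub`, Docker only copies the image's `jupyterhub_config.py` into the volume when the volume is first created. If you edit the config and rebuild, you may need to remove that volume (e.g. `docker-compose down -v`) before the change takes effect.
- You can either paste `GITHUB_CLIENT_ID` and `GITHUB_CLIENT_SECRET` (the values from the GitHub OAuth step) directly into the file, or put them in a `.env` file next to `docker-compose.yml` and docker-compose will fill them in automatically:

```bash=
GITHUB_CLIENT_ID=xxxxxxxxxxxxxxxxx
GITHUB_CLIENT_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

## Result

```bash
docker-compose up -d
```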
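If GitHub login fails after `docker-compose up -d`, the `.env` file is a likely suspect: docker-compose substitutes an unset `${GITHUB_CLIENT_ID}` with an empty string (printing a warning), so the hub still starts but OAuth cannot succeed. Below is a minimal Python sketch for checking the file. The helpers `parse_env` and `missing_keys` are hypothetical names; the parser only handles plain `KEY=VALUE` lines and `#` comments, not docker-compose's full `.env` rules, and it treats all-`x` values such as the placeholders above as missing.

```python
from pathlib import Path

# Keys that docker-compose.yml passes through to jupyterhub_config.py
REQUIRED_KEYS = ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET")


def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments.

    Simplified on purpose: no multi-line values, escapes or interpolation.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as KEY="value"
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still all-'x' placeholders."""
    return [key for key in required if set(values.get(key, "")) <= {"x"}]


if __name__ == "__main__":
    env_file = Path(".env")  # assumed to sit next to docker-compose.yml
    if not env_file.is_file():
        print(".env not found in the current directory")
    else:
        missing = missing_keys(parse_env(env_file.read_text()))
        if missing:
            print("missing or placeholder:", ", ".join(missing))
        else:
            print(".env looks complete")
```

Run it from the `jupyter-docker` directory; if it reports missing keys, fill them in and run `docker-compose up -d` again so the container is recreated with the new values.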
Once the containers are up, open `http://<ip>` in a browser and check whether Jupyter is running. docker-compose maps host port 80 to the hub's port 8000, so no port number is needed, and GitHub login only completes through the `<ip>` you registered in the callback URL.

![](https://i.imgur.com/5jWIAMB.png)

## Verifying Jupyter + TensorFlow

First open a terminal and install the datasets package:

```bash=
pip install tensorflow_datasets
```

Packages installed this way live inside your notebook container, which is deleted when your server stops (`remove_containers = True`), so you may need to reinstall them next time.

Then open a notebook and follow the [official example](https://www.tensorflow.org/datasets/keras_example):

```python=
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
%matplotlib inline

# Load MNIST as (image, label) pairs plus dataset metadata
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

# Training pipeline: normalize, cache, shuffle, batch, prefetch
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Evaluation pipeline: no shuffling needed
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
```

![](https://i.imgur.com/j8Z2eaw.png)

If the loss keeps going down, it's working.
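The `jupyterhub_config.py` above does not pass any GPU to the containers DockerSpawner starts (there is no `--gpus`-style device request), so the MNIST run may well have been on the CPU. As a quick check from a notebook, here is a minimal sketch that asks TensorFlow which GPUs it can see via `tf.config.list_physical_devices`; `gpu_summary` is a hypothetical helper name, and it falls back gracefully when TensorFlow is not installed.

```python
import importlib.util


def gpu_summary():
    """Describe the GPUs TensorFlow can see in this environment (hypothetical helper)."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow is not installed in this environment"
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return "no GPU visible: TensorFlow will run on the CPU"
    # Each entry is a PhysicalDevice with .name and .device_type fields
    return ", ".join(gpu.name for gpu in gpus)


print(gpu_summary())
```

An empty GPU list means the notebook container has no GPU. Exposing one would likely require both a device request in the DockerSpawner config and an image that ships the CUDA libraries, which this post does not cover.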