---
tags: tensorflow, jupyter, docker
author: N0-Ball
topic: tensorflow + jupyter, docker
GA: UA-208228992-1
---
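# Introduction
I'm not sure whether I'll keep working with TensorFlow,
but my lab needed it, so I set up JupyterHub + TensorFlow in Docker
and wrote up my installation notes along the way.
## Prerequisites
- A little Docker, Dockerfile, and Docker Compose
- A GitHub account (Google is too much hassle, TBD)
## Notes
- OS: Ubuntu 20.04
- GPU: Nvidia GeForce RTX 3080
- Last updated: 2021/09/23
# Installing Docker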
```bash
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release \
gcc \
make
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
```
In short, this is copy-pasted from the [official site](https://docs.docker.com/engine/install/ubuntu/).
## Verify
```
sudo docker run hello-world
```

If you see the `Hello from Docker!` greeting, the installation worked.
## Alternative
```bash=
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```
## Uninstalling if something goes wrong
```bash=
sudo apt-get purge docker-ce docker-ce-cli containerd.io
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
```
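## Tired of typing sudo every time?
Create a `docker` group (it may already exist)
and add yourself to it. Log out and back in (or run `newgrp docker`) for the change to take effect.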
```bash=
sudo groupadd docker
sudo usermod -aG docker $USER
```
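# The biggest pitfall - GPU driver
:::warning
My advice: if you get even one step wrong, reinstalling the whole machine is usually faster. Still, I've tried to list at least the problems I ran into, along with how I solved them.
:::
## Find your GPU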
```
lspci -v -s $(lspci | grep ' VGA ' | cut -d" " -f 1)
```
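If nothing shows up...
it may be a hardware problem.
## Install the driver
[Official site](https://developer.nvidia.com/cuda-downloads)
I recommend using the prebuilt runfile, which installs CUDA together with the driver.

- Basically, accept the license and select everything, and you should be fine.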
:::danger
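If the installer shows a message strongly recommending that you not continue, something is wrong. Go through the [pitfalls](#一堆坑) section below first.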
:::
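If the installation fails, check `/var/log/cuda-installer.log`. If nothing obvious stands out, or you see lines like these, the driver installation is what failed: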
```bash
[INFO]: Finished with code: 3840 (xxx)
[ERROR]: Install of driver component failed.
```
Then check `/var/log/nvidia-installer.log`. If you see `nouveau` in it, congratulations, this post can help you [solve the problem](#nouveau).
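As a rough illustration of that log check, the sketch below scans installer log text for lines that mention `nouveau`. The helper name `find_nouveau_lines` is hypothetical, the path is the one named in the text, and reading it may require root.

```python
from pathlib import Path


def find_nouveau_lines(log_text):
    """Return the log lines that mention nouveau (case-insensitive)."""
    return [line for line in log_text.splitlines() if "nouveau" in line.lower()]


if __name__ == "__main__":
    log_path = Path("/var/log/nvidia-installer.log")  # path taken from the text
    try:
        hits = find_nouveau_lines(log_path.read_text(errors="replace"))
        print(f"{len(hits)} line(s) mention nouveau")
    except OSError as exc:
        # Missing file or insufficient permissions both end up here.
        print(f"could not read {log_path}: {exc}")
```

Any hit suggests that the nouveau driver is the conflict described in the nouveau section.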
### Test whether CUDA installed successfully
```bash
cd /usr/local/cuda-11.4/samples/4_Finance/BlackScholes
make BlackScholes
./BlackScholes
```

If you see `Test passed`, it most likely worked.
## Install the driver for Docker
First, check which distribution you are running:
```bash
. /etc/os-release; echo $ID$VERSION_ID
```
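Go to the [official site](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) and check carefully whether your distribution is listed.
If it isn't, either downgrade or gamble that an older release's packages are compatible.
:::warning
Ubuntu 21.04 is not listed. If you upgraded by accident, you can force the distribution to 20.04.
:::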
```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update -y \
&& sudo apt-get install -y nvidia-container-toolkit \
&& sudo systemctl restart docker
```
### Ubuntu 21.04
```bash
distribution=ubuntu20.04 \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list \
&& sudo apt-get update -y \
&& sudo apt-get install -y nvidia-container-toolkit \
&& sudo systemctl restart docker
```
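The distribution check and the 20.04 fallback can also be sketched in Python. This is only an illustration: the function names are hypothetical, and the `supported` set below is a made-up subset, so check NVIDIA's install guide for the real list.

```python
def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value.strip().strip('"').strip("'")
    return values


def repo_distribution(text, supported, fallback="ubuntu20.04"):
    """Return ID+VERSION_ID, or the fallback when that combination is unsupported."""
    info = parse_os_release(text)
    distribution = info.get("ID", "") + info.get("VERSION_ID", "")
    return distribution if distribution in supported else fallback


if __name__ == "__main__":
    # Hypothetical subset of supported distributions, for illustration only.
    supported = {"ubuntu18.04", "ubuntu20.04"}
    try:
        with open("/etc/os-release") as f:
            print(repo_distribution(f.read(), supported))
    except OSError as exc:
        print(f"could not read /etc/os-release: {exc}")
```

The printed value is what would go into `distribution=` in the commands above.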
### Check whether the driver installed successfully
```bash
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
If it prints the `nvidia-smi` table without errors, you're good.

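## A pile of pitfalls <a id="一堆坑"></a>
If you didn't start from this post, there will almost certainly be leftovers from earlier attempts that conflict with my steps.
My advice is to reinstall the OS and follow this post from the top, which should work, but I've still jotted down a few notes on the pitfalls I hit.
:::info
Basically: hit an error -> uninstall -> see what extra steps are needed -> redo everything following this post
:::
### Uninstall the driver
Basically, find every nvidia package and remove it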
```bash
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get autoremove -y
sudo apt-get autoclean
```
If that still doesn't work, remove everything with nvidia in its name
```bash
sudo apt-get remove --purge '.*nvidia.*'
```
By the way, I've heard that on Ubuntu Desktop you need to reinstall the desktop afterwards
```bash
sudo apt-get install ubuntu-desktop
```
### nouveau
nouveau is the open-source NVIDIA driver that ships by default... I think.
Either way, it conflicts with the installation of NVIDIA's own driver,
so disable it.
<!--
```bash
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
sudo reboot
```
-->
[See this post](https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver)
# GitHub OAuth
[Official tutorial](https://docs.github.com/en/developers/apps/building-oauth-apps/creating-an-oauth-app)

Homepage URL
: `http://<ip>`
Authorization callback URL
: `http://<ip>/hub/oauth_callback`
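Here `<ip>` is your server's IP or domain name,
for example http://172.217.24.3 or http://google.com.tw
After that you will have a client ID and a client secret (if you don't see a secret, click "Generate a new client secret")

Both are important, so note them down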
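To make the two URL fields concrete, here is a minimal sketch that builds them from a host. The helper name `oauth_urls` is hypothetical; the `/hub/oauth_callback` path is the one given above.

```python
def oauth_urls(host, scheme="http"):
    """Build the Homepage URL and Authorization callback URL for a given host."""
    base = f"{scheme}://{host.strip().rstrip('/')}"
    return {
        "homepage": base,
        "callback": f"{base}/hub/oauth_callback",
    }


urls = oauth_urls("172.217.24.3")
print(urls["homepage"])  # http://172.217.24.3
print(urls["callback"])  # http://172.217.24.3/hub/oauth_callback
```

The callback value has to match the `oauth_callback_url` set later in `jupyterhub_config.py`.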
# JupyterHub + TensorFlow
Preparation
```bash
mkdir jupyter-docker && \
cd jupyter-docker && \
git clone https://github.com/jupyter/docker-stacks.git && \
touch docker-compose.yml && \
mkdir jupyterhub && \
cd jupyterhub && \
touch Dockerfile && touch jupyterhub_config.py
```
When that's done, your folder should look like this
```bash
.
├── docker-compose.yml
├── docker-stacks
│   ├── tensorflow-notebook
│   .
│   .
│   .
│   └── xxx
└── jupyterhub
    ├── Dockerfile
    └── jupyterhub_config.py
```
## Dockerfile
```dockerfile=
FROM jupyterhub/jupyterhub
RUN pip install dockerspawner oauthenticator
COPY jupyterhub_config.py .
```
I use OAuth to log in with GitHub accounts, so `oauthenticator` has to be installed as well
## jupyterhub_config.py
```python=
import os
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = os.environ["DOCKER_JUPYTER_IMAGE"]
c.DockerSpawner.network_name = os.environ["DOCKER_NETWORK_NAME"]
c.JupyterHub.hub_ip = os.environ["HUB_IP"]
c.Authenticator.admin_users = {'<admin user>'}
from oauthenticator.github import GitHubOAuthenticator
c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = \
    'http://<ip>/hub/oauth_callback'
c.GitHubOAuthenticator.client_id = os.environ["GITHUB_CLIENT_ID"]
c.GitHubOAuthenticator.client_secret = os.environ["GITHUB_CLIENT_SECRET"]
notebook_dir = os.environ.get('DOCKER_NOTEBOOK_DIR') or '/home/jovyan/work'
c.DockerSpawner.notebook_dir = notebook_dir
# Mount the real user's Docker volume on the host to the notebook user's
# notebook directory in the container
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': notebook_dir,
    'jupyterhub-shared': '/home/jovyan/work/shared',
    'jupyterhub-data': '/home/jovyan/work/data'
}
c.DockerSpawner.remove_containers = True
c.Spawner.default_url = '/'
```
The things you need to change here are
- admin user
    - the GitHub username of the admin
- ip
    - your server's IP
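The config above reads several values with `os.environ[...]`, which raises `KeyError` when a variable is missing. As a hedged sketch, a small check like the one below could be run first to report everything that is missing at once. The helper name `missing_vars` is hypothetical; the variable names are the ones used in the config.

```python
import os

# Variable names taken from the jupyterhub_config.py shown above.
REQUIRED_VARS = [
    "DOCKER_JUPYTER_IMAGE",
    "DOCKER_NETWORK_NAME",
    "HUB_IP",
    "GITHUB_CLIENT_ID",
    "GITHUB_CLIENT_SECRET",
]


def missing_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set")
```

`DOCKER_NOTEBOOK_DIR` is left out on purpose, since the config falls back to `/home/jovyan/work` when it is not set.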
## docker-compose
```yaml=
version: '3'
services:
  jupyterhub:
    build: ./jupyterhub
    image: jupyterhub
    ports:
      - "80:8000"
    container_name: jupyterhub-container
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - jupyterhub_data:/srv/jupyterhub
    environment:
      DOCKER_JUPYTER_CONTAINER: jupyter-notebook
      DOCKER_JUPYTER_IMAGE: jupyter/tensorflow-notebook
      DOCKER_NETWORK_NAME: jupyter-docker_default
      GITHUB_CLIENT_ID: "${GITHUB_CLIENT_ID}"
      GITHUB_CLIENT_SECRET: "${GITHUB_CLIENT_SECRET}"
      HUB_IP: jupyterhub
volumes:
  jupyterhub_data:
```
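- You can either write `GITHUB_CLIENT_ID` and `GITHUB_CLIENT_SECRET` (the values from the GitHub OAuth step) directly into the file, or put them in a `.env` file next to `docker-compose.yml`, where Docker Compose picks them up automatically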
```bash=
GITHUB_CLIENT_ID=xxxxxxxxxxxxxxxxx
GITHUB_CLIENT_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
## Result
```bash
docker-compose up -d
```
Then open http://localhost (the compose file maps host port 80 to the hub's port 8000) and see whether your JupyterHub is up and running
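If you'd rather check from a script than a browser, the sketch below sends one HTTP request and reports whether the hub answered. The helper name `hub_is_up` is hypothetical, and the URL assumes host port 80 from the compose file plus JupyterHub's `/hub/login` page.

```python
import http.client
import urllib.error
import urllib.request


def hub_is_up(url, timeout=5.0):
    """Return True if an HTTP GET to the URL yields a non-error response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeouts, HTTP errors and malformed URLs all count as "down".
        return False


if __name__ == "__main__":
    # Assumed URL; adjust the host if you are not on the server itself.
    print(hub_is_up("http://localhost/hub/login"))
```

A `False` here only says the request failed; `docker-compose logs jupyterhub` is the place to look for the reason.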

## Verify jupyter-tensorflow
First open a console
```bash=
pip install tensorflow_datasets
```
Then open a kernel
and follow the [official example](https://www.tensorflow.org/datasets/keras_example)
```python=
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
%matplotlib inline
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
```

If the loss keeps going down, it's working
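A falling loss shows that TensorFlow runs, but not that it uses the GPU. As a small sketch, the cell below reports which GPUs TensorFlow can see, using `tf.config.list_physical_devices`. The helper names are hypothetical, and whether a GPU shows up inside the notebook container depends on the image and on how the spawner starts it, which this check cannot fix.

```python
def describe_gpus(devices):
    """Return a one-line summary for a list of GPU device names."""
    if not devices:
        return "No GPU visible to TensorFlow"
    return f"{len(devices)} GPU(s) visible: " + ", ".join(devices)


def visible_gpu_names():
    """List the GPU device names TensorFlow reports, or [] if TensorFlow is unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(describe_gpus(visible_gpu_names()))
```

An empty result means training ran on the CPU.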