
# TensorFlow Setup Tutorial (on Docker)

```
OS: Ubuntu 18.04 (docker is also available on Windows)
arch: x86_64
```

[Benchmarks](https://blog.exxactcorp.com/is-docker-ideal-for-running-tensorflow-lets-measure-performance-with-rtx-2080-ti/) show that TensorFlow performs almost identically on bare metal and inside Docker.

## Environment setup

### docker

Remove any old Docker versions:

```shell
$ sudo apt-get remove docker docker-engine docker.io containerd runc
```

Update the APT package index:

```shell
$ sudo apt-get update
```

Install the tools APT needs to fetch packages over HTTPS:

```shell
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
```

Add Docker's official GPG key with `apt-key`:

```shell
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
```

Add the repository you want to use with `add-apt-repository`:

```shell
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
```

Note that `arch` must match the host machine's architecture, and that `stable` is the release channel (`nightly` and `test` also exist). Because we just added Docker's repo to APT, we need to update the index once more:

```shell
$ sudo apt-get update
```

Install docker:

```shell
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
```

Verify that docker was installed successfully:

```shell
$ sudo docker run hello-world
```

On success, a clear confirmation message is printed. Note that because docker reads and writes privileged directories, we need `sudo` to raise its privileges; to skip this step, see [here](https://docs.docker.com/install/linux/linux-postinstall/). Also note that when docker is started with `sudo`, files written to a shared folder are owned by root, which makes them awkward to read and write from the host. To avoid this, **add** the following option so that the container runs as your own user:

```shell
-u $(id -u):$(id -g)
```

With docker installed, we can download the matching docker image. See [here](https://www.tensorflow.org/install/docker#download_a_tensorflow_docker_image) for choosing between the CPU and GPU images; this tutorial uses the CPU image as its example. For the CPU image, fetch it with the following command:

```shell
$ docker pull tensorflow/tensorflow
```

Once the download finishes, verify the installation with the following command:

```shell
$ docker run -it --rm tensorflow/tensorflow \
   python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
```

This one-liner uses the TensorFlow 1.x API; on a 2.x image, eager execution is already enabled and the call is `tf.random.normal`.

Once that works, you can start using the container normally (`-it` stands for `--interactive` + `--tty`, and
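`bash` is the first program run once the container starts, which is unrelated to `init`):

```shell
$ docker run -it tensorflow/tensorflow bash
```

#### Common options & notes

- Sharing a folder. In the option below, the part **before the colon** is the directory on the host and the part **after the colon** is the directory inside the container:

  ```shell
  -v $PWD:/tmp
  ```

- ==Installing pip packages inside docker==. First make sure the user who started docker is root (we are about to write to the image). You can then install directly with pip (e.g. ==pip install tensorflow_hub, a package this tutorial uses==). After installing, commit the newly installed packages (that is, the changes made to the image) back to the image; until they are committed, installed packages disappear when the container shuts down:

  ```shell
  # Get the container id (assuming only one container is running; if there are
  # several, match them up by the container's hostname)
  $ sudo docker ps -l
  # Assuming the container id is 8267719d5175, commit with the following command
  # (the trailing tensorflow/tensorflow is the image name)
  $ sudo docker commit 8267719d5175 tensorflow/tensorflow
  # A successful commit prints its sha256 hash (similar to a Git commit), for example:
  sha256:1765f6ebe65ed271e2f456c95f879c0c5996a44b4cc47ecb4aaaae5df971a9e0
  ```

  If you need to install many packages, a [Dockerfile](https://docs.docker.com/engine/reference/builder/) is more convenient, and it is also the more standard way to do it.

- Setting the directory the container starts in (this is an option for `docker run`):

  ```shell
  -w <path/to/your/intended/directory>
  ```

## Model training and validation

After starting the container with a shared folder, you can use the dataset provided by the host (lab rats, mazes, etc.)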
to train the model. Before training, we need two files from TensorFlow's official GitHub repositories: one [for training](https://raw.githubusercontent.com/tensorflow/hub/master/examples/image_retraining/retrain.py) and one [for validating the model](https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/examples/label_image/label_image.py). Downloading them directly with `$ curl -O <URL from above>` is recommended. Assuming `retrain.py` is in the current directory and the images are stored in `images/老鼠` and `images/八臂` (note that 老鼠 "rat" and 八臂 "eight-arm" become the names of the labels the training program creates), start training with the following command:

```shell
$ python retrain.py --bottleneck_dir=./bottlenecks --how_many_training_steps 2000 --model_dir=./inception_new --output_graph=./retrained_graph.pb --output_labels=./retrained_labels.txt --image_dir ./images
```

In the command above, `--how_many_training_steps` is the number of training steps; if omitted, it defaults to 4000. Apart from `images/` and `retrain.py`, which you supply, everything else named in the command is generated by the training run. When training finishes, use `retrained_graph.pb` (the model produced by training) to run validation:

```shell
$ python label_image.py --image <test image filename> --labels ./retrained_labels.txt --graph ./retrained_graph.pb --input_layer Placeholder --output_layer final_result
```

The expected output is the probability TensorFlow assigns to each label for the given image, similar to the following:

```shell
...(computation output omitted)
human 0.9206098
labrat 0.0627298
shoes 0.01332257
mazearm 0.0033379497
```

To use this model in your own Python program, integrate the relevant parts of `label_image.py` into your code. Your program can run as soon as it is placed in the container's shared folder. Check that every package it uses is installed in the container; if one is missing, install it by following the instructions [above](https://hackmd.io/jNUE63zSRiG7mh3WAn4FGg?view#%E5%B8%B8%E7%94%A8-option-amp-%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A0%85).

## References

- [Docker installation](https://docs.docker.com/install/linux/docker-ce/ubuntu/)
- [TensorFlow docker installation](https://www.tensorflow.org/install/docker)
- [TensorFlow Inception](https://medium.com/@wingkwong/%E4%BD%BF%E7%94%A8tensorflow%E9%87%8D%E7%B7%B4inception-v3-%E5%BB%BA%E7%AB%8B%E5%9C%96%E5%83%8F%E5%88%86%E9%A1%9E%E5%99%A8-fab0552980eb)
- [YouTube tutorial (outdated version)](https://www.youtube.com/watch?v=uPVwclm_AAM)
- [commit changes on docker](https://stackoverflow.com/questions/19585028/i-lose-my-data-when-the-container-exits)

###### tags: Tensorflow
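
As a recap of the `-u`, `-v`, and `-w` options described earlier in this tutorial, the sketch below assembles them into a single `docker run` command line. The helper name `build_run_cmd`, the `/tmp` mount point used as the working directory, and the dry-run style (it only prints the command and never executes docker) are assumptions made for illustration and are not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a docker command that shares a host
# directory with the container and runs as the calling user instead of root.
build_run_cmd() {
  local host_dir="$1"                        # host directory to share
  local image="${2:-tensorflow/tensorflow}"  # image name, defaults to the CPU image
  # -u maps the container user to the caller, -v shares host_dir as /tmp,
  # -w makes /tmp the directory the container starts in.
  printf 'docker run -it --rm -u %s:%s -v %s:/tmp -w /tmp %s bash\n' \
    "$(id -u)" "$(id -g)" "$host_dir" "$image"
}

# Show the command for the current directory.
build_run_cmd "$PWD"
```

Once the printed line looks right, paste it into a terminal (prefixed with `sudo` if your user cannot talk to the docker daemon directly).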