
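# Image Recognition

## Image Recognition with ResNet

```python=
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from urllib.request import urlretrieve

# Download the list of the 1000 ImageNet class names that ResNet50 predicts
urlretrieve("https://rawcdn.githack.com/MaxWutw/Deep-Learning/a9bbc7ed859d16ebc782f3bbde5bd2e1c65073dc/Image%20recognition/type.txt", "classes.txt")

# Download the nine sample photos
photo = []
for i in range(1, 10):
    urlretrieve(f"https://github.com/MaxWutw/Deep-Learning/raw/main/ResNet/photo{i}.jpg", f"photo{i}.jpg")
    photo.append(f"photo{i}.jpg")

# Load each photo at ResNet50's input size (224 x 224) and convert it to an array
store = []
for i in range(0, 9):
    img = load_img(photo[i], target_size=(224, 224))
    x = img_to_array(img)
    store.append(x)

# ResNet50 with its ImageNet weights and 1000-class output layer
resnet = ResNet50()

with open('classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

for i in range(0, 9):
    plt.figure(i)
    plt.axis('off')
    plt.imshow(store[i] / 255)
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    store[i] = store[i].reshape(1, 224, 224, 3)
    inp = preprocess_input(store[i])
    [k] = np.argmax(resnet.predict(inp), axis=-1)
    tex = labels[k].split(' ')
    plt.text(15, 220, f"ResNet judge: {tex[0]}", fontsize=15)
    plt.show()
    print(f"ResNet thinks it is {labels[k]}")
```

### Output

![](https://i.imgur.com/TWYARYp.png)

![](https://i.imgur.com/YIUVo5P.png)

### Code Walkthrough

First we import the basic packages. Since we use ResNet, we import `ResNet50` from `keras.applications` (there is also a V2 version, `ResNet50V2`, but here we use the original one). Images must be preprocessed the way ResNet50 expects before it makes a prediction, so we also import `preprocess_input`. Next, the images have to be loaded and converted to arrays, which is what `load_img` and `img_to_array` are for. Finally, `urlretrieve` downloads files from the web.

We first download from my GitHub the list of the 1000 categories ResNet50 can recognize and save it as `classes.txt`. Then we download the sample images from my GitHub and add their file names to a list, convert each image to an array, and put those arrays in another list. We use `ResNet50` as our neural network and read the contents of `classes.txt` into `labels`. Finally, a loop feeds each image into ResNet50 and makes a prediction, which completes the image recognition.

## Building an Image Classifier with Transfer Learning

### Code (this part uses photos stored on your own computer)

```python=
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Training images: photo1.jpg ... photo9.jpg in the current directory
tmp = []
for i in range(1, 10):
    tmp.append(f"photo{i}.jpg")

data = []
for i in range(0, 9):
    img = load_img(tmp[i], target_size=(256, 256))
    x = img_to_array(img)
    data.append(x)
data = np.asarray(data, dtype=np.uint8)

# Class labels: 1 = sparrow, 2 = magpie, 3 = light-vented bulbul
target = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
x_train = preprocess_input(data)

# Show one training photo (raw pixels; preprocessed values are not displayable)
plt.axis('off')
n = 1
plt.imshow(data[n])

# One-hot encode the labels after shifting 1..3 to 0..2
y_train = to_categorical(target - 1, 3)

# ResNet50 without its 1000-class top layer; average pooling returns one feature vector per image
resnet = ResNet50(include_top=False, pooling="avg")
resnet.trainable = False  # freeze the pretrained weights

model = Sequential()
model.add(resnet)
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, batch_size=9, epochs=25)

# Predict on the training data and compare with the true labels
y_predict = np.argmax(model.predict(x_train), -1)
labels = ["sparrow", "magpie", "light-vented bulbul"]  # 麻雀, 喜鵲, 白頭翁
print(y_predict)
print(target - 1)

# Test images: test1.jpg ... test3.jpg in the current directory
testing = []
pho = []
for i in range(1, 4):
    pho.append(f"test{i}.jpg")
for i in range(0, 3):
    img = load_img(pho[i], target_size=(256, 256))
    x = img_to_array(img)
    testing.append(x)
testing = np.asarray(testing, dtype=np.uint8)
testing_data = preprocess_input(testing)

final = np.argmax(model.predict(testing_data), -1)
for i in range(3):
    print(f"{i+1}. CNN judge: ", labels[final[i]])
print("Answer: sparrow, light-vented bulbul, magpie")
```

### Output

![](https://i.imgur.com/G5D8vhu.png)

![](https://i.imgur.com/NZosjpE.png)

![](https://i.imgur.com/UXI5kFp.png)

### Code Walkthrough

This section uses transfer learning: we reuse what ResNet has already learned by adding ResNet to our own network, but we do not retrain it. Instead, it judges our images based on its earlier training, so we do not need a large dataset to get good results. Put simply, we use ResNet's existing experience to classify images it has never seen.

We import the same packages as in the previous ResNet example, plus a few more for building a network. We only want ResNet's experience, so in practice ResNet becomes one component of our own network.

First we read in the photos. They come from your own computer, so keep the script and the photos in the same directory, or use full paths. Then we convert the images to arrays. `target` holds the answers (class labels) and can be changed to match your data. Next we preprocess the photos. `y_train` is again the answers, just converted to one-hot encoding.

Then we take the ResNet50 network. Remember the first argument, `include_top=False`: since we are training on our own classes, we drop the final layer that outputs the 1000 ImageNet classes. The second argument, `pooling="avg"`, makes ResNet50 apply global average pooling and return a single 2048-dimensional feature vector per image. `resnet.trainable = False` stops ResNet50 from being retrained: it is very large, and we do not need to retrain it.

Next we build our network. The first layer is the `resnet` we just extracted. The second is a fully connected `Dense` layer with 3 units and softmax activation, giving one probability per bird species. We then compile the model with the `categorical_crossentropy` loss and the Adam optimizer, which usually converges faster than plain SGD and needs less tuning, and finally train it with `fit`.

We first reuse the training data as test data and compare the predictions with the true answers to see which ones are wrong. Testing only on training data is unreliable, though, so at the end I add a few images found online and saved on my computer, and print the results.

## Running the Transfer-Learning Classifier on Gradio

### Code (this part trains on images from the author's GitHub)

```python=
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from urllib.request import urlretrieve
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.preprocessing.image import load_img, img_to_array
import gradio as gr

# Download the nine training photos
tmp = []
for i in range(1, 10):
    urlretrieve(f"https://github.com/MaxWutw/Deep-Learning/raw/main/Image%20recognition/photo{i}.jpg", f"photo{i}.jpg")
    tmp.append(f"photo{i}.jpg")

data = []
for i in range(0, 9):
    img = load_img(tmp[i], target_size=(256, 256))
    x = img_to_array(img)
    data.append(x)
data = np.asarray(data, dtype=np.uint8)

# Class labels: 1 = sparrow, 2 = magpie, 3 = light-vented bulbul
target = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
x_train = preprocess_input(data)

plt.axis('off')
n = 1
plt.imshow(data[n])  # raw pixels, not the preprocessed array

y_train = to_categorical(target - 1, 3)
print(y_train[n])  # one-hot label of the photo shown above

resnet = ResNet50(include_top=False, pooling="avg")
resnet.trainable = False

model = Sequential()
model.add(resnet)
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, batch_size=9, epochs=25)

y_predict = np.argmax(model.predict(x_train), -1)
labels = ["sparrow", "magpie", "light-vented bulbul"]  # 麻雀, 喜鵲, 白頭翁
print(y_predict)
print(target - 1)

# Download and classify the three test photos
testing = []
pho = []
for i in range(1, 4):
    urlretrieve(f"https://github.com/MaxWutw/Deep-Learning/raw/main/Image%20recognition/test{i}.jpg", f"test{i}.jpg")
    pho.append(f"test{i}.jpg")
for i in range(0, 3):
    img = load_img(pho[i], target_size=(256, 256))
    x = img_to_array(img)
    testing.append(x)
testing = np.asarray(testing, dtype=np.uint8)
testing_data = preprocess_input(testing)

final = np.argmax(model.predict(testing_data), -1)
for i in range(3):
    print(f"{i+1}. CNN judge: ", labels[final[i]])
print("Answer: sparrow, light-vented bulbul, magpie")

def classify_image(inp):
    # Gradio passes an (H, W, 3) uint8 array; resize it to the 256 x 256 training size
    inp = tf.image.resize(inp, (256, 256)).numpy()
    inp = inp.reshape((-1, 256, 256, 3))
    inp = preprocess_input(inp)
    prediction = model.predict(inp).flatten()
    return {labels[i]: float(prediction[i]) for i in range(3)}

# Current Gradio components (gr.inputs, gr.outputs and capture_session were removed in newer releases)
image = gr.Image(type="numpy", label="Bird photo")
label = gr.Label(num_top_classes=3, label="AI prediction")
gr.Interface(fn=classify_image, inputs=image, outputs=label,
             title="AI Classifier for Three Bird Species",
             description="I can identify three birds common in Taiwan: sparrow, magpie, and light-vented bulbul.").launch()
```

### Gradio interface

![](https://i.imgur.com/gNDEQEr.png)

### No separate walkthrough here, since the code is almost identical to the previous section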
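All three scripts call `preprocess_input` from `tensorflow.keras.applications.resnet50`. According to the Keras documentation, this function converts ResNet50 inputs from RGB to BGR and zero-centers each channel using the ImageNet mean, without rescaling. The NumPy sketch below is meant only to illustrate that transformation. `caffe_preprocess` and `IMAGENET_MEAN_BGR` are hypothetical names, not part of Keras. The sketch also explains why a preprocessed array should not be passed directly to `plt.imshow`: its values include negatives, so they no longer fall in a displayable pixel range.

```python
import numpy as np

# ImageNet per-channel means in BGR order (the values Keras uses for ResNet50 preprocessing)
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(images):
    """Hypothetical NumPy re-implementation of resnet50.preprocess_input.

    images: array of shape (..., 3) with RGB pixel values in [0, 255].
    Returns a new float32 array in BGR order with the ImageNet mean subtracted.
    """
    x = np.asarray(images, dtype=np.float32)
    x = x[..., ::-1]              # RGB -> BGR
    return x - IMAGENET_MEAN_BGR  # zero-center each channel, no scaling

# A 1 x 1 "image" holding a single pure-red pixel
red = np.array([[[255, 0, 0]]], dtype=np.uint8)
out = caffe_preprocess(red)
# B = 0 - 103.939, G = 0 - 116.779, R = 255 - 123.68
print(out)
print(out.min() < 0)  # True: negative values, so plt.imshow would clip this image
```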
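The line `y_train = to_categorical(target - 1, 3)` converts the labels 1, 2, 3 into one-hot rows. The sketch below uses plain NumPy to make that encoding concrete by selecting rows of an identity matrix. This shows what the encoding does, not how Keras implements `to_categorical` internally. Running `np.argmax` on the encoded rows reverses the encoding, which is the same step the scripts apply to `model.predict(...)`.

```python
import numpy as np

# Same labels as in the scripts: 1 = sparrow, 2 = magpie, 3 = light-vented bulbul
target = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# Shift labels to 0..2, then pick the matching row of a 3 x 3 identity matrix
y_train = np.eye(3, dtype=np.float32)[target - 1]

print(y_train[1])  # [1. 0. 0.]  (the photo at index n = 1 is a sparrow)
print(y_train[8])  # [0. 0. 1.]  (the last photo is a light-vented bulbul)

# argmax recovers the 0-based class index, like np.argmax(model.predict(...), -1)
print(np.argmax(y_train, axis=-1))  # [0 0 0 1 1 1 2 2 2]
```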
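In the scripts, predictions are checked by printing `y_predict` next to `target - 1` and comparing the two by eye. A short sketch can compute the accuracy and list the misclassified indices instead. The `y_predict` values below are made up for illustration and do not come from the author's model.

```python
import numpy as np

target = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
# Hypothetical 0-based predictions, standing in for np.argmax(model.predict(x_train), -1)
y_predict = np.array([0, 0, 0, 1, 2, 1, 2, 2, 2])

correct = y_predict == target - 1           # element-wise comparison with the true labels
print(f"accuracy: {correct.mean():.2f}")    # 8 of 9 correct: accuracy: 0.89
print("wrong indices:", np.flatnonzero(~correct))  # wrong indices: [4]
```

Keep in mind that high accuracy on the nine training photos does not mean the model generalizes well. The separate test photos give a more meaningful check.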
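In the Gradio version, `classify_image` returns a dictionary that maps each label to a probability, and `gr.Label(num_top_classes=3)` displays it as a ranked list. The probabilities come from the final `Dense(3, activation='softmax')` layer. The NumPy sketch below shows that last step using made-up raw scores (logits) in place of the Dense layer's pre-activation output. `softmax` and `to_label_dict` are hypothetical helpers, not Keras or Gradio functions.

```python
import numpy as np

labels = ["sparrow", "magpie", "light-vented bulbul"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def to_label_dict(probs):
    """Build the {label: probability} mapping that classify_image hands to gr.Label."""
    return {labels[i]: float(probs[i]) for i in range(len(labels))}

# Illustrative scores only, not real model output
probs = softmax([2.0, 0.5, -1.0])
result = to_label_dict(probs)
best = max(result, key=result.get)
print(best)                            # sparrow
print(round(sum(result.values()), 6))  # 1.0
```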