Robot Vision Final Report

###### tags: `Class`

# Robot Vision Final Report (Public Version)

## Overview

This is a lightly edited public version of the final report submitted for the Robot Vision class in the Media Robotics Course, Department of Information Science, Aichi Prefectural University. The topic is "Proposing the optimal binarization method for images of traffic signs". All programs in this report are written in Python.

## Preparing the Dataset

Using the fiftyone library, 100 images labeled with the Traffic sign class are randomly sampled from Google Open Images Dataset V6. Open Images Dataset V6 is licensed under CC BY 4.0, so it can be used as long as appropriate credit is given, a link to the license is provided, and any changes are indicated. The program is shown below.

>[Open Images Dataset V6 + Extensions](https://storage.googleapis.com/openimages/web/index.html)

>[fiftyone](https://voxel51.com/docs/fiftyone/)

>[Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/deed.ja)

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Request 100 shuffled "Traffic sign" samples of open-images-v6 through fiftyone
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="train",
    label_types=["segmentations"],
    classes=["Traffic sign"],
    max_samples=100,
    shuffle=True,
)

# Open the FiftyOne app to inspect the downloaded samples
session = fo.launch_app(dataset)
# Wait until the session is closed
session.wait()
```

>[Getting Open Images Dataset data with FiftyOne (in Japanese)](https://qiita.com/RyoWakabayashi/items/ffcf21558855f6d5a9be)

>[Getting started with Google Open Images Dataset V6 (in Japanese)](https://qiita.com/kenichiro-yamato/items/e0c0d6f6138b1c64acd0)

## Preprocessing the Dataset

To simplify later processing, the downloaded images are renamed with sequential numbers starting from 001. The program is shown below.

```python
import glob
import os

# Sort so that the numbering is reproducible across runs and platforms
files = sorted(glob.glob("./data/*.jpg"))
for i, old_name in enumerate(files):
    new_name = "./data/{0:03d}.jpg".format(i + 1)
    os.rename(old_name, new_name)
    print(old_name + "→" + new_name)
```

All images in the dataset are color images, so they are then converted to grayscale. The program is shown below.

```python
import glob
import os

import cv2

os.makedirs('./graydata', exist_ok=True)
# After renaming, the sorted list is 001.jpg, 002.jpg, ...
files = sorted(glob.glob("./data/*.jpg"))
for i, file_name in enumerate(files):
    img = cv2.imread(file_name)
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite('./graydata/{0:03d}.jpg'.format(i + 1), img_gray)
```

## Binarizing the Dataset Images

In this experiment the images are binarized with four methods (the P-tile method, the discriminant analysis method, the mode method, and the differential histogram method), and each result is scored. The implementation of each method is shown below.

### P-tile method

The P-tile method is intended for cases where the approximate area of the target object is known in advance. Here, for comparison with the other methods, the target object is assumed to occupy 5% of the image. The program computes the cumulative distribution of the histogram, scans it from the lowest luminance value upward, and takes the first luminance value at which the cumulative distribution exceeds 0.05 as the threshold.

```python
import glob
import os

import cv2
import numpy as np

os.makedirs('./Ptile', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    image = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    histogram, bins = np.histogram(image.ravel(), 256, [0, 256], density=True)
    cdf = np.cumsum(histogram)
    # First luminance value whose cumulative frequency exceeds 5%
    for x, p in enumerate(cdf):
        if p > 0.05:
            threshold = x
            break
    print(f'frame:{old_name}, threshold: {threshold}')
    ret, binary_image = cv2.threshold(image, threshold, 255, cv2.THRESH_BINARY)
    cv2.imwrite('./Ptile/{0:03d}.jpg'.format(i + 1), binary_image)
```

### Discriminant analysis method

The discriminant analysis method (Otsu's method) is widely used because it finds a reasonably effective threshold not only when the luminance histogram has a clear valley but also when it does not. The histogram is split into two classes at a candidate luminance value, and the value that minimizes the within-class variance is used as the threshold. The program relies on OpenCV's `cv2.THRESH_OTSU` flag for this.

```python
import glob
import os

import cv2

os.makedirs('./Otsu', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    image = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    # With THRESH_OTSU the supplied threshold (0) is ignored and the chosen one is returned
    threshold, binary_image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    print(f'frame:{old_name}, threshold: {threshold}')
    cv2.imwrite('./Otsu/{0:03d}.jpg'.format(i + 1), binary_image)
```

### Mode method

The mode method gives good results when the luminance difference between the target object and the background is large. The program takes the deepest valley of the luminance histogram and uses its x coordinate (luminance value) as the threshold. In the implementation, the deepest valley is taken to be the minimum of the histogram over luminance values 30 to 225.

```python
import glob
import os

import cv2
import numpy as np

os.makedirs('./mode', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    image = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    histogram, bins = np.histogram(image.ravel(), 256, [0, 256], density=True)
    # Lowest histogram bin among luminance values 30..225
    threshold = histogram[30:226].argmin() + 30
    print(f'frame:{old_name}, threshold: {threshold}')
    ret, binary_image = cv2.threshold(image, threshold, 255, cv2.THRESH_BINARY)
    cv2.imwrite('./mode/{0:03d}.jpg'.format(i + 1), binary_image)
```

### Differential histogram method

The differential histogram method compensates for weaknesses of the discriminant analysis method and is more effective when the gray-level variation is small. For every pixel, the program compares the pixel's luminance with the mean luminance of its 8 neighbors (a moving average), accumulates the absolute differences per luminance value, and uses the luminance value at the highest point of the resulting histogram as the threshold. To reduce noise, a bilateral filter is applied beforehand, which smooths the image while preserving edges better than other filters.

```python
import glob
import os

import cv2
import numpy as np

os.makedirs('./diff', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    diff_hist = [0] * 256
    img = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    # Bilateral filter
    img = cv2.bilateralFilter(img, 9, 75, 75)
    height, width = img.shape[:2]
    # Signed copy so that sums of uint8 pixel values cannot overflow
    work = img.astype(np.int32)
    for x in range(1, width - 1):
        for y in range(1, height - 1):
            brightness = work[y, x]
            # |sum of 8 neighbors - 8 * center| = 8 * |neighbor mean - center|
            diff = abs(
                work[y-1, x-1] + work[y-1, x] + work[y-1, x+1]
                + work[y, x-1] + work[y, x+1]
                + work[y+1, x-1] + work[y+1, x] + work[y+1, x+1]
                - work[y, x] * 8
            )
            diff_hist[brightness] += diff
    # Luminance value with the largest accumulated difference
    threshold = int(np.argmax(diff_hist))
    print(f'frame:{old_name}, threshold: {threshold}')
    ret, binary_image = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
    cv2.imwrite('./diff/{0:03d}.jpg'.format(i + 1), binary_image)
```

## Evaluating the Binarized Images

The images binarized in the previous chapter are scored according to the tables below. The extraction rate is a visual assessment of how much of the sign's outline, and how much of the sign's content, can be recognized in the binarized image.

| Extraction rate of the sign outline | Score |
| ----------------------------------- | ----- |
| 0-24%                               | 0     |
| 25-49%                              | 1     |
| 50-74%                              | 2     |
| 75-100%                             | 3     |

| Extraction rate of the sign content | Score |
| ----------------------------------- | ----- |
| 0-24%                               | 0     |
| 25-49%                              | 1     |
| 50-74%                              | 2     |
| 75-100%                             | 3     |

To make scoring easier, a simple image viewer was written and run in Jupyter Notebook. For each image it shows the original grayscale image and the images binarized with the P-tile, discriminant analysis, mode, and differential histogram methods, each for 6 seconds. The evaluator records the scores according to the tables above while the images are displayed.

```python
import glob
import os
import time

import cv2
from IPython.display import clear_output
from matplotlib import pyplot as plt

skip_frame = 0  # number of images to skip when resuming an evaluation
os.makedirs('./annotation', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, file_name in enumerate(files):
    if i < skip_frame:
        continue
    img_list = []
    img_list.append(cv2.imread(file_name))
    img_list.append(cv2.imread('./Ptile/{0:03d}.jpg'.format(i + 1)))
    img_list.append(cv2.imread('./Otsu/{0:03d}.jpg'.format(i + 1)))
    img_list.append(cv2.imread('./mode/{0:03d}.jpg'.format(i + 1)))
    img_list.append(cv2.imread('./diff/{0:03d}.jpg'.format(i + 1)))
    for img in img_list:
        print(file_name)
        plt.imshow(img)
        plt.show()
        time.sleep(6)
        clear_output()
```

With this program, the binarized images corresponding to the 100 grayscale images, 400 binarized images in total, were evaluated. The evaluator was one male in his twenties. The results are shown in the graphs below.

![](https://i.imgur.com/n90wxGX.jpg)

![](https://i.imgur.com/kD9K7rH.jpg)

The mean score of each evaluation is given in the tables below.

| Ptile_sign  | Otsu_sign   | mode_sign   | diff_sign   |
| ----------- | ----------- | ----------- | ----------- |
| 0.88 / 3.00 | 2.06 / 3.00 | 0.89 / 3.00 | 0.80 / 3.00 |

| Ptile_shape | Otsu_shape  | mode_shape  | diff_shape  |
| ----------- | ----------- | ----------- | ----------- |
| 0.86 / 3.00 | 2.57 / 3.00 | 1.11 / 3.00 | 1.12 / 3.00 |

| Ptile       | Otsu        | mode        | diff        |
| ----------- | ----------- | ----------- | ----------- |
| 1.74 / 6.00 | 4.63 / 6.00 | 2.00 / 6.00 | 1.92 / 6.00 |

For each grayscale image, the method with the highest total score was counted. When several methods tied for first place, all of them were counted. The result is shown below.

| Ptile    | Otsu     | mode     | diff     |
| -------- | -------- | -------- | -------- |
| 19 / 100 | 80 / 100 | 24 / 100 | 13 / 100 |

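
The averages and the tie-aware tally above can be reproduced from a score table with a few lines of pandas. The following is a minimal sketch, not the program used for the report: it assumes a table with one row per image and the total-score columns `Ptile`, `Otsu`, `mode`, and `diff` (the column names used by the plotting scripts in this report). The function name `count_best_methods` and the three-image sample data are hypothetical.

```python
import pandas as pd

METHODS = ["Ptile", "Otsu", "mode", "diff"]


def count_best_methods(scores: pd.DataFrame) -> pd.Series:
    """Count, per method, the images on which it reaches the top total score.

    When several methods tie for first place on an image, all of them are counted.
    """
    totals = scores[METHODS]
    best = totals.max(axis=1)           # best total score of each image
    is_best = totals.eq(best, axis=0)   # True where a method matches that best score
    return is_best.sum()


# Hypothetical scores for three images (not the data of this report)
example = pd.DataFrame(
    {"Ptile": [1, 6, 2], "Otsu": [5, 6, 2], "mode": [2, 3, 4], "diff": [0, 1, 4]}
)
mean_scores = example[METHODS].mean()
best_counts = count_best_methods(example)
print(mean_scores)
print(best_counts)
```

With the real annotations, `example` would be replaced by something like `pd.read_csv('./annotation/annotation.csv', index_col=0)`. Because ties are counted for every tied method, the counts can add up to more than the number of images, as they do in the table above.
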

The following programs were written to plot the graphs. They were run in Jupyter Notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('./annotation/annotation.csv', index_col=0)
plt.rcParams["figure.figsize"] = (12, 6)
plt.subplots_adjust(wspace=0.25, hspace=0.25)

# Top row: "sign" scores, bottom row: "shape" scores, one column per method
plt.subplot(2, 4, 1)
plt.plot(df['Ptile_sign'], 'b.')
plt.ylabel("score (sign)", fontsize=16)
plt.subplot(2, 4, 2)
plt.plot(df['Otsu_sign'], 'm.')
plt.subplot(2, 4, 3)
plt.plot(df['mode_sign'], 'g.')
plt.subplot(2, 4, 4)
plt.plot(df['diff_sign'], 'r.')
plt.subplot(2, 4, 5)
plt.plot(df['Ptile_shape'], 'b.')
plt.ylabel("score (shape)", fontsize=16)
plt.xlabel("frame", fontsize=16)
plt.subplot(2, 4, 6)
plt.plot(df['Otsu_shape'], 'm.')
plt.xlabel("frame", fontsize=16)
plt.subplot(2, 4, 7)
plt.plot(df['mode_shape'], 'g.')
plt.xlabel("frame", fontsize=16)
plt.subplot(2, 4, 8)
plt.plot(df['diff_shape'], 'r.')
plt.xlabel("frame", fontsize=16)
plt.savefig('./annotation/sign_shape.jpg')
```

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('./annotation/annotation.csv', index_col=0)
plt.rcParams["figure.figsize"] = (12, 6)
plt.subplots_adjust(wspace=0.25, hspace=0.25)

# Total score (sign + shape) of each method
plt.subplot(1, 4, 1)
plt.plot(df['Ptile'], 'b.')
plt.ylabel("score (sign + shape)", fontsize=16)
plt.subplot(1, 4, 2)
plt.plot(df['Otsu'], 'm.')
plt.subplot(1, 4, 3)
plt.plot(df['mode'], 'g.')
plt.subplot(1, 4, 4)
plt.plot(df['diff'], 'r.')
plt.savefig('./annotation/all.jpg')
```

The results show that, among the P-tile, discriminant analysis, mode, and differential histogram methods, the discriminant analysis method has a clearly higher mean score than the others. In the per-image tally as well, the discriminant analysis method had the highest score (including ties) on 80 of the 100 images. In other words, the discriminant analysis method gives a good binarization for most images. On the remaining 20 images, however, at least one of the P-tile, mode, or differential histogram methods gave a better result. The next chapter therefore examines how to identify images for which a method other than the discriminant analysis method should be used.

## Characteristics of Images That Should Be Binarized with a Method Other Than Discriminant Analysis

For the grayscale images on which the P-tile, mode, or differential histogram method outperformed the discriminant analysis method, the luminance histograms are examined to understand why the discriminant analysis method did not work well. The following program was written for this purpose. It builds the luminance histogram of a grayscale image, computes the thresholds of the four methods (P-tile, discriminant analysis, mode, differential histogram), and plots them on the histogram. This is done for every grayscale image.

```python
import glob
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np

os.makedirs('./graydata_hist', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    img = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    histogram, bins = np.histogram(img.ravel(), 256, [0, 256], density=True)

    # P-tile: first luminance value whose cumulative frequency exceeds 5%
    cdf = np.cumsum(histogram)
    for x, p in enumerate(cdf):
        if p > 0.05:
            threshold_Ptile = x
            break

    # Discriminant analysis (Otsu) and mode thresholds
    threshold_otsu, _ = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    threshold_mode = histogram[30:226].argmin() + 30

    # Differential histogram threshold (computed on the bilateral-filtered image)
    diff_hist = [0] * 256
    height, width = img.shape[:2]
    img = cv2.bilateralFilter(img, 9, 75, 75)
    work = img.astype(np.int32)  # signed copy: uint8 sums would overflow
    for x in range(1, width - 1):
        for y in range(1, height - 1):
            brightness = work[y, x]
            diff = abs(
                work[y-1, x-1] + work[y-1, x] + work[y-1, x+1]
                + work[y, x-1] + work[y, x+1]
                + work[y+1, x-1] + work[y+1, x] + work[y+1, x+1]
                - work[y, x] * 8
            )
            diff_hist[brightness] += diff
    threshold_diff = int(np.argmax(diff_hist))

    plt.subplot(1, 1, 1)
    plt.plot(histogram, c='k', label="Brightness")
    plt.axvline(threshold_Ptile, 0, 1, c='b', label="Ptile")
    plt.axvline(threshold_otsu, 0, 1, c='m', label="Otsu")
    plt.axvline(threshold_mode, 0, 1, c='g', label="mode")
    plt.axvline(threshold_diff, 0, 1, c='r', label="diff")
    plt.legend()
    plt.savefig('./graydata_hist/{0:03d}.jpg'.format(i + 1))
    plt.close('all')
```

A representative luminance histogram of a grayscale image on which the P-tile, mode, or differential histogram method outperformed the discriminant analysis method is shown below.

Image 1

![](https://i.imgur.com/p7z1lQe.jpg)

![093](https://i.imgur.com/xDMVO2Q.jpg)

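
To experiment with how the discriminant analysis threshold reacts to the shape of a histogram, the threshold can also be computed directly from the histogram. The following is a minimal sketch, not OpenCV's implementation: it uses the standard equivalent formulation of maximizing the between-class variance (which is the same as minimizing the within-class variance), and its result may differ from `cv2.threshold` by a level because of tie-breaking and the threshold convention. The function name `otsu_threshold` and the synthetic histogram are hypothetical and are not data from this report.

```python
import numpy as np


def otsu_threshold(hist: np.ndarray) -> int:
    """Return the level t that maximizes the between-class variance
    when the histogram is split into [0..t] and [t+1..255]."""
    p = np.asarray(hist, dtype=np.float64)
    p = p / p.sum()                      # normalize to a probability distribution
    levels = np.arange(p.size)
    w0 = np.cumsum(p)                    # weight of the dark class up to each level
    mu_cum = np.cumsum(p * levels)       # cumulative first moment
    mu_total = mu_cum[-1]
    w1 = 1.0 - w0                        # weight of the bright class
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    sigma_b = np.zeros_like(p)
    sigma_b[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))


# Synthetic histogram: one dominant peak plus a much smaller one (illustration only)
levels = np.arange(256)
dominant = 0.98 * np.exp(-0.5 * ((levels - 120) / 30.0) ** 2)
small = 0.02 * np.exp(-0.5 * ((levels - 230) / 5.0) ** 2)
print("threshold:", otsu_threshold(dominant + small))
```

Changing the relative weights of the two synthetic peaks shows how the threshold moves as one peak comes to dominate the histogram.
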
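
In Image 1, one large peak can be seen. The discriminant analysis method placed the threshold near the center of the large peak, whereas the mode method, which placed it in the valley between the large peak and a small peak, scored better. The discriminant analysis method most likely failed because the target object is small, so the peak formed by the object's luminance values barely appears in the histogram. When one peak is clearly larger than the others, the between-class variance does not separate the classes well, and the threshold selection of the discriminant analysis method breaks down. In short, when there is a peak that is clearly larger than the others, the discriminant analysis method is unlikely to binarize the image well. The next chapter considers how to detect this case automatically.

## Selecting the Binarization Method from the Peak Size of the Luminance Histogram

To identify images that the discriminant analysis method may fail to binarize well, the program has to detect the case described in the previous chapter, where one peak is clearly larger than the others. To do so, local minima and maxima are extracted from the luminance histogram with the numerical library scipy, and peaks are detected from them. The `order` argument of scipy's local minimum and maximum search functions makes them respond only to pronounced extrema: the value at index n is compared with the values from n-order to n+order, and is reported only if it is the extremum of that range.

The peak width is computed from the local minima found by scipy and used to flag images that the discriminant analysis method may fail to binarize. Since the histogram indices run from 0 to 255, a warning is printed when there is a peak that spans more than half of the index range, that is, a peak wider than 128.

```python
import glob
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

os.makedirs('./hist_ids', exist_ok=True)
files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    img = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    histogram, bins = np.histogram(img.ravel(), 256, [0, 256], density=True)
    # Indices of local maxima (peaks) and local minima (valleys)
    maxids = signal.argrelmax(histogram, order=10)[0]
    minids = signal.argrelmin(histogram, order=10)[0]

    # Plot the histogram with peaks in red and valleys in blue
    plt.subplot(1, 1, 1)
    plt.plot(histogram, c='k', label="Brightness")
    plt.scatter(maxids, histogram[maxids], c='r')
    plt.scatter(minids, histogram[minids], c='b')
    plt.savefig('./hist_ids/{0:03d}.jpg'.format(i + 1))
    plt.close('all')

    # Detect peak widths and flag peaks that are too wide (wider than 128)
    # Width from the left edge of the histogram to the first valley
    if len(minids) > 0 and minids[0] > 128:
        print(old_name, "There is too wide peak!")
    # Width between consecutive valleys (indices are in ascending order)
    for j in range(1, len(minids)):
        if minids[j] - minids[j-1] > 128:
            print(old_name, "There is too wide peak!")
```

>[Extracting peak values with scipy (in Japanese)](https://qiita.com/wrblue_mica34/items/e174a71570abb710dcfb)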

>[scipy.signal.argrelmax](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.argrelmax.html)

>[scipy.signal.argrelmin](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.argrelmin.html)

Running this program on the image shown in the previous chapter gave the following result, which confirms that the condition is applied as intended.

Image 3

![093](https://i.imgur.com/PkOAjRc.jpg)

```
./graydata\093.jpg There is too wide peak!
```

## Proposing the Optimal Binarization Method for Images of Traffic Signs

Based on the program from the previous chapter, a program was written that proposes the discriminant analysis method when no peak spans more than half of the index range, and the mode method or the differential histogram method when such a peak exists. The program was then evaluated.

```python
import glob

import cv2
import numpy as np
from scipy import signal

files = sorted(glob.glob("./graydata/*.jpg"))
for i, old_name in enumerate(files):
    img = cv2.imread(old_name, cv2.IMREAD_UNCHANGED)
    histogram, bins = np.histogram(img.ravel(), 256, [0, 256], density=True)
    # Indices of the local minima (valleys) of the histogram
    minids = signal.argrelmin(histogram, order=10)[0]
    isWidePeak = False
    # Detect peak widths and flag peaks that are too wide (wider than 128)
    if len(minids) > 0 and minids[0] > 128:
        print(old_name, "The mode method is recommended for this image.")
        isWidePeak = True
    for j in range(1, len(minids)):
        # Width between consecutive valleys (indices are in ascending order)
        if minids[j] - minids[j-1] > 128:
            print(old_name, "The mode method or the differential histogram method is recommended for this image.")
            isWidePeak = True
    if not isWidePeak:
        print(old_name, "The discriminant analysis method is recommended for this image.")
```

The program was used to propose a binarization method for each input grayscale image, and the proposals were compared with the annotation results. The proposed method was the best one for 88 of the 100 images. Proposing the discriminant analysis method for every input image would be correct for 80 of the 100 images, so the accuracy improved by 8 percentage points. The mode method or the differential histogram method was proposed for 9 of the 100 images because a peak spanning more than half of the index range was detected, and 8 of these 9 proposals were appropriate.
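
The decision rule above can also be packaged as a function that works on a histogram alone, which makes it easy to try on synthetic data. The following is a minimal sketch under stated assumptions, not the program evaluated in this report: the names `has_wide_peak` and `suggest_method` are hypothetical, and treating both ends of the histogram (0 and 255) as valley positions is an assumption that goes slightly beyond the check described above, so that a histogram with no valley at all is also flagged.

```python
import numpy as np
from scipy import signal


def has_wide_peak(hist, order=10, min_width=128):
    """Return True if two neighboring valleys are more than min_width apart.

    Assumption: both ends of the histogram are treated as valleys, so a single
    peak without any interior valley counts as one wide peak.
    """
    hist = np.asarray(hist, dtype=float)
    minima = signal.argrelmin(hist, order=order)[0]
    edges = np.concatenate(([0], minima, [len(hist) - 1]))
    return bool(np.any(np.diff(edges) > min_width))


def suggest_method(hist):
    """Suggest a binarization method from the peak widths of the histogram."""
    if has_wide_peak(hist):
        return "mode or differential histogram"
    return "Otsu"


# Synthetic histograms for illustration (not data from this report)
x = np.arange(256)
narrow = 1.0 + np.cos(2 * np.pi * x / 64)   # four evenly spaced valleys
wide = (np.exp(-0.5 * ((x - 100) / 30.0) ** 2)
        + 0.05 * np.exp(-0.5 * ((x - 230) / 5.0) ** 2))  # one dominant peak
print(suggest_method(narrow))
print(suggest_method(wide))
```

For a real image, `hist` would be the array returned by `np.histogram(img.ravel(), 256, [0, 256], density=True)` as in the programs above. Whether counting the histogram ends as valleys improves or worsens the 88 of 100 result has not been checked here.
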