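Dear participants,

To help you take part in the Traditional Chinese Scene Text Recognition Competition more smoothly, this message provides the guides below. They are needed for the beginner, intermediate, and advanced stages alike, so please follow the steps in order.
If you have any questions, feel free to ask on the official website. Thank you!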

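Guide 1: Step-by-Step Colab Tutorial

This guide walks through training the DB model on Colab one step at a time. It is easier to follow alongside the "DB on colab" example.

This guide uses the following resources:
- Dataset released on T-Brain: Traditional Chinese Scene Text Recognition Competition (Beginner)
- Colab
- Code from the GitHub repository MHLIAO/DB
  - DB paper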

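Step1: Set up Colab

Open Colab and create a new notebook (.ipynb).
Mount your Google Drive in Colab (two options):
1. Click the Mount Drive icon directly (marked with a red box in the screenshot)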
2. Use a command
  - Enter the following in a code cell:
```
from google.colab import drive
drive.mount('/content/drive')
```
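  - Click the black play button to run the cell. Open the URL shown in the output (orange box in the screenshot), sign in to your Google account, and click Allow. You will receive an authorization code; paste it into the input field (red box) and press Enter to finish mounting.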
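Set the runtime type
- The default is GPU; it is still worth confirming.
- To check or change it, use the toolbar at the top left: [Runtime] >> [Change runtime type] >> [GPU or TPU]
Check which GPU you were assigned
- Run the following command: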
```
!nvidia-smi
```

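Note

Colab runs Python on a Linux system. To run a Linux command from a cell, prefix it with `!` (the command runs in a temporary subshell) or `%` (a notebook magic such as `%cd`, whose effect persists for later cells).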
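
The difference between the two prefixes matters for commands that change state. The sketch below is plain Python rather than Colab syntax: it imitates `!cd ..` with a child shell and `%cd` with an in-process directory change, to show that only the latter persists. The temporary directory is just a convenient target chosen for illustration.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

# A `cd` inside a child shell (analogous to `!cd ..`) ends with that shell.
subprocess.run("cd ..", shell=True, check=True)
after_subshell = os.getcwd()  # still the starting directory

# Changing directory in the current process (analogous to `%cd`) persists.
target = os.path.realpath(tempfile.gettempdir())
os.chdir(target)
after_chdir = os.getcwd()

os.chdir(start)  # restore the original working directory
print(after_subshell == start, os.path.realpath(after_chdir) == target)
```

This is why the steps in this guide change folders with `%cd` rather than `!cd`.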

References:
- Google Colab practical tricks (mounting Drive, displaying Chinese text in plots)
- Using Google Colab for everyday development

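Step2: Clone the DB code from GitHub

Change into MyDrive first, so that the downloaded code is visible in your Google Drive:

```
%cd /content/drive/MyDrive/
```

Download the DB code from the GitHub repository MHLIAO/DB:

```
!git clone https://github.com/MhLiao/DB.git
```

After this command finishes, a folder named DB appears in your Google Drive. It contains the original DB authors' code.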

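Step3: Install the third-party libraries and PyTorch

Colab already has the libraries and packages commonly used in deep learning installed. If you use your own machine, you need to install them yourself.

Change into the DB folder you just downloaded:

```
%cd /content/drive/MyDrive/DB
```

Install the libraries the code needs

The requirement.txt file in the DB folder lists all required libraries.

```
!pip install -r requirement.txt
```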

Install PyTorch according to your CUDA version

PyTorch is a package designed for deep learning.

```
!pip install torch
!pip install torchvision
```

PyTorch official installation instructions

Check that the installed torch can use CUDA:

```
import torch
torch.cuda.is_available()
```
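
If the check returns False, it helps to see which PyTorch build is installed and which CUDA version it was built for. The following is a minimal sketch of such a report; `describe_torch` is a hypothetical helper name, and the function also copes with torch not being installed at all.

```python
import importlib.util


def describe_torch():
    """Return a one-line summary of the installed PyTorch and its CUDA support."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    summary = f"torch {torch.__version__}, built for CUDA {torch.version.cuda}"
    if torch.cuda.is_available():
        summary += f", GPU: {torch.cuda.get_device_name(0)}"
    else:
        summary += ", no usable GPU detected"
    return summary


print(describe_torch())
```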

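Step4: Build the deformable convolution operator

Find the path of the CUDA folder and set the environment variable CUDA_HOME to the CUDA version you want to use. Use `%env` rather than `!export`: a variable exported with `!` lives only in a temporary subshell and is gone by the time the build command runs.

```
!ls /usr/local/  # list the installed CUDA versions
%env CUDA_HOME=/usr/local/cuda-10.1
```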
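
The same setting can be made from Python, which is convenient if the CUDA version should be picked programmatically. This is a minimal sketch under the assumption that CUDA installations live in folders named `cuda*` under /usr/local, as in the listing above; `list_cuda_dirs` and `set_cuda_home` are hypothetical helper names.

```python
import os


def list_cuda_dirs(root="/usr/local"):
    """Return sorted paths under `root` whose folder names start with 'cuda'."""
    if not os.path.isdir(root):
        return []
    return sorted(
        os.path.join(root, name)
        for name in os.listdir(root)
        if name.startswith("cuda") and os.path.isdir(os.path.join(root, name))
    )


def set_cuda_home(path):
    """Set CUDA_HOME for this process and for every child process it starts."""
    os.environ["CUDA_HOME"] = path
    return os.environ["CUDA_HOME"]


candidates = list_cuda_dirs()
print("CUDA directories found:", candidates)
```

Calling, for example, `set_cuda_home("/usr/local/cuda-10.1")` before the build step makes the variable visible to later `!` commands, because child processes inherit the notebook's environment.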

If your PyTorch version is newer than 1.3, replace "AT_CHECK" with "TORCH_CHECK" in DB/assets/ops/dcn/src/deform_conv_cuda.cpp and DB/assets/ops/dcn/src/deform_pool_cuda.cpp.
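
Editing the two files by hand works, but the rename can also be scripted. The sketch below is one possible way to do it, assuming the repository sits at the Drive path used in this guide; `patch_source` and `replace_at_check` are hypothetical helper names. The word-boundary pattern keeps identifiers that merely contain the text AT_CHECK untouched.

```python
import re
from pathlib import Path


def patch_source(text):
    """Return (new_text, count) with every whole-word AT_CHECK renamed."""
    return re.subn(r"\bAT_CHECK\b", "TORCH_CHECK", text)


def replace_at_check(source_path):
    """Patch one file in place and return the number of replacements."""
    path = Path(source_path)
    new_text, count = patch_source(path.read_text(encoding="utf-8"))
    if count:
        path.write_text(new_text, encoding="utf-8")
    return count


# Assumed location: the clone made earlier in this guide.
DCN_SRC = Path("/content/drive/MyDrive/DB/assets/ops/dcn/src")
for name in ("deform_conv_cuda.cpp", "deform_pool_cuda.cpp"):
    target = DCN_SRC / name
    if target.exists():  # does nothing when run outside that environment
        print(name, replace_at_check(target))
```

Running it a second time reports 0 replacements, so it is safe to repeat.
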
Compile and build the operator:

```
%cd /content/drive/MyDrive/DB/assets/ops/dcn
!python setup.py build_ext --inplace
```

If you need to recompile and rebuild, delete the old build output first:

```
!rm -r build
```

What is deformable convolution?

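Step5: Prepare the dataset

Download the training set from T-Brain: Traditional Chinese Scene Text Recognition Competition (Beginner).
Convert the ground truth from JSON to txt so that it matches the format the code loads. Alternatively, write your own code to load the JSON files.
- The txt format is {coordinate},{label}:
  - `coordinate`: the four corner points
  - `label`: the text content; label = ### for "don't care" regions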
Example:
```
321,550,360,546,361,554,300,550,理髮店
373,545,416,527,417,549,364,552,專賣店
```
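
To make the {coordinate},{label} layout concrete, here is a minimal sketch that splits one such line back into its four points and its label. `parse_gt_line` is a hypothetical helper for illustration only, not the loader used by the DB code.

```python
def parse_gt_line(line):
    """Split one ground-truth line into four (x, y) points and its label."""
    parts = line.strip().split(",")
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    # Re-join the remainder so a label that itself contains commas stays intact.
    label = ",".join(parts[8:])
    return points, label


points, label = parse_gt_line("321,550,360,546,361,554,300,550,理髮店")
print(points, label)
print("don't care" if label == "###" else "labelled text")
```
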
Put the training and test sets on your Google Drive. Following the layout of the example is recommended:
- Example:
  - datasets/AICUP1/train_images
  - datasets/AICUP1/train_gts
  - datasets/AICUP1/train_list.txt
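
Before training, it can save time to confirm that every name in train_list.txt has both an image and a ground-truth file. The sketch below assumes the layout above and assumes each ground-truth file is named after its image plus ".txt" (the naming the conversion script in this guide produces). `find_problems` and `check_dataset` are hypothetical helper names, and the file names in the example are made up.

```python
import os


def find_problems(names, image_files, gt_files):
    """Return (name, problem) pairs for list entries lacking an image or a GT file."""
    problems = []
    for name in names:
        if name not in image_files:
            problems.append((name, "image missing"))
        if name + ".txt" not in gt_files:
            problems.append((name, "ground truth missing"))
    return problems


def check_dataset(data_dir):
    """Read the list and folder contents from disk, then run find_problems."""
    with open(os.path.join(data_dir, "train_list.txt"), encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    image_files = set(os.listdir(os.path.join(data_dir, "train_images")))
    gt_files = set(os.listdir(os.path.join(data_dir, "train_gts")))
    return find_problems(names, image_files, gt_files)


# Tiny in-memory example: the second image has no ground-truth file.
example = find_problems(
    ["img_1.jpg", "img_2.jpg"],
    {"img_1.jpg", "img_2.jpg"},
    {"img_1.jpg.txt"},
)
print(example)
```

On a real dataset, `check_dataset("datasets/AICUP1")` should return an empty list when everything is in place.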

Sample JSON-to-txt conversion script

Copy the script below into a .py file (json2txt.py here), then run:

```
python json2txt.py --input_json_dir path/to/json_directory --output_txt_dir path/to/save/files --image_list path/to/an/image/list/file
```

```
import os
import json
import argparse


def read_jsonFile(input_json_dir, fname):
    """Load one annotation JSON and keep only the fields needed for the txt output."""
    path = os.path.join(input_json_dir, fname)
    with open(path, 'r', encoding='utf-8') as f:
        json_i = json.load(f)
    return [
        {'group_id': s['group_id'], 'points': s['points'], 'label': s['label']}
        for s in json_i['shapes']
    ]


def getBboxInfo(i_list):
    """Turn each annotation into (x1, y1, ..., x4, y4, label, group_id)."""
    # group_id: 0: Chinese string, 1: single Chinese character, 2: alphanumeric string,
    # 3: mixed Chinese/alphanumeric string, 4: Chinese single-character string,
    # 5: other, 255: don't care
    allBbox = []
    for info in i_list:
        coords = []
        # Round the first four points to integer pixel coordinates.
        for point in info['points'][:4]:
            coords.extend((round(point[0]), round(point[1])))
        allBbox.append((*coords, info['label'], info['group_id']))
    return allBbox


def writeToFile(output_path, file_name, result):
    """Write one comma-separated line per box."""
    path = os.path.join(output_path, file_name)
    # The with-block closes the file, so no explicit close() is needed.
    with open(path, 'w', encoding='utf-8') as writeFile:
        for box in result:
            writeFile.write(','.join(str(p) for p in box) + '\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_json_dir', type=str, default='./datasets/AICUP1/train_jsons', help='path of GT json directory')
    parser.add_argument('--output_txt_dir', type=str, default='./datasets/AICUP1/train_gts', help='path of GT txt directory')
    parser.add_argument('--image_list', type=str, default='./datasets/AICUP1/train_list.txt', help='path of image list')
    arg = parser.parse_args()

    # Convert only .json files and ignore anything else in the directory.
    json_list = [f for f in os.listdir(arg.input_json_dir) if f.endswith('.json')]

    # Create the output directory (including parents) if it does not exist.
    os.makedirs(arg.output_txt_dir, exist_ok=True)

    # Open the image list once in write mode so that reruns do not append duplicates.
    with open(arg.image_list, 'w', encoding='utf-8') as list_file:
        for json_file in sorted(json_list):
            # e.g. img_55.json -> image name img_55.jpg, GT file img_55.jpg.txt
            image_name = os.path.splitext(json_file)[0] + '.jpg'
            info_list = read_jsonFile(arg.input_json_dir, json_file)
            bboxlist = getBboxInfo(info_list)
            writeToFile(arg.output_txt_dir, image_name + '.txt', bboxlist)
            list_file.write(image_name + '\n')
```
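
To make the script's output concrete, here is a minimal sketch that formats one annotation the same way: rounded coordinates, then the label, then the group_id. The sample annotation values are made up for illustration; only the field names `shapes`, `points`, `label`, and `group_id` come from the script. `shape_to_line` is a hypothetical helper, not part of the script.

```python
import json

# A made-up annotation using the field names the conversion script reads.
sample = json.loads("""
{"shapes": [{"label": "理髮店", "group_id": 0,
             "points": [[321.2, 550.0], [360.4, 546.1], [361.0, 553.8], [299.7, 550.3]]}]}
""")


def shape_to_line(shape):
    """Format one annotation as x1,y1,...,x4,y4,label,group_id."""
    coords = []
    for x, y in shape["points"][:4]:
        coords.extend((round(x), round(y)))
    return ",".join(str(item) for item in coords + [shape["label"], shape["group_id"]])


lines = [shape_to_line(shape) for shape in sample["shapes"]]
print(lines[0])
```

Note that each written line ends with the group_id after the label, so a reader that expects only {coordinate},{label} has to account for that extra field.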

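Step6: Configure the training parameters

The reference config files are in DB/experiments/seg_detector/*.yaml in the GitHub code; the icdar2015 yaml files are a good starting point.

There are two main files, base_*.yaml and *_resnet_deform_thre.yaml. Copy both, change the data paths and training parameters, and tune them to train a model with high accuracy.

base_*.yaml for dataset settings

Change the data paths to the ones you set up in [Step5: Prepare the dataset]. The directory name must match the folder you actually created (the Step5 example uses datasets/AICUP1). For example: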

```
import:
    - 'experiments/base.yaml'
package:
    - 'decoders.seg_detector_loss'
define:
  - name: train_data
    class: ImageDataset
    data_dir:
        - './datasets/AICUP_1/'
    data_list:
        - './datasets/AICUP_1/train_list.txt'
```

*_resnet_deform_thre.yaml for model and training settings
- It must import the base_*.yaml you set up, as follows:
```
import:
    - 'experiments/seg_detector/base_*.yaml'
```
- Other training parameters are also changed in this yaml file.
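
Copying a reference config and swapping its paths can be scripted. The sketch below treats the yaml file as plain text and applies literal substitutions, which keeps comments and formatting intact and needs no extra dependency. `apply_replacements` and `derive_config` are hypothetical helper names, and the file names in the commented usage are assumptions about how the reference and derived configs are called.

```python
from pathlib import Path


def apply_replacements(text, replacements):
    """Apply literal old -> new substitutions; fail loudly if `old` is absent."""
    for old, new in replacements.items():
        if old not in text:
            raise ValueError(f"{old!r} not found in config text")
        text = text.replace(old, new)
    return text


def derive_config(src, dst, replacements):
    """Copy the config at `src` to `dst` with the substitutions applied."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(apply_replacements(text, replacements), encoding="utf-8")


# Hypothetical usage (file names are assumptions, adjust to your copy of the repo):
# derive_config("experiments/seg_detector/base_ic15.yaml",
#               "experiments/seg_detector/base_AICUP.yaml",
#               {"./datasets/icdar2015/": "./datasets/AICUP_1/"})

demo = apply_replacements(
    "data_dir:\n    - './datasets/icdar2015/'\n",
    {"./datasets/icdar2015/": "./datasets/AICUP_1/"},
)
print(demo)
```

Raising an error when a pattern is missing avoids silently training on the wrong dataset path.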

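Step7: Start Training

Run train.py to train. Keep the whole command on one line (or end each line with a backslash) so that `--num_gpus 1` is passed to train.py:

```
!CUDA_VISIBLE_DEVICES=0 python train.py experiments/seg_detector/AICUP_resnet50_deform_thre.yaml --num_gpus 1
```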

When training starts successfully, you will see the training log in the cell output.

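Note
train.py accepts many arguments that can be set at run time; in principle they override the corresponding settings in the yaml file.

You may run into all sorts of problems during training. Make good use of the GitHub DB issues page; most of the problems you are likely to hit already have answers there.
You can also leave a comment on this post so that we can discuss it together.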

Step8: Evaluation and Demo

Run eval.py to test
- `--resume`: the model you want to test

```
!CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/AICUP_resnet50_deform_thre.yaml --resume model/AICUP_WORKSHOP --box_thresh 0.5 --result_dir AICUP_result
```

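On a successful evaluation you will see the resulting scores. Note that this score is not the metric used in the AI CUP competition.

Note
eval.py accepts many arguments that can be set at run time; in principle they override the corresponding settings in the yaml file.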

Run demo.py

```
%cd /content/drive/MyDrive/DB/
!CUDA_VISIBLE_DEVICES=0 python demo.py experiments/seg_detector/AICUP_resnet50_deform_thre.yaml \
--resume model/AICUP_WORKSHOP \
--image_path ./datasets/AICUP_1/test_images/img_55.jpg \
--box_thresh 0.5 --visualize
```
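
The command above handles one image. As a rough sketch of how to cover a whole folder, the helper below builds one demo.py command line per .jpg file, using only the flags shown above. `demo_commands` is a hypothetical helper, and the file names in the example are made up; each resulting string could be passed to `subprocess.run(..., shell=True)` from the DB folder.

```python
import shlex


def demo_commands(image_names, image_dir, config, checkpoint, box_thresh=0.5):
    """Build one demo.py command line per .jpg name, in sorted order."""
    commands = []
    for name in sorted(image_names):
        if not name.lower().endswith(".jpg"):
            continue  # skip anything that is not a JPEG image
        args = [
            "python", "demo.py", config,
            "--resume", checkpoint,
            "--image_path", f"{image_dir}/{name}",
            "--box_thresh", str(box_thresh),
            "--visualize",
        ]
        # Quote each argument so paths with spaces survive the shell.
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


cmds = demo_commands(
    ["img_2.jpg", "notes.txt", "img_1.jpg"],
    "./datasets/AICUP_1/test_images",
    "experiments/seg_detector/AICUP_resnet50_deform_thre.yaml",
    "model/AICUP_WORKSHOP",
)
for cmd in cmds:
    print(cmd)
```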

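Note
demo.py accepts many arguments that can be set at run time; in principle they override the corresponding settings in the yaml file.

Show the image
- `image_path`: path to the image

```
from IPython.display import Image, display

image_path = "path/to/image"  # replace with the path to your image
display(Image(filename=image_path, width=640, height=640))
```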

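Guide 2: Dataset generation and single-character recognition example

2021 Traditional Chinese Scene Text Recognition Competition, dataset generation and single-character recognition example: https://reurl.cc/dV9Dqz

Guide 3: Tour course video

Traditional Chinese Scene Text Recognition Competition tour course video: https://youtu.be/1PYIDtbkCeE

Guide 4: Rewards

In addition to the certificates and prize money from the Ministry of Education, participants who rank in the top 25% can receive a certificate from the AI CUP Program Office, which is a plus for future job applications and résumés.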

We wish all participants a successful competition.

Sincerely,
AI CUP Program Office

Hsinnnnn

2021/06/10 03:33:30


Hello, I ran into a tensor size mismatch error during training. How should I fix it? Thank you! RuntimeError: stack expects each tensor to be equal size, but got [3, 1365, 1024] at entry 0 and [3, 1024, 1229] at entry 1 (Edited)