# CAPTCHA 驗證碼辨識
[TOC]
## CNN model

<!--
## CNN LSTM
### RNN Model (Recurrent Neural Network)
一般的RNN 本質

-->
<!--
### LSTM (Long Short-Term Memory)

LSTM -> 為RNN的變種 -> 要去理解RNN 在幹嘛
LSTM 會加入
- 記憶/遺忘 Path
- 篩選 Path
- 忽視 Path
细胞状态"(cell state)的记忆单元
Message control
-> input gate : message 進入 cell
-> forget gate : 移除 不重要的info
可以根據時間來選擇遺忘
-> output gate : Extract 重要info 進入下個 hidden layer
## 記憶/forget

## Selection path
這裡的 selection -> 會決定 經過 memory/forget 的Resulte 是否保留

## Ignore PATH
Ignore -> 忽略初步 predicted 的 Result
Avoid 影響之後的Result

project source Code
```python
# RNNs
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)
```
[layers.Bidirectional()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional) -> RNN 的function 包裝器
Dropout -> 0,25 -> 防 over fitting (會在 hide layer 上使用)
### CTC operation
序列标注任务的损失函数和解码算法 -> loss function ?
输入输出序列对齐不明确的任务 (Output 向量長度跟 Label 長度不一樣)
## Extra
squashing function
y value -> -1 ~ 1


logistic function
y value -> 0 ~ 1
## Code Analysis
### Data pre-proccessing
1. 在`encode_single_sample`函式中,對圖像進行了讀取、解碼、轉換為灰度圖像、裁剪和調整大小等處理。
2. 在`create_train_and_validation_datasets`函式中,使用`encode_single_sample`函式對圖像進行編碼,並將編碼後的圖像和標籤存儲在X和y中。
-->
# Implementation
## Import library
```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
imgFolder = '/content/drive/MyDrive/DataSet/captcha/samples/samples/'
```
## Data Visualization
```python
img_2g7nm = mpimg.imread(imgFolder + '2g7nm.png')  # read an image from a file into an array
img_34pcn = mpimg.imread(imgFolder + '34pcn.png')
img_bny23 = mpimg.imread(imgFolder + 'bny23.png')
img_c4mcm = mpimg.imread(imgFolder + 'c4mcm.png')
img_3c7de = mpimg.imread(imgFolder + '3c7de.jpg')
img_nxf2c = mpimg.imread(imgFolder + 'nxf2c.jpg')
img_pcmcc = mpimg.imread(imgFolder + 'pcmcc.jpg')
img_yge7c = mpimg.imread(imgFolder + 'yge7c.jpg')
# print(img_2g7nm)  # pixel tensor
imgDic = {'2g7nm.png': img_2g7nm, '34pcn.png': img_34pcn, 'bny23.png': img_bny23, 'c4mcm.png': img_c4mcm,
          '3c7de.jpg': img_3c7de, 'nxf2c.jpg': img_nxf2c, 'pcmcc.jpg': img_pcmcc, 'yge7c.jpg': img_yge7c}
figure = plt.figure(figsize=(20, 5))  # create a new figure, or activate an existing one
position = 1
for filename, img in imgDic.items():
    figure.add_subplot(2, 4, position)
    position = position + 1
    plt.imshow(img)
    plt.title('FileName = ' + filename + ', shape = ' + str(img.shape))
plt.show()
```

---
## Character Distribution Analysis
```python
dataFrame = pd.DataFrame(columns=[
    'fileName',
    'extension',
    'label', 'c1', 'c2', 'c3', 'c4', 'c5'])
# print(dataFrame)
i = 0
for _, _, files in os.walk(imgFolder):  # _, _ ignores dirpath and dirnames; os.walk() traverses the given directory
    for f in files:
        label = f.split('.')[0]
        dataFrame.loc[i, 'fileName'] = f
        dataFrame.loc[i, 'extension'] = f.split('.')[1]
        dataFrame.loc[i, 'label'] = label
        dataFrame.loc[i, 'labelsize'] = len(label)
        dataFrame.loc[i, 'c1'] = label[0]
        dataFrame.loc[i, 'c2'] = label[1]
        dataFrame.loc[i, 'c3'] = label[2]
        dataFrame.loc[i, 'c4'] = label[3]
        dataFrame.loc[i, 'c5'] = label[4]
        i = i + 1
# count character occurrences and plot them
charsData = pd.DataFrame(
    dataFrame['c1'].value_counts() +
    dataFrame['c2'].value_counts() +
    dataFrame['c3'].value_counts() +
    dataFrame['c4'].value_counts() +
    dataFrame['c5'].value_counts()).reset_index()
# name the columns
charsData.columns = ['MeowChars', 'count']
sns.barplot(data=charsData, x='MeowChars', y='count')
plt.show()
```
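The per-position `value_counts()` sums above tally how often each character appears across all five label positions. The same tally can be sketched with the standard library alone (the filenames here are hypothetical samples, not the full dataset):

```python
from collections import Counter

filenames = ['2g7nm.png', '34pcn.png', 'bny23.png']  # hypothetical sample filenames
counts = Counter()
for f in filenames:
    label = f.split('.')[0]   # strip the extension to get the 5-char label
    counts.update(label)      # count every character in the label
print(counts['n'])  # 'n' appears once in each of the three labels → 3
```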

## Building the Dataset (training/testing)
```python
# char -> number (for the neural net)
charToNumberDIC = {'2':0,'3':1,'4':2,'5':3,'6':4,'7':5,'8':6,'b':7,'c':8,'d':9,'e':10,'f':11,'g':12,'m':13,'n':14,'p':15,'w':16,'x':17,'y':18}
# encode
def encodeSingleImg(imgPath, label, crop):  # crop: bool; if True, crop to the character region
    img = tf.io.read_file(imgPath)                       # read the file (dtype = string)
    img = tf.io.decode_png(img, channels=1)              # decode as grayscale (channels = 1)
    img = tf.image.convert_image_dtype(img, tf.float32)  # convert pixels to float32 in [0, 1]
    # crop for the CNN
    if crop:
        img = tf.image.crop_to_bounding_box(img, offset_height=0, offset_width=25, target_height=50, target_width=125)  # crop to the 25-150 width range
        img = tf.image.resize(img, size=[50, 200], method='bilinear', preserve_aspect_ratio=False, antialias=False, name=None)  # resize back to 50x200
    img = tf.transpose(img, perm=[1, 0, 2])  # put the width axis first
    # convert the label into an array of 5 integers
    label = list(map(lambda x: charToNumberDIC[x], label))
    return img.numpy(), label

def createTrainTestingDataSet(crop=False):
    # loop over all the files
    # x shape -> (1040, 200, 50, 1)
    # y shape -> (1040, 5)
    x, y = [], []
    for _, _, files in os.walk(imgFolder):
        for f in files:
            label = f.split('.')[0]
            extension = f.split('.')[1]
            # only process the PNG images (ignore JPG)
            if extension == 'png':
                img, label = encodeSingleImg(imgFolder + f, label, crop)
                x.append(img)
                y.append(label)
    x = np.array(x)
    y = np.array(y)
    # split images (x) and labels (y) into training & testing sets
    xTraining, xTesting, yTraining, yTesting = train_test_split(x.reshape(1040, 10000), y, test_size=0.1, shuffle=True, random_state=42)
    xTraining, xTesting = xTraining.reshape(936, 200, 50, 1), xTesting.reshape(104, 200, 50, 1)
    return xTraining, xTesting, yTraining, yTesting
```
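The label-encoding step above maps each of the 19 characters that occur in this dataset to an integer class index. A minimal standalone sketch of that mapping and its inverse (using the same character order as `charToNumberDIC` above; the helper names are mine):

```python
# The 19 characters that appear in the dataset, in class-index order.
chars = ['2', '3', '4', '5', '6', '7', '8', 'b', 'c', 'd',
         'e', 'f', 'g', 'm', 'n', 'p', 'w', 'x', 'y']
charToNumber = {c: i for i, c in enumerate(chars)}
numberToChar = {i: c for i, c in enumerate(chars)}

def encode_label(label):
    """Convert a 5-character label string into a list of 5 class indices."""
    return [charToNumber[c] for c in label]

def decode_label(numbers):
    """Convert a list of class indices back into the label string."""
    return ''.join(numberToChar[n] for n in numbers)

print(encode_label('2g7nm'))                  # → [0, 12, 5, 14, 13]
print(decode_label([0, 12, 5, 14, 13]))       # → 2g7nm
```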
## Display Dataset images
```python
xTrain, xTest, yTrain, yTest = createTrainTestingDataSet(crop=True)      # for the CNN (cropped)
xTrain_, xTest_, yTrain_, yTest_ = createTrainTestingDataSet(crop=False) # for the CNN LSTM
fig=plt.figure(figsize=(20, 10))
fig.add_subplot(2, 4, 1)
plt.imshow(xTrain[0], cmap='gray')
#plt.imshow(xTrain[0].transpose((1,0,2)), cmap='gray')
plt.title('Image from xTrain with label '+ str(yTrain[0]))
plt.axis('off')
fig.add_subplot(2, 4, 2)
plt.imshow(xTrain[935], cmap='gray')
#plt.imshow(xTrain[935].transpose((1,0,2)), cmap='gray')
plt.title('Image from xTrain with label '+ str(yTrain[935]))
plt.axis('off')
fig.add_subplot(2, 4, 3)
plt.imshow(xTest[0], cmap='gray')
#plt.imshow(xTest[0].transpose((1,0,2)), cmap='gray')
plt.title('Image from xTest with label '+ str(yTest[0]))
plt.axis('off')
fig.add_subplot(2, 4, 4)
plt.imshow(xTest[103], cmap='gray')
#plt.imshow(xTest[103].transpose((1,0,2)), cmap='gray')
plt.title('Image from xTest with label '+ str(yTest[103]))
plt.axis('off')
fig.add_subplot(2, 4, 5)
plt.imshow(xTrain_[0], cmap='gray')
plt.title('Image from xTrain with label '+ str(yTrain_[0]))
plt.axis('off')
fig.add_subplot(2, 4, 6)
plt.imshow(xTrain_[935], cmap='gray')
plt.title('Image from xTrain with label '+ str(yTrain_[935]))
plt.axis('off')
fig.add_subplot(2, 4, 7)
plt.imshow(xTest_[0], cmap='gray')
plt.title('Image from xTest with label '+ str(yTest_[0]))
plt.axis('off')
fig.add_subplot(2, 4, 8)
plt.imshow(xTest_[103], cmap='gray')
plt.title('Image from xTest with label '+ str(yTest_[103]))
plt.axis('off')
plt.show()
```

## Metric (accuracy)
```python
def cmputingPerformanceMetric(resultValue, truthValue):
    # print(resultValue.shape[0])
    # print(resultValue.shape[1])
    if resultValue.shape == truthValue.shape:
        # each correct character is worth 1/5 = 0.2 of a label (labels are 1x5)
        return np.sum(resultValue == truthValue) / (resultValue.shape[0] * resultValue.shape[1])
    else:
        raise Exception("Error: the arrays are not aligned!")
```
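The metric above awards 1/5 of a point per correctly predicted character, so a label with four of five characters right scores 0.8. A minimal sketch of the same computation on made-up prediction and truth arrays:

```python
import numpy as np

# Hypothetical class-index predictions vs. ground truth for two 5-char labels.
pred  = np.array([[0, 12, 5, 14, 13],
                  [1,  2, 15, 8, 14]])
truth = np.array([[0, 12, 5, 14, 13],   # all 5 characters correct
                  [1,  2, 15, 8, 18]])  # 4 of 5 characters correct

# Character-level accuracy: correct characters / total characters.
accuracy = np.sum(pred == truth) / (pred.shape[0] * pred.shape[1])
print(accuracy)  # 9 correct out of 10 characters → 0.9
```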
---
## CNN Model
```python
def buildModel():
    # input images
    inputImages = layers.Input(shape=(200, 50, 1), name="image", dtype="float32")
    # conv layer 1
    x = layers.Conv2D(32, (3, 3), activation='relu', kernel_initializer="he_normal", padding="same", name="Conv1")(inputImages)
    x = layers.MaxPooling2D((2, 2), name='pool1')(x)
    # conv layer 2
    x = layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding="same", name="Conv2")(x)
    x = layers.MaxPooling2D((2, 2), name='pool2')(x)
    # output -> 64 filters, volume (50, 12, 64)
    # reshape to "split" the volume into 5 time steps
    x = layers.Reshape(target_shape=(5, 7680), name="reshape")(x)
    # fully connected layers
    x = layers.Dense(256, activation='relu', name='fc1')(x)
    x = layers.Dense(64, activation='relu', name='fc2')(x)
    # output layer (softmax)
    output = layers.Dense(19, activation="softmax", name='fc3')(x)
    # define the model
    model = keras.models.Model(inputs=inputImages, outputs=output, name="meowheackerCNNmodel")
    # compile the model and return it
    model.compile(optimizer=keras.optimizers.Adam(), loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# build the model
model = buildModel()
model.summary()
```
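The `Reshape` target of (5, 7680) follows from the pooling arithmetic: two 2×2 max-pools shrink the 200×50 input to 50×12 (integer division), and with 64 filters that volume holds 50 · 12 · 64 = 38400 values, which split evenly into 5 time steps of 7680 features each. A quick check of the arithmetic:

```python
h, w, filters = 200, 50, 64   # input height, width, Conv2 filter count
for _ in range(2):            # two MaxPooling2D((2, 2)) layers
    h, w = h // 2, w // 2
print(h, w)                   # 50 12
total = h * w * filters
print(total)                  # 38400
print(total // 5)             # 7680 features per time step
```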

---
## Model Training
```python
xTrain, xTest, yTrain, yTest = createTrainTestingDataSet(crop=True) #CNN dataSet
record = model.fit(x=xTrain,y=yTrain, validation_data=(xTest,yTest),epochs=30)
```
### Loss Analysis
```python
# show the loss graph
plt.plot(record.history['loss'])
plt.plot(record.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss Value')
plt.xlabel('Epoch')
plt.legend(['Training', 'Testing'])
plt.show()
```

### Accuracy
```python
#show accuracy graph
plt.plot(record.history['accuracy'])
plt.plot(record.history['val_accuracy'])
plt.title('CNN Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
```

## Running the Trained Model on the Testing Images
```python
yResult = model.predict(xTest)        # yResult shape (104, 5, 19)
yResult = np.argmax(yResult, axis=2)  # pick the most likely class per position
numberToCharDIC = {'-1':'UKN','0':'2','1':'3','2':'4','3':'5','4':'6','5':'7','6':'8','7':'b','8':'c','9':'d','10':'e','11':'f','12':'g','13':'m','14':'n','15':'p','16':'w','17':'x','18':'y'}
nrow = 1
figure = plt.figure(figsize=(20, 5))
for i in range(0, 10):
    if i > 4: nrow = 2
    figure.add_subplot(nrow, 5, i + 1)
    plt.imshow(xTest[i].transpose((1, 0, 2)), cmap='gray')
    plt.title('Result : ' + str(list(map(lambda x: numberToCharDIC[str(x)], yResult[i]))))
    plt.axis('off')
plt.show()
```
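The decoding step above relies on each image producing one 19-way softmax vector per character position, so `argmax` along the class axis yields 5 class indices per image. A minimal sketch with randomly generated scores standing in for real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
fake_scores = rng.random((2, 5, 19))   # 2 images, 5 positions, 19 classes (made-up values)
pred = np.argmax(fake_scores, axis=2)  # shape (2, 5): one class index per position
print(pred.shape)  # (2, 5)
```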
The model still makes a few small mistakes; some of them are positional errors.

## Accuracy (Performance)
```python
print(f'Accuracy: {cmputingPerformanceMetric(yResult,yTest)}')
```

---
<!--
## CNN LSTM
```python
#CNN +TLSM
class CTCLayer(layers.Layer):
def __init__(self, name=None):
super().__init__(name=name)
self.lossFunction = keras.backend.ctc_batch_cost
def call(self, yTrue, yResult):
# computing loss value on running TIme
batch_len = tf.cast(tf.shape(yTrue)[0], dtype="int64")
input_length = tf.cast(tf.shape(yResult)[1], dtype="int64")
label_length = tf.cast(tf.shape(yTrue)[1], dtype="int64")
input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
loss = self.lossFunction(yTrue, yResult, input_length, label_length)
self.add_loss(loss)
return yResult
def build_model():
#inputlayer and labels
inputImages = layers.Input(shape=(200,50,1), name="image", dtype="float32")
labels = layers.Input(name="label", shape=(None,), dtype="float32")
# conv1 kernal size = 3x3 strides 2
x = layers.Conv2D(32,(3, 3),activation="relu",kernel_initializer="he_normal",padding="same",name="Conv1")(inputImages)
x = layers.MaxPooling2D((2, 2), name="pool1")(x)
# conv2
x = layers.Conv2D(64,(3, 3),activation="relu",kernel_initializer="he_normal",padding="same",name="Conv2")(x)
x = layers.MaxPooling2D((2, 2), name="pool2")(x)
# filter -> 64
#kernalsize -> 3X3
#MaxPooling -> 2X2
x = layers.Reshape(target_shape=(50, 768), name="reshape")(x)
x = layers.Dense(64, activation="relu", name="dense1")(x)
x = layers.Dropout(0.2)(x)
# RNN LTSM
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x) #Birectional -> input會 正傳跟反傳
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)
# Output layer
x = layers.Dense(20, activation="softmax", name="dense2")(x) # 20 = 19 characters + UKN
# use ctc layer 計算 每一層損失
output = CTCLayer(name="ctc_loss")(labels, x)
# define the model
model = keras.models.Model(inputs=[inputImages, labels], outputs=output, name="ocr_cnn_lstm_model")
# compile the model
model.compile(optimizer=keras.optimizers.Adam())
return model
model = build_model()
model.summary()
```
<!--
Input Image
```
# Inputs to the model
input_img = layers.Input(shape=(200,50,1), name="image", dtype="float32")
labels = layers.Input(name="label", shape=(None,), dtype="float32")
```
第一層 CONV
```
# First conv block
x = layers.Conv2D(32,(3, 3),activation="relu",kernel_initializer="he_normal",padding="same",name="Conv1")(input_img)
x = layers.MaxPooling2D((2, 2), name="pool1")(x)
```
filter -> 32 個
kernalsize -> 3X3
MaxPooling -> 2X2
activation function -> relu
第二層 CONV
```
# Second conv block
x = layers.Conv2D(64,(3, 3),activation="relu",kernel_initializer="he_normal",padding="same",name="Conv2")(x)
x = layers.MaxPooling2D((2, 2), name="pool2")(x)
```
filter -> 64
kernalsize -> 3X3
MaxPooling -> 2X2
activation function -> relu
```
x = layers.Reshape(target_shape=(50, 768), name="reshape")(x)
x = layers.Dense(64, activation="relu", name="dense1")(x)
x = layers.Dropout(0.2)(x)
```
Reshpae -> 3 Dimension -> 2 Diminsion 去做CNN Encoder
Dense -> FC layer
這裡64 -> output size (50,64)
Dropout -> 0.2 -> 更好的泛化
RNN LSTM
```python
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)
```
Birectional -> input會 正傳跟反傳
return_sequences -> 每次的output 跟timestamp 長度一樣 -> Many-to-Many
[Reference](https://zhuanlan.zhihu.com/p/85910281)
Ouptput Layer
```
Output layer
x = layers.Dense(20, activation="softmax", name="dense2")(x) # 20 = 19 characters + UKN
```
CTC layer
-> 自動對齊Input 跟 Output -> 取解決TLSM sequence 問題
```
# Add CTC layer for calculating CTC loss at each step
output = CTCLayer(name="ctc_loss")(labels, x)
```
Define the model
```
# Define the model
model = keras.models.Model(inputs=[input_img, labels], outputs=output, name="ocr_cnn_lstm_model")
```
Compile the model
```
# Compile the model and return
model.compile(optimizer=keras.optimizers.Adam())
return model
```
```
model = build_model()
model.summary()
``` -->
<!--
## Train Model
```python
xTrain_, xTest_, yTrain, yTest = createTrainTestingDataSet(crop=False)
# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
# Train the model
history = model.fit([xTrain_, yTrain], validation_data=[xTest_, yTest], epochs=100, callbacks=[early_stopping],)
```
Early Stop (優化Training)
patience -> 即放棄運作前對惡化的驗證集誤差觀測的次數 當超過10 會重新跟新weight
Advance:
在每個epoch 結束 比較 validation 避免 Overfiting
省下訓練時間 -> 保持效能 -->
# Reference
https://keras.io/examples/audio/ctc_asr/
https://ieeexplore.ieee.org/document/9580020
https://ieeexplore.ieee.org/document/8029670
https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=1&h1=0
https://r23456999.medium.com/%E4%BD%95%E8%AC%82-cross-e
https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=3&h1=1
https://www.kaggle.com/datasets/fournierp/captcha-version-2-images
https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=4&h1=2
https://openai.com/chatgpt
https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
https://keras.io/examples/vision/captcha_ocr/
https://www.tensorflow.org/guide/keras/rnn
https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c