# CAPTCHA Recognition

[TOC]

## CNN model

![](https://hackmd.io/_uploads/ByWdVWmSn.png)

<!--
## CNN LSTM

### RNN Model (Recurrent Neural Network)

The basic structure of a plain RNN:

![](https://hackmd.io/_uploads/Hyqk2WQrn.png)
-->

<!--
### LSTM (Long Short-Term Memory)

![](https://hackmd.io/_uploads/r1Owab7rn.png)

LSTM is a variant of the RNN, so understanding it starts with understanding what an RNN does.

LSTM adds:
- a memory/forget path
- a selection path
- an ignore path

All three act on a memory unit called the cell state.

Message control:
- input gate: lets new information into the cell
- forget gate: removes unimportant information; what is forgotten can change from one time step to the next
- output gate: extracts the important information and passes it on to the next hidden layer

## Memory/forget

![](https://hackmd.io/_uploads/SyGSxf7rh.png)

## Selection path

The selection step decides whether the result coming out of the memory/forget step is kept.

![](https://hackmd.io/_uploads/H1_tGfXr2.png)

## Ignore path

The ignore step discards the preliminary predicted result so that it does not affect later results.

![](https://hackmd.io/_uploads/rkMxEGXH3.png)

Project source code:

```python
# RNNs
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)
```

- [layers.Bidirectional()](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional): a wrapper that runs an RNN layer over the sequence in both directions
- Dropout 0.25: guards against overfitting (used inside the hidden recurrent layers)

### CTC operation

CTC (Connectionist Temporal Classification) is both a loss function and a decoding algorithm for sequence-labeling tasks in which the alignment between the input and output sequences is not known (the length of the output vector differs from the length of the label).

## Extra

Squashing function: y ranges from -1 to 1.

![](https://hackmd.io/_uploads/BJISiW7Hn.png)

![](https://hackmd.io/_uploads/S1fb6b7Bn.png)

Logistic function: y ranges from 0 to 1.

## Code Analysis

### Data pre-processing

1. In the `encode_single_sample` function, each image is read, decoded, converted to grayscale, cropped, and resized.
2. In the `create_train_and_validation_datasets` function, `encode_single_sample` encodes each image, and the encoded images and labels are stored in X and y.
-->

# Implementation

## Import libraries

```python
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Folder holding the CAPTCHA sample images (Google Drive mounted in Colab)
imgFolder = '/content/drive/MyDrive/DataSet/captcha/samples/samples/'
```

## Data visualization

```python
# mpimg.imread reads an image file into a NumPy array of pixel values
img_2g7nm = mpimg.imread(imgFolder + '2g7nm.png')
img_34pcn = mpimg.imread(imgFolder + '34pcn.png')
img_bny23 = mpimg.imread(imgFolder + 'bny23.png')
img_c4mcm = mpimg.imread(imgFolder + 'c4mcm.png')
img_3c7de = mpimg.imread(imgFolder + '3c7de.jpg')
img_nxf2c = mpimg.imread(imgFolder + 'nxf2c.jpg')
img_pcmcc = mpimg.imread(imgFolder + 'pcmcc.jpg')
img_yge7c = mpimg.imread(imgFolder + 'yge7c.jpg')
# print(img_2g7nm)  # prints the pixel tensor

imgDic = {'2g7nm.png': img_2g7nm, '34pcn.png': img_34pcn,
          'bny23.png': img_bny23, 'c4mcm.png': img_c4mcm,
          '3c7de.jpg': img_3c7de, 'nxf2c.jpg': img_nxf2c,
          'pcmcc.jpg': img_pcmcc, 'yge7c.jpg': img_yge7c}

figure = plt.figure(figsize=(20, 5))  # create a new figure, or activate an existing one
for position, (filename, img) in enumerate(imgDic.items(), start=1):
    figure.add_subplot(2, 4, position)
    plt.imshow(img)
    plt.title('FileName = ' + filename + ', shape = ' + str(img.shape))
plt.show()
```

![](https://hackmd.io/_uploads/B1Swx3ND2.png)

---
## Character distribution analysis

```python=
# One row per file: name, extension, label, label length, and each of the 5 characters
dataFrame = pd.DataFrame(columns=['fileName', 'extension', 'label', 'c1', 'c2', 'c3', 'c4', 'c5'])
# print(dataFrame)

i = 0
# os.walk() walks the directory tree; the first two return values (dirpath, dirnames) are ignored
for _, _, files in os.walk(imgFolder):
    for f in files:
        name, extension = f.split('.')[:2]
        dataFrame.loc[i, 'fileName'] = f
        dataFrame.loc[i, 'extension'] = extension
        dataFrame.loc[i, 'label'] = name
        dataFrame.loc[i, 'labelsize'] = len(name)
        for k in range(5):
            dataFrame.loc[i, f'c{k + 1}'] = name[k]
        i = i + 1

# Count how often each character appears across all five positions, then plot it.
# Concatenating the columns first avoids NaN when a character is missing from one position.
charsData = (
    pd.concat([dataFrame[c] for c in ['c1', 'c2', 'c3', 'c4', 'c5']])
    .value_counts()
    .reset_index()
)
# Name the columns
charsData.columns = ['MeowChars', 'count']
sns.barplot(data=charsData, x='MeowChars', y='count')
plt.show()
```

![](https://hackmd.io/_uploads/BJ18ln4D3.png)
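The distribution plot pools the five character positions into one count per character. The same tally can be sketched with the standard library's `collections.Counter`, assuming (as the code above does) that every file name has the form `<label>.<extension>`. `char_frequencies` is a hypothetical helper, and the four file names are only the samples from the visualization section, not the full dataset.

```python
from collections import Counter


def char_frequencies(filenames):
    """Count how often each character occurs across all CAPTCHA labels.

    Assumes each file name looks like '<label>.<extension>', e.g. '2g7nm.png'.
    """
    counts = Counter()
    for f in filenames:
        label = f.split('.')[0]
        counts.update(label)  # one count per character, whatever its position
    return counts


sample = ['2g7nm.png', '34pcn.png', 'bny23.png', 'c4mcm.png']
freq = char_frequencies(sample)
print(freq.most_common(3))
```

Counting all positions at once avoids adding up five separate per-position counts, where a character that never appears in one position would need special handling.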
## Building the dataset (training/testing)

```python
# char -> number (the network needs integer class labels)
charToNumberDIC = {'2': 0, '3': 1, '4': 2, '5': 3, '6': 4, '7': 5, '8': 6,
                   'b': 7, 'c': 8, 'd': 9, 'e': 10, 'f': 11, 'g': 12, 'm': 13,
                   'n': 14, 'p': 15, 'w': 16, 'x': 17, 'y': 18}


# encode one image and its label
def encodeSingleImg(imgPath, label, crop):
    # crop: bool; if True, crop the image to the region containing the characters
    img = tf.io.read_file(imgPath)                       # read the raw file (a string tensor)
    img = tf.io.decode_png(img, channels=1)              # decode as grayscale (1 channel)
    img = tf.image.convert_image_dtype(img, tf.float32)  # uint8 -> float32 in [0, 1]
    # crop for the CNN
    if crop:
        # keep a 125-px-wide strip starting at x = 25 (columns 25 to 149), full 50-px height
        img = tf.image.crop_to_bounding_box(img, offset_height=0, offset_width=25,
                                            target_height=50, target_width=125)
    # resize to height 50 x width 200; after cropping, this stretches the strip back to 200 px
    img = tf.image.resize(img, size=[50, 200], method='bilinear')
    # (height, width, 1) -> (width, height, 1) = (200, 50, 1): width becomes the first axis
    img = tf.transpose(img, perm=[1, 0, 2])
    # convert the label into a list of 5 integers
    label = [charToNumberDIC[c] for c in label]
    return img.numpy(), label


def createTrainTestingDataSet(crop=False):
    # Loop over all the files
    # x shape -> (1040, 200, 50, 1)
    # y shape -> (1040, 5)
    x, y = [], []
    for _, _, files in os.walk(imgFolder):
        for f in files:
            label = f.split('.')[0]
            extension = f.split('.')[1]
            # only PNG images are used; JPG images are ignored
            if extension == 'png':
                img, label = encodeSingleImg(imgFolder + f, label, crop)
                x.append(img)
                y.append(label)
    x = np.array(x)
    y = np.array(y)
    # split images (x) and labels (y) into training and testing sets;
    # with 1040 samples, test_size=0.1 gives 104 test and 936 training samples
    xTraining, xTesting, yTraining, yTesting = train_test_split(
        x, y, test_size=0.1, shuffle=True, random_state=42)
    return xTraining, xTesting, yTraining, yTesting
```
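`charToNumberDIC` is written out by hand, and the prediction step needs the reverse table as well. A sketch of deriving both directions from the character set, assuming the vocabulary is simply the sorted set of characters: for the 19 characters in `charToNumberDIC`, sorting puts the digits before the letters and reproduces exactly the same indices. `build_vocab`, `encode_label` and `decode_label` are hypothetical names, not part of the original notebook.

```python
def build_vocab(labels):
    """Map every distinct character in `labels` to an index, in sorted order."""
    chars = sorted(set(''.join(labels)))
    char_to_num = {c: i for i, c in enumerate(chars)}
    num_to_char = {i: c for c, i in char_to_num.items()}
    return char_to_num, num_to_char


def encode_label(label, char_to_num):
    """One class index per character, e.g. '2g7nm' -> [0, 12, 5, 14, 13] here."""
    return [char_to_num[c] for c in label]


def decode_label(indices, num_to_char, unknown='?'):
    """Inverse of encode_label; indices missing from the table become `unknown`."""
    return ''.join(num_to_char.get(int(i), unknown) for i in indices)


# The 19 characters listed in charToNumberDIC
char_to_num, num_to_char = build_vocab(['2345678bcdefgmnpwxy'])
encoded = encode_label('2g7nm', char_to_num)
print(encoded, decode_label(encoded, num_to_char))
```

Building the table from the training labels only works if every character occurs in them; otherwise the indices shift, which is one argument for keeping a fixed, hand-written table like the one above.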
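## Display dataset images

```python
xTrain, xTest, yTrain, yTest = createTrainTestingDataSet(crop=True)       # for the CNN (cropped)
xTrain_, xTest_, yTrain_, yTest_ = createTrainTestingDataSet(crop=False)  # for the CNN + LSTM (uncropped)

# Top row: cropped images; bottom row: uncropped images.
# Images are stored as (200, 50, 1), i.e. width first, so they are drawn rotated;
# plt.imshow(img.transpose((1, 0, 2)), cmap='gray') would show them upright.
samples = [
    ('xTrain', xTrain, yTrain, 0), ('xTrain', xTrain, yTrain, 935),
    ('xTest', xTest, yTest, 0), ('xTest', xTest, yTest, 103),
    ('xTrain', xTrain_, yTrain_, 0), ('xTrain', xTrain_, yTrain_, 935),
    ('xTest', xTest_, yTest_, 0), ('xTest', xTest_, yTest_, 103),
]

fig = plt.figure(figsize=(20, 10))
for position, (name, images, labels, i) in enumerate(samples, start=1):
    fig.add_subplot(2, 4, position)
    plt.imshow(images[i], cmap='gray')
    plt.title('Image from ' + name + ' with label ' + str(labels[i]))
    plt.axis('off')
plt.show()
```

![](https://hackmd.io/_uploads/SkXlpeUwn.png)

## Metric (accuracy)

```python
def cmputingPerformanceMetric(resultValue, truthValue):
    # Per-character accuracy: the fraction of positions where prediction and truth match.
    # One correct character in a 5-character CAPTCHA is worth 1/5 = 0.2 for that sample.
    if resultValue.shape == truthValue.shape:
        return np.sum(resultValue == truthValue) / (resultValue.shape[0] * resultValue.shape[1])
    else:
        raise ValueError(f"Error: array shapes do not match ({resultValue.shape} vs {truthValue.shape})")
```

---

## CNN Model

```python
def buildModel():
    # input images: (width, height, channels)
    inputImages = layers.Input(shape=(200, 50, 1), name="image", dtype="float32")

    # conv layer 1: 32 filters, 3x3 kernel; 2x2 max-pooling -> (100, 25, 32)
    x = layers.Conv2D(32, (3, 3), activation='relu', kernel_initializer="he_normal",
                      padding="same", name="Conv1")(inputImages)
    x = layers.MaxPooling2D((2, 2), name='pool1')(x)

    # conv layer 2: 64 filters, 3x3 kernel; 2x2 max-pooling -> (50, 12, 64)
    x = layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal',
                      padding="same", name="Conv2")(x)
    x = layers.MaxPooling2D((2, 2), name='pool2')(x)

    # reshape to "split" the (50, 12, 64) volume into 5 time steps, one per character:
    # 50 * 12 * 64 = 38400 = 5 * 7680
    x = layers.Reshape(target_shape=(5, 7680), name="reshape")(x)

    # fully connected layers (applied to each of the 5 time steps)
    x = layers.Dense(256, activation='relu', name='fc1')(x)
    x = layers.Dense(64, activation='relu', name='fc2')(x)

    # output layer: softmax over the 19 characters at each time step -> (5, 19)
    output = layers.Dense(19, activation="softmax", name='fc3')(x)

    # define the model
    model = keras.models.Model(inputs=inputImages, outputs=output, name="meowheackerCNNmodel")

    # compile the model and return it
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model


# build the model
model = buildModel()
model.summary()
```

![](https://hackmd.io/_uploads/Sk-CPOHw3.png)

---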
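The `Reshape(target_shape=(5, 7680))` line depends on shape arithmetic. With `padding="same"` the convolutions keep width and height, and each 2×2 max-pool halves both (rounding down), so (200, 50, 1) becomes (50, 12, 64), and 50 × 12 × 64 = 38400 = 5 × 7680. Because `encodeSingleImg` transposes the image so that width is the first axis, a row-major reshape gives each of the 5 time steps a contiguous block of 10 pooled columns, i.e. 40 of the 200 resized columns. A NumPy sketch that checks this; `pooled_shape` is a hypothetical helper, and the random array only stands in for the real feature map.

```python
import numpy as np


def pooled_shape(width, height, channels, n_pools=2):
    """Spatial shape after 'same'-padded convs and n_pools 2x2 max-pools.

    'same' padding keeps width/height; each 2x2 pool (stride 2, 'valid')
    halves them, rounding down.
    """
    for _ in range(n_pools):
        width, height = width // 2, height // 2
    return width, height, channels


w, h, c = pooled_shape(200, 50, 64)   # 64 = filters in the last conv layer
total = w * h * c
steps = 5                             # one time step per CAPTCHA character
per_step = total // steps
print((w, h, c), total, per_step)

# Stand-in feature map with the same shape as the real one (width axis first)
feature_map = np.random.rand(w, h, c)
sequence = feature_map.reshape(steps, per_step)  # row-major, like layers.Reshape

# Time step t holds pooled columns 10*t .. 10*t + 9
cols_per_step = w // steps
for t in range(steps):
    block = feature_map[t * cols_per_step:(t + 1) * cols_per_step]
    assert np.array_equal(sequence[t], block.reshape(-1))
```

This is why the crop matters for this model: each time step sees a fixed fifth of the width, so the characters need to sit roughly evenly across the image.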
## Model training

```python
xTrain, xTest, yTrain, yTest = createTrainTestingDataSet(crop=True)  # CNN dataset (cropped)
record = model.fit(x=xTrain, y=yTrain, validation_data=(xTest, yTest), epochs=30)
```

### Loss analysis

```python
# show the loss graph
plt.plot(record.history['loss'])
plt.plot(record.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss Value')
plt.xlabel('Epoch')
plt.legend(['Training', 'Testing'])
plt.show()
```

![](https://hackmd.io/_uploads/BJYV7ZUv3.png)

### Accuracy

```python
# show the accuracy graph
plt.plot(record.history['accuracy'])
plt.plot(record.history['val_accuracy'])
plt.title('CNN Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'])
plt.show()
```

![](https://hackmd.io/_uploads/HJ8um-LD2.png)

## Running the trained model on the test images

```python
yResult = model.predict(xTest)        # yResult shape: (104, 5, 19)
yResult = np.argmax(yResult, axis=2)  # most likely class per position -> (104, 5)

# number -> char (inverse of charToNumberDIC); unknown indices are shown as 'UKN'
numberToCharDIC = {number: char for char, number in charToNumberDIC.items()}

figure = plt.figure(figsize=(20, 5))
for i in range(10):
    figure.add_subplot(2, 5, i + 1)  # 2 rows x 5 columns
    plt.imshow(xTest[i].transpose((1, 0, 2)), cmap='gray')
    plt.title('Result : ' + str([numberToCharDIC.get(int(n), 'UKN') for n in yResult[i]]))
    plt.axis('off')
plt.show()
```

The model still makes a few small mistakes, some of which are related to character position.

![](https://hackmd.io/_uploads/B13yMWLDh.png)

## Accuracy (Performance)

```python
print(f'Accuracy: {cmputingPerformanceMetric(yResult, yTest)}')
```

![](https://hackmd.io/_uploads/H1ePNb8wn.png)

---
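`cmputingPerformanceMetric` scores characters, not whole CAPTCHAs: every matching position counts, so a prediction with one wrong character still earns 0.8 for that sample. A small worked example on made-up arrays (not results from this model) runs argmax decoding and then compares the per-character score with a stricter whole-sequence accuracy, which only counts a CAPTCHA as solved when all five characters match. `char_accuracy` mirrors the original formula; `sequence_accuracy` is a hypothetical addition not used in the notebook.

```python
import numpy as np


def char_accuracy(pred, truth):
    """Fraction of matching characters (same formula as cmputingPerformanceMetric)."""
    if pred.shape != truth.shape:
        raise ValueError(f'shape mismatch: {pred.shape} vs {truth.shape}')
    return np.sum(pred == truth) / (pred.shape[0] * pred.shape[1])


def sequence_accuracy(pred, truth):
    """Fraction of CAPTCHAs whose five characters are all correct."""
    return np.mean(np.all(pred == truth, axis=1))


# Made-up softmax output: 2 CAPTCHAs x 5 positions x 19 classes
truth = np.array([[0, 12, 5, 14, 13],   # '2g7nm'
                  [1, 2, 15, 8, 14]])   # '34pcn'
probs = np.zeros((2, 5, 19))
probs[np.arange(2)[:, None], np.arange(5), truth] = 1.0  # predict the truth...
probs[1, 2, 15] = 0.0                                    # ...except position 3 of the
probs[1, 2, 9] = 1.0                                     # second CAPTCHA ('d' instead of 'p')

pred = np.argmax(probs, axis=2)  # shape (2, 5), as in the prediction section
print(char_accuracy(pred, truth), sequence_accuracy(pred, truth))
```

Reporting both numbers shows how often a whole CAPTCHA would actually be solved, which per-character accuracy alone does not reveal.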
<!--
## CNN + LSTM

```python
# CNN + LSTM
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.lossFunction = keras.backend.ctc_batch_cost

    def call(self, yTrue, yResult):
        # compute the CTC loss during training and register it with add_loss
        batch_len = tf.cast(tf.shape(yTrue)[0], dtype="int64")
        input_length = tf.cast(tf.shape(yResult)[1], dtype="int64")
        label_length = tf.cast(tf.shape(yTrue)[1], dtype="int64")

        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")

        loss = self.lossFunction(yTrue, yResult, input_length, label_length)
        self.add_loss(loss)
        # return the predictions unchanged
        return yResult


def build_model():
    # input layers: image and labels
    inputImages = layers.Input(shape=(200, 50, 1), name="image", dtype="float32")
    labels = layers.Input(name="label", shape=(None,), dtype="float32")

    # conv1: 32 filters, 3x3 kernel; 2x2 max-pooling (stride 2)
    x = layers.Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal",
                      padding="same", name="Conv1")(inputImages)
    x = layers.MaxPooling2D((2, 2), name="pool1")(x)

    # conv2: 64 filters, 3x3 kernel; 2x2 max-pooling
    x = layers.Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal",
                      padding="same", name="Conv2")(x)
    x = layers.MaxPooling2D((2, 2), name="pool2")(x)

    # (50, 12, 64) -> 50 time steps of 12 * 64 = 768 features
    x = layers.Reshape(target_shape=(50, 768), name="reshape")(x)
    x = layers.Dense(64, activation="relu", name="dense1")(x)
    x = layers.Dropout(0.2)(x)

    # RNN (LSTM); Bidirectional runs each LSTM forward and backward over the sequence
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)

    # output layer: 20 = 19 characters + 1 extra class (called UKN here;
    # ctc_batch_cost treats the last class as the CTC blank)
    x = layers.Dense(20, activation="softmax", name="dense2")(x)

    # the CTC layer computes the loss at each training step
    output = CTCLayer(name="ctc_loss")(labels, x)

    # define the model
    model = keras.models.Model(inputs=[inputImages, labels], outputs=output,
                               name="ocr_cnn_lstm_model")
    # compile the model (no loss argument: the loss comes from CTCLayer.add_loss)
    model.compile(optimizer=keras.optimizers.Adam())
    return model


model = build_model()
model.summary()
```

Input image

```python
# Inputs to the model
input_img = layers.Input(shape=(200, 50, 1), name="image", dtype="float32")
labels = layers.Input(name="label", shape=(None,), dtype="float32")
```

First conv layer

```python
# First conv block
x = layers.Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv1")(input_img)
x = layers.MaxPooling2D((2, 2), name="pool1")(x)
```

- filters: 32
- kernel size: 3x3
- max-pooling: 2x2
- activation function: ReLU

Second conv layer

```python
# Second conv block
x = layers.Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv2")(x)
x = layers.MaxPooling2D((2, 2), name="pool2")(x)
```

- filters: 64
- kernel size: 3x3
- max-pooling: 2x2
- activation function: ReLU

```python
x = layers.Reshape(target_shape=(50, 768), name="reshape")(x)
x = layers.Dense(64, activation="relu", name="dense1")(x)
x = layers.Dropout(0.2)(x)
```

- Reshape: turns the 3-D CNN output into a 2-D sequence (the CNN acts as an encoder)
- Dense: fully connected layer; with 64 units the output size is (50, 64)
- Dropout 0.2: improves generalization

RNN (LSTM)

```python
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)
```

- Bidirectional: the input sequence is processed both forward and backward
- return_sequences=True: the layer returns an output at every time step, so the output sequence is as long as the input (many-to-many). [Reference](https://zhuanlan.zhihu.com/p/85910281)

Output layer

```python
# Output layer
x = layers.Dense(20, activation="softmax", name="dense2")(x)  # 20 = 19 characters + UKN
```

CTC layer: aligns input and output automatically, which solves the sequence-alignment problem of the LSTM output.

```python
# Add CTC layer for calculating CTC loss at each step
output = CTCLayer(name="ctc_loss")(labels, x)
```

Define the model

```python
# Define the model
model = keras.models.Model(inputs=[input_img, labels], outputs=output, name="ocr_cnn_lstm_model")
```

Compile the model

```python
# Compile the model and return
model.compile(optimizer=keras.optimizers.Adam())
return model
```

```python
model = build_model()
model.summary()
```
-->
<!--
## Train Model

```python
xTrain_, xTest_, yTrain, yTest = createTrainTestingDataSet(crop=False)

# Add early stopping
early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

# Train the model: both model inputs (images and labels) go in x;
# there is no separate target because the loss comes from CTCLayer
history = model.fit([xTrain_, yTrain], validation_data=([xTest_, yTest], None),
                    epochs=100, callbacks=[early_stopping])
```

Early stopping (makes training more efficient)
- patience: how many epochs the validation loss may fail to improve before training is stopped. With patience=10, training stops after 10 such epochs, and `restore_best_weights=True` restores the weights from the best epoch.
- Advantages: the validation loss is compared at the end of every epoch, which avoids overfitting and saves training time while keeping performance.
-->

# Reference

- https://keras.io/examples/audio/ctc_asr/
- https://ieeexplore.ieee.org/document/9580020
- https://ieeexplore.ieee.org/document/8029670
- https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=1&h1=0
- https://r23456999.medium.com/%E4%BD%95%E8%AC%82-cross-e
- https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=3&h1=1
- https://www.kaggle.com/datasets/fournierp/captcha-version-2-images
- https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/ccd=LYKLjI/record?r1=4&h1=2
- https://openai.com/chatgpt
- https://machinelearningmastery.com/cnn-long-short-term-memory-networks/
- https://keras.io/examples/vision/captcha_ocr/
- https://www.tensorflow.org/guide/keras/rnn
- https://towardsdatascience.com/intuitively-understanding-connectionist-temporal-classification-3797e43a86c