Week 14 Work Log

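###### tags: `Work Log`

# Week 14 Work Log

C109154330 洗柏詞

# Summary:

Train the model on the files whose erroneous characters have been corrected, then analyze the results.

# Work content:

List the analysis results and check whether they match expectations.

## Main tools required

Visual Studio Code

## Imported modules

```python!
import numpy as np
import pandas as pd
import random
...
```

## Main program: recording return values with the cc family

```python!
def cc(document, sentence_limit, word_limit, model, word_map, x, y, ccline):
    ......
    # Collect the answer labels
    for answer in ccline:
        lintest2.append(answer)
    # Collect every label the model predicts as "1"
    for idx, value in enumerate(rounded_preds.data.tolist()):
        if rev_label_encoding[idx][int(value)] == "1":
            lintest.append(f'{icd_names[idx]}: {code_name_maps.get(icd_names[idx], "NA")}')
    # d1: labels present in both lists; d2: labels present in only one list
    d1 = [x for x in lintest if x in lintest2]
    d2 = [y for y in (lintest + lintest2) if y not in d1]
    ......
    y = len(lintest + lintest2)
    return x, y  # Data for the line chart: x is the number of answers; y is the number of recommendations
```

```python!
def cc2(document, sentence_limit, word_limit, model, word_map, x, y, ccline):
    # Same as cc(), but returns the following instead
    ccc = len(d1)  # Used for the text annotations
    return ccc
```

### Problems encountered:

1. Some files contain labeling errors, as shown:
![](https://i.imgur.com/lBCV26j.png)

### Solutions:

1. Write a conditional check so that the program locates the erroneous data.
![](https://i.imgur.com/tonl7Yx.png)
* Erroneous data displayed with file numbers
![](https://i.imgur.com/nNZOFpj.png)
* Value comparison using cc2 (equal = yellow; fewer = red)
![](https://i.imgur.com/V5FSk58.png)

## Simulation results 515~519

* Line chart comparing the number of predictions from the old recommendation system model with the number from the new model
![](https://i.imgur.com/J5d8IVF.png)
* Value comparison using cc2 (equal = yellow; fewer = red)
![](https://i.imgur.com/zgkk20h.png)

# Conclusion:

Some of the prediction results are less than ideal. The next step is to run the program again and track down the unreasonable parts.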
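
The comparison that `cc()` and `cc2()` perform between predicted labels and answer labels can be sketched in isolation as below. This is only a minimal sketch and not the actual program. The names `compare_labels` and `highlight_color` and the sample labels are hypothetical. It also assumes that the yellow/red rule compares the number of matched labels against the number of answer labels, which the log does not state explicitly.

```python
def compare_labels(predicted, answers):
    """Split labels into those found in both lists and those found in only one."""
    answer_set = set(answers)
    # Same idea as d1 in cc(): predicted labels that also appear in the answers
    matched = [label for label in predicted if label in answer_set]
    matched_set = set(matched)
    # Same idea as d2 in cc(): labels that appear in only one of the two lists
    unmatched = [label for label in predicted + answers if label not in matched_set]
    return matched, unmatched


def highlight_color(n_matched, n_answers):
    """Assumed rule: yellow when the counts are equal, red when fewer labels matched."""
    if n_matched == n_answers:
        return "yellow"
    if n_matched < n_answers:
        return "red"
    return None  # no color is described for any other case


# Made-up labels, for illustration only
predicted = ["CODE_A: name A", "CODE_B: name B"]
answers = ["CODE_B: name B", "CODE_C: name C"]

matched, unmatched = compare_labels(predicted, answers)
print(matched, unmatched, highlight_color(len(matched), len(answers)))
```

Using a set for the membership test keeps the comparison fast when the label lists grow, while the list comprehensions preserve the original label order.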