###### tags: `工作週誌`
# Week 13 Work Journal
C109154330 洗柏詞
# Abstract:
Train on the files in which the erroneous characters have been corrected, then analyze the results.
# Work content:
Testing was carried out in Visual Studio Code.
## Main tools required
Visual Studio Code
## Imported modules
```python!
import numpy as np
import pandas as pd
import random
...
```
## Main program
```python!
# Assumes io, tqdm, and the model helpers are among the imports elided above.
target_line = 100
text = None
tks = None
with io.open(TEST, 'r', encoding="utf-8") as file:
    for i, line in enumerate(tqdm(file)):
        target_line -= 1   # read 2000/test_han.tsv line by line, counting down
        if target_line == 0:   # until the counter reaches 0
            tks = line.split('\t')  # split the target line on tabs; tks is a list
            text = tks[0]  # the first field of tks
            print(f'Found target id at {i}')
            print(text)
            print('**'*50)
            for j in range(1, len(tks)):  # j avoids shadowing the outer loop index i
                if rev_label_encoding[j-1][int(tks[j])] == '1':
                    print(f'{icd_names[j-1]}: {code_name_maps.get(icd_names[j-1], "NA")}')
            break
print('=='*50)
# Note: if the target line was found, target_line is 0 at this point.
visualize_attention(*classify(text, SENTENCE_LIMIT, WORD_LIMIT, model, word_map), target_line)
```
### Problems encountered:
1. Ran into new functions I did not understand.
2. VS Code breaks when `print(rev_label_encoding)` is executed.
### Solutions:
1. [getattr() function](https://www.runoob.com/python/python-func-getattr.html)
1. [enumerate() function](https://www.runoob.com/python/python-func-enumerate.html)
1. [tqdm() function](https://clay-atlas.com/blog/2019/11/11/python-chinese-tutorial-tqdm-progress-and-ourself/)

| getattr(object, name[, default]) | enumerate(sequence, [start=0]) | tqdm() |
| -------- | -------- | -------- |
| object -> the target object | sequence -> the target sequence | sets up a range to iterate over |
| name -> string, the attribute to look up | start -> the starting index | |
| default -> default return value | | [Parameter details](https://github.com/tqdm/tqdm#parameters) |

| ![](https://i.imgur.com/2Ltnxll.gif)|<div style="width: 225pt">![](https://i.imgur.com/tTdkZKD.png)</div>|
| -------- | -------- |

## Simulation results
| Corrected han_glove6_loss | Uncorrected han_glove6_loss |
| --------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| <div style="width: 350pt">![](https://i.imgur.com/EcuH6Ra.jpg) </div> | <div style="width: 350pt">![](https://i.imgur.com/3iO13Rm.jpg) </div> |
|![](https://i.imgur.com/lV0KINX.png) | ![](https://i.imgur.com/ZIgjfsJ.png) |

# Conclusion:
The two runs pick out different points of emphasis, but the results look like they can still be corrected further.
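
# Supplementary notes:
The two built-in functions looked up above, `getattr()` and `enumerate()`, can be tried out with a small sketch. This is not the code used in the experiment: `SAMPLE_TSV`, `Config`, and `find_line` are hypothetical names made up for illustration, and the sample data only imitates a tab-separated file with a text column followed by label columns. It also shows that `enumerate(..., start=1)` can replace a manual countdown when looking for a given line.

```python
import io

# Hypothetical stand-in for a TSV file: text first, then one column per label.
SAMPLE_TSV = "first note\t0\t1\nsecond note\t1\t0\nthird note\t0\t0\n"


class Config:
    """Hypothetical object, used only to demonstrate getattr()."""
    sentence_limit = 15


def find_line(file, target_line):
    """Return (line_no, fields) for the 1-based target_line, or None if the file is shorter."""
    # start=1 makes enumerate yield 1-based line numbers, so no countdown is needed.
    for line_no, line in enumerate(file, start=1):
        if line_no == target_line:
            return line_no, line.rstrip("\n").split("\t")
    return None


cfg = Config()
print(getattr(cfg, "sentence_limit"))   # attribute exists -> 15
print(getattr(cfg, "word_limit", 50))   # attribute missing -> default 50
print(find_line(io.StringIO(SAMPLE_TSV), 2))  # (2, ['second note', '1', '0'])
```

Wrapping the file as `tqdm(file)` inside `enumerate(...)`, as the main program does, only adds a progress bar; the values produced by the loop stay the same.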