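###### tags: `Weekly Work Log`

# Week 15 Weekly Log

C109154330 洗柏詞

# Summary: Train on the files whose erroneous characters have been corrected, and analyze the results

# Work content: List the analysis results and check whether they match expectations

## Main tool required: Visual Studio Code

## Imported modules

```python!
import numpy as np
import pandas as pd
import random
...
```

## Main program

Record the values returned for the cc family: for every column in `names`, count how many times each value appears in `kmu_train_dataset`.

```python!
import collections

# label_value_map[i] = {value: count} for column names[i] of the training set
label_value_map = {}
for i in range(len(names)):
    # Counter tallies how many times each value occurs in the column
    label_value_map[i] = dict(collections.Counter(kmu_train_dataset[names[i]]))
```

Then build an integer encoding for each column's observed values, reserving one extra id for `<unk>` (values not seen in training):

```python!
label_encoding = {}      # column index -> {value string: integer id}
rev_label_encoding = {}  # column index -> {integer id: value string}
for t in label_value_map:
    label_encoding[t] = {}
    rev_label_encoding[t] = {}
    # assign ids 0..n-1 to the sorted observed values
    for i, v in enumerate(sorted(label_value_map[t].keys())):
        label_encoding[t][str(v)] = i
        rev_label_encoding[t][i] = str(v)
    # the next free id is reserved for values never seen in training
    unk_id = len(rev_label_encoding[t])
    label_encoding[t]['<unk>'] = unk_id
    rev_label_encoding[t][unk_id] = '<unk>'
```

### Problems encountered:

1. Some parts of the files were missing.

### Solution:

1. Work out how the files are processed.

## Simulation results

`label_value_map = {0: {0: 98971, 1: 2}, 1: {0: 98971, 1: 2}, 2: {0: 98972, 1: 1}, ... 999: {0: 98972, 1: 1}, ...}`

`rev_label_encoding = {0: {0: '0', 1: '1', 2: '<unk>'}, 1: {0: '0', 1: '1', 2: '<unk>'}, 2: {0: '0', 1: '1', 2: '<unk>'}, ... 999: {0: '0', 1: '1', 2: '<unk>'}, ...}`

# Conclusion: In the test set, some cases contain no '1'.

The training counts above show how rare '1' is: each listed column has only one or two '1' values out of 98,973 rows.

![](https://i.imgur.com/tN5aXCC.png =300x200)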
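The per-column counting and `<unk>` encoding from the main program can be sketched end to end on toy data. This is a minimal, self-contained sketch: the toy `cols` lists stand in for `kmu_train_dataset[names[i]]`, and the `build_encodings` and `encode` helpers are hypothetical names, not part of the original code. It also shows why the `<unk>` id matters here: a column that never contained `1` during training has no id for `1`, so that value falls back to `<unk>`.

```python
from collections import Counter


def build_encodings(columns):
    """Count values per column and build value<->id maps with an '<unk>' id.

    columns: list of sequences, one per column (hypothetical stand-in for
    kmu_train_dataset[names[i]]).
    """
    label_value_map, label_encoding, rev_label_encoding = {}, {}, {}
    for t, col in enumerate(columns):
        counts = Counter(col)                # value -> occurrence count
        label_value_map[t] = dict(counts)
        values = sorted(counts)              # ids follow sorted value order
        label_encoding[t] = {str(v): i for i, v in enumerate(values)}
        rev_label_encoding[t] = {i: str(v) for i, v in enumerate(values)}
        unk_id = len(values)                 # next free id is '<unk>'
        label_encoding[t]['<unk>'] = unk_id
        rev_label_encoding[t][unk_id] = '<unk>'
    return label_value_map, label_encoding, rev_label_encoding


def encode(value, t, label_encoding):
    """Map a value of column t to its id, falling back to '<unk>' if unseen."""
    enc = label_encoding[t]
    return enc.get(str(value), enc['<unk>'])


# Toy data: column 0 has one rare '1', column 1 never contains '1'
cols = [[0, 0, 0, 1, 0], [0, 0, 0, 0, 0]]
lvm, enc, rev = build_encodings(cols)
print(lvm)                # {0: {0: 4, 1: 1}, 1: {0: 5}}
print(rev)                # {0: {0: '0', 1: '1', 2: '<unk>'}, 1: {0: '0', 1: '<unk>'}}
print(encode(1, 1, enc))  # 1, the '<unk>' id of column 1
```

Using `len(values)` for the `<unk>` id (instead of reusing the loop variable) also keeps the code correct for a column with no observed values.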
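The conclusion that some test-set cases contain no '1' can be checked directly. The minimal sketch below assumes each case is a dict of label columns (a hypothetical layout; the real test-set structure is not shown here). Because it is not certain whether "case" refers to a label column or a patient row, the hypothetical `missing_positive_report` helper reports both: the columns where `1` never appears and the rows that contain no `1`.

```python
def missing_positive_report(rows, names):
    """Report where the value 1 never appears.

    rows: list of dicts, one per case (hypothetical stand-in for a test split)
    names: the label columns to check
    Returns (columns with no 1, indices of rows with no 1).
    """
    cols_without_1 = [n for n in names if all(r[n] != 1 for r in rows)]
    rows_without_1 = [i for i, r in enumerate(rows)
                      if all(r[n] != 1 for n in names)]
    return cols_without_1, rows_without_1


test_rows = [
    {"a": 0, "b": 1, "c": 0},
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
]
print(missing_positive_report(test_rows, ["a", "b", "c"]))
# (['a', 'c'], [1])
```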