###### tags: `Weekly Work Log`
# Week 15 Work Log
C109154330 洗柏詞
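# Summary:
Train on the file in which the erroneous characters have been corrected, and analyze the results.
# Work Content:
List the analysis results and check whether they match expectations.
## Main Tools Required
Visual Studio Code
## Imported Modules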
```python!
import numpy as np
import pandas as pd
import random
...
```
## Main Program
Tally the values returned for the cc family.
```python!
import collections

# Count how often each value occurs in every column listed in `names`.
label_value_map = {}
for i in range(len(names)):
    if i not in label_value_map:
        label_value_map[i] = {}
    for v in kmu_train_dataset[names[i]]:
        if v not in label_value_map[i]:
            label_value_map[i][v] = 0
        label_value_map[i][v] += 1
```
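The same tally can be written more compactly with `collections.Counter`. The following is only a minimal sketch under stated assumptions: `toy_dataset` and `toy_names` are hypothetical stand-ins for `kmu_train_dataset` and `names`, which are not shown in this log, and a plain dict of lists is used in place of the real dataset.

```python
import collections

# Hypothetical stand-ins for kmu_train_dataset and names (not the real data).
toy_dataset = {"a": [0, 0, 1, 0], "b": [0, 0, 0, 0]}
toy_names = list(toy_dataset.keys())

# One value-count dict per column index, in the same shape as label_value_map.
toy_label_value_map = {
    i: dict(collections.Counter(toy_dataset[name]))
    for i, name in enumerate(toy_names)
}
print(toy_label_value_map)
```

Column `b` never takes the value 1, so its count dict has no key `1`; this is the situation where a fallback label becomes necessary.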
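```python!
# Map each observed value to an integer index (and back), per column.
label_encoding = {}
rev_label_encoding = {}
for t in label_value_map:
    label_encoding[t] = {}
    rev_label_encoding[t] = {}
    for i, v in enumerate(sorted(label_value_map[t].keys())):
        label_encoding[t][str(v)] = i
        rev_label_encoding[t][i] = str(v)
    # Reserve the next free index for values not seen during training.
    unk_index = len(rev_label_encoding[t])
    label_encoding[t]['<unk>'] = unk_index
    rev_label_encoding[t][unk_index] = '<unk>'
```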
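To illustrate what the `'<unk>'` entry is for, the sketch below looks up values in a single encoding table and falls back to the `'<unk>'` index for anything unseen. `toy_encoding` and `encode_value` are hypothetical names introduced only for this example; the table mirrors the shape of one `label_encoding[t]` entry.

```python
# Hypothetical encoding table in the same shape as one label_encoding[t].
toy_encoding = {"0": 0, "1": 1, "<unk>": 2}


def encode_value(value, encoding):
    """Return the index for value, or the '<unk>' index if it was never seen."""
    return encoding.get(str(value), encoding["<unk>"])


# 7 does not appear in the table, so it maps to the '<unk>' index.
encoded = [encode_value(v, toy_encoding) for v in [0, 1, 7]]
print(encoded)
```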
### Problems Encountered:
1. Some parts of the file are missing.
### Solution:
1. Work out how the file was processed.
## Simulation Results
label_value_map =
{0: {0: 98971, 1: 2},
1: {0: 98971, 1: 2},
2: {0: 98972, 1: 1},
...
999: {0: 98972, 1: 1},
...}
rev_label_encoding =
{0: {0: '0', 1: '1', 2: '<unk>'},
1: {0: '0', 1: '1', 2: '<unk>'},
2: {0: '0', 1: '1', 2: '<unk>'},
...
999: {0: '0', 1: '1', 2: '<unk>'},
...}
# Conclusion:
In the test set, some cases do not contain a '1' at all.
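
One way to check this observation, assuming the test-set counts are stored in the same shape as `label_value_map`, is to list the entries whose counts contain no `1`. `toy_counts` below is made-up illustrative data, not the actual results.

```python
# Hypothetical counts in the same shape as label_value_map (not real results).
toy_counts = {0: {0: 5, 1: 2}, 1: {0: 7}, 2: {0: 6, 1: 1}}

# Keep the indices whose value counts contain no 1 at all.
entries_without_one = [
    index for index, counts in toy_counts.items() if counts.get(1, 0) == 0
]
print(entries_without_one)
```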
