# 210517 Memory Usage Test Results
[TOC]
## Test Procedure
Measure where the time goes, then take a closer look at the stages that take the longest.
```
[1_pinyin_translate]
    check pinyin available
    translate to zh-cn and urldecode and convert special num
[2_account_phone]
    account or phone number detect
[3_segment_duplicate]
    create segment words list
    remove duplicate words
[4_classZHEN]
    classification zh and en words
[5_predictZH][6_predictEN]
    predict words label
    [1_embedding_padding_concat]
        [1_1_embedding]
            embedding
        [1_2_padding]
            padding
        [1_3_concat]
            concat
    [2_try_predict]
        try_predict
    [3_predicted_label_corresponding]
        predicted label corresponding by input words
    [4_pinyin_mode]
        pinyin_mode
[7_result]
    result list
```
### Code
+ The main quantity of interest is how long `model.batch_analysis(csv_list, pinyin_mode=True)` takes.
```python=
from datetime import datetime

# `model` and `preprocess_from_csv_to_list` are defined elsewhere in the project.
times = 0      # total number of batches processed
window = 1000  # rows per batch
msg_file_list = ['2019_Oct_Data']

for msg_file in msg_file_list:
    with open(msg_file + '.csv', 'r', encoding="utf-8") as f:
        csv_list = preprocess_from_csv_to_list(f)
    size = len(csv_list)
    i = 0
    start = 0
    while start < size:
        end = start + window
        print(datetime.now().strftime("%Y/%m/%d %H:%M:%S") + ', ' + msg_file + ' ' + f'{i:04d}')
        # Analyse only the current window of rows.
        csv_df = model.batch_analysis(csv_list[start:end], pinyin_mode=True)
        save_path = 'data/' + msg_file + f'{i:04d}' + '_result.csv'
        csv_df.to_csv(save_path, index=False)
        i += 1
        start = i * window
        times += 1
```
## Test Machine
+ **CPU:** i7-6700
+ **RAM:** 16 GB
+ **GPU:** None
+ **OS:** Ubuntu
## Test Results
### Log Format
```
start_time, end_time, second, description
```
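A minimal sketch of how log lines in this format could be produced, assuming each stage is wrapped in a context manager. The names `log_stage` and `format_log_line` are hypothetical and are not taken from the project code.

```python
import time
from contextlib import contextmanager
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def format_log_line(start, end, seconds, description):
    """Render one line as: start_time, end_time, second, description."""
    return (f"{start.strftime(TIME_FORMAT)}, {end.strftime(TIME_FORMAT)}, "
            f"{seconds:.6f}, {description}")


@contextmanager
def log_stage(description, sink=print):
    """Time the enclosed block and emit one log line when it finishes."""
    start = datetime.now()
    t0 = time.perf_counter()  # monotonic clock for the elapsed time
    try:
        yield
    finally:
        seconds = time.perf_counter() - t0
        sink(format_log_line(start, datetime.now(), seconds, description))


# Usage: wrap one stage of the pipeline (the body is a stand-in for real work).
lines = []
with log_stage("2_account_phone", sink=lines.append):
    sum(range(100_000))
print(lines[0])
```

Passing `sink=lines.append` collects the lines in memory; the default `print` writes them to stdout.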
### 1. `model.batch_analysis(csv_list, pinyin_mode=True)`
```
2021/5/17 17:03:38, 2021/05/17 17:03:39, 0.507643, 1_pinyin_translate
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.016954, 2_account_phone
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.605382, 3_segment_duplicate
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.009971, 4_classZHEN
2021/5/17 17:03:39, 2021/05/17 17:06:38, 178.515002, 5_predictZH
2021/5/17 17:06:38, 2021/05/17 17:08:05, 87.027617, 6_predictEN
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.115689, 7_result
```
+ The log above is for one batch (1,000 rows) passed to `model.batch_analysis`; almost all of the time is spent in `predictZH` and `predictEN`.
+ Next, `self.predict_words_label(batch_segment_words, pinyin_mode=pinyin_mode)`, which both `predictZH` and `predictEN` call, is timed in more detail.
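To make "almost all of the time" concrete, the short calculation below turns the durations from the `model.batch_analysis` log into shares of the batch total. The numbers are copied from that log. The variable names are illustrative only.

```python
# Durations in seconds, copied from the batch_analysis log.
stage_seconds = {
    "1_pinyin_translate": 0.507643,
    "2_account_phone": 0.016954,
    "3_segment_duplicate": 0.605382,
    "4_classZHEN": 0.009971,
    "5_predictZH": 178.515002,
    "6_predictEN": 87.027617,
    "7_result": 0.115689,
}

total = sum(stage_seconds.values())
shares = {name: sec / total for name, sec in stage_seconds.items()}

# Print the stages from slowest to fastest with their share of the total.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{stage_seconds[name]:>12.3f} s {share:>8.2%}")

predict_share = shares["5_predictZH"] + shares["6_predictEN"]
print(f"predictZH + predictEN: {predict_share:.2%} of {total:.1f} s")
```

The two prediction stages together account for more than 99% of the batch time, so the other stages are not worth optimizing at this point.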
### 2. `self.predict_words_label(batch_segment_words, pinyin_mode=pinyin_mode)`
#### ZH
```
, ,169.47, 1_embedding_padding_concat
2021/5/17 17:06:36, 2021/05/17 17:06:38, 1.472061, 2_try_predict
2021/5/17 17:06:38, 2021/05/17 17:06:38, 0.001995, 3_predicted_label_corresponding
2021/5/17 17:06:38, 2021/05/17 17:06:38, 0.001995, 4_pinyin_mode
```
#### EN
```
, , 79.07, 1_embedding_padding_concat
2021/5/17 17:08:04, 2021/05/17 17:08:05, 0.706111, 2_try_predict
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.000998, 3_predicted_label_corresponding
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.000998, 4_pinyin_mode
```
### 3. `embedding` `padding` `concat`
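+ Because words differ in length, the time spent on embedding and concat varies from word to word.
+ The figures below are the total times (in seconds) for embedding, padding, and concat over a number of words, with the per-word average in parentheses.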
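One way such per-word figures could be gathered is to time each step for every word and accumulate the totals. This is a sketch only: `time_per_word` is a hypothetical helper, and the `fake_*` functions are stand-ins, since the real embedding, padding, and concat code is not shown in this note.

```python
import time
from collections import defaultdict

MAX_LEN = 8  # hypothetical padded length


def time_per_word(words, steps):
    """Apply each (name, func) step to every word, in order.

    Returns {name: (total_seconds, mean_seconds_per_word)}.
    """
    totals = defaultdict(float)
    for word in words:
        value = word
        for name, func in steps:
            t0 = time.perf_counter()
            value = func(value)
            totals[name] += time.perf_counter() - t0
    n = len(words)
    return {name: (total, total / n) for name, total in totals.items()}


# Stand-ins for the real steps (assumptions, not the project's code).
def fake_embedding(word):
    return [[float(ord(ch))] for ch in word]  # one 1-d vector per character


def fake_padding(vectors):
    return (vectors + [[0.0]] * MAX_LEN)[:MAX_LEN]  # pad or truncate to MAX_LEN


def fake_concat(vectors):
    return [x for vec in vectors for x in vec]  # flatten into one list


steps = [
    ("1_1_embedding", fake_embedding),
    ("1_2_padding", fake_padding),
    ("1_3_concat", fake_concat),
]
report = time_per_word(["hello", "世界", "pinyin"], steps)
for name, (total, mean) in report.items():
    print(f"{total:.6f}, {name} ({mean:.6f} s/word)")
```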
#### ZH
```
130.57, 1_1_embedding (0.03953 s/word)
0.4425, 1_2_padding (0.0001 s/word)
38.45, 1_3_concat (0.011 s/word)
```
#### EN
```
67.87, 1_1_embedding (0.03841 s/word)
0.2322, 1_2_padding (0.0001 s/word)
10.95, 1_3_concat (0.0062 s/word)
```
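As a sanity check, the sub-step totals can be compared with the `1_embedding_padding_concat` totals reported in section 2, and each sub-step expressed as a share of the stage. All numbers are copied from the logs in this note. The variable names are illustrative.

```python
# Seconds copied from the logs: (parent stage total, {sub-step: total}).
measured = {
    "ZH": (169.47, {"1_1_embedding": 130.57, "1_2_padding": 0.4425, "1_3_concat": 38.45}),
    "EN": (79.07, {"1_1_embedding": 67.87, "1_2_padding": 0.2322, "1_3_concat": 10.95}),
}

summary = {}
for lang, (parent_total, sub_steps) in measured.items():
    sub_total = sum(sub_steps.values())
    summary[lang] = {
        # Difference between the reported parent total and the sum of its parts.
        "gap": parent_total - sub_total,
        "shares": {name: sec / sub_total for name, sec in sub_steps.items()},
    }
    print(f"{lang}: sub-steps sum to {sub_total:.4f} s (parent {parent_total} s)")
    for name, share in summary[lang]["shares"].items():
        print(f"  {name:<16}{share:>8.2%}")
```

The sub-steps add up to within a few hundredths of a second of the parent totals, and embedding is the largest part in both cases (over 75% for ZH and over 85% for EN), with concat making up most of the rest.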