# 210622 Execution Speed Test Results
[TOC]
## Test Purpose
+ Measure how enabling or disabling `url_detect`, `account_detect`, and `remove_duplicate` in **`batch_analysis`** affects execution time.
### Code
+ The main quantity of interest is the time spent in `model.batch_analysis()`.
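```python=
from datetime import datetime

# `model` and `preprocess_from_csv_to_list` come from the project under test.
times = 0  # total number of batch_analysis calls
msg_file_list = ['2019_Oct_Data']
for msg_file in msg_file_list:
    with open(msg_file + '.csv', 'r', encoding="utf-8") as f:
        csv_list = preprocess_from_csv_to_list(f)
    i = 0
    start = 0
    window = 1000
    size = len(csv_list)
    # `start` is updated inside the loop, so this check sees the previous batch's offset.
    while start < size or start < 3000:
        start = i * window
        end = (i + 1) * window
        print(datetime.now().strftime("%Y/%m/%d %H:%M:%S") + ', ' + msg_file + ' ' + f'{i:04d}')
        # Intent: feed 1000 records per call to simulate continuous small-batch message analysis.
        # As written, the whole csv_list is passed; use csv_list[start:end] to pass only the current window.
        csv_df = model.batch_analysis(csv_list, pinyin_mode=True, url_detect=True, account_detect=True)
        # csv_df = model.batch_analysis(csv_list, pinyin_mode=True, url_detect=False, account_detect=False)
        save_path = 'data/' + msg_file + f'{i:04d}' + '_result.csv'
        csv_df.to_csv(save_path, index=False)
        i += 1
        times += 1
```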
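The windowing in the loop above can also be written as a small generator. This is a minimal sketch and not the project's code: `iter_windows` is a hypothetical helper, and its optional `limit` parameter is an illustrative cap, not a value confirmed by the logs.

```python
from typing import Iterator, Optional, Sequence, Tuple


def iter_windows(rows: Sequence, window: int = 1000,
                 limit: Optional[int] = None) -> Iterator[Tuple[int, Sequence]]:
    """Yield (batch_index, rows[start:end]) for consecutive fixed-size windows.

    Stops at the end of `rows`, or after `limit` rows when a limit is given.
    """
    stop = len(rows) if limit is None else min(len(rows), limit)
    for index, start in enumerate(range(0, stop, window)):
        yield index, rows[start:min(start + window, stop)]


if __name__ == "__main__":
    demo_rows = list(range(2500))  # stand-in for csv_list
    for index, batch in iter_windows(demo_rows):
        print(f"{index:04d}", len(batch))
```

Each yielded slice could then be passed to `model.batch_analysis(...)` in place of the full list, which keeps the batch index and the data window in step with each other.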
**`batch_analysis`** is split into the following steps:
```
[1_pinyin_translate]
    check pinyin availability
    translate to zh-cn, urldecode, and convert special num
[2_account_phone]
    account or phone number detection
[3_url_detect]
    url detection
[4_language_rulebase]
    language classification
    rule-based detection
[5_segment_duplicate]
    create segmented words list
    remove duplicate words
[6_classZHEN]
    classify zh and en words
[7_predictZH][8_predictEN]
    predict word labels
[9_result]
    result list
```
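Per-step timings like the ones reported in this note can be collected with a small context manager wrapped around each step. The sketch below is an assumption about how such logging might be done, not the project's actual instrumentation; `step_timer` and its `sink` parameter are hypothetical names. It emits lines in the `start_time, end_time, second, description` layout.

```python
import time
from contextlib import contextmanager
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


@contextmanager
def step_timer(description, sink=print):
    """Time the wrapped block and emit 'start_time, end_time, second, description'."""
    start_wall = datetime.now()
    start = time.perf_counter()
    try:
        yield
    finally:
        # perf_counter is monotonic, so the duration is unaffected by clock changes.
        seconds = time.perf_counter() - start
        end_wall = datetime.now()
        sink(f"{start_wall.strftime(TIME_FORMAT)}, {end_wall.strftime(TIME_FORMAT)}, "
             f"{seconds:.6f}, {description}")


if __name__ == "__main__":
    with step_timer("1_pinyin_translate"):
        time.sleep(0.01)  # stand-in for the real step
```

Passing `sink=some_list.append` instead of `print` collects the lines in memory, which is convenient when the log is analyzed afterwards.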
## Test Machine
+ **CPU:** i7-6700
+ **RAM:** 16 GB
+ **GPU:** None
## Result
### Log Format
```
start_time, end_time, second, description
```
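Lines in this format are easy to aggregate per step. The following is a minimal sketch with a hypothetical `parse_log` helper; it assumes fields are separated by `", "` and skips any line that does not have exactly four fields, such as batch headers and the total line. The sample lines are copied from the Case 1 log in this note.

```python
from collections import defaultdict

SAMPLE = """\
2021/06/22 20:40:18, 2019_Oct_Data 0000
2021/06/22 20:40:18, 2021/06/22 20:40:19, 0.224617, 1_pinyin_translate
2021/06/22 20:41:09, 2021/06/22 20:43:41, 152.221834, 7_predictZH
2021/06/22 20:46:15, 2021/06/22 20:48:43, 148.295625, 7_predictZH
Total second: 1230
"""


def parse_log(text):
    """Return {description: [seconds, ...]} for every step line in `text`."""
    per_step = defaultdict(list)
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(", ")]
        if len(parts) != 4:
            continue  # batch header, total line, or blank line
        per_step[parts[3]].append(float(parts[2]))
    return dict(per_step)


if __name__ == "__main__":
    for step, seconds in parse_log(SAMPLE).items():
        print(step, seconds)
```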
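### Case 1. Baseline (default: `url_detect` disabled, `account_detect` enabled, `remove_duplicate` executed)
+ **`csv_df = model.batch_analysis(csv_list, pinyin_mode=True)`**
+ `remove_duplicate` is left enabled
Each call feeds 1000 records into `batch_analysis` to simulate continuous small-batch message analysis. The results of several batches are listed below.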
```
2021/06/22 20:40:18, 2019_Oct_Data 0000
2021/06/22 20:40:18, 2021/06/22 20:40:19, 0.224617, 1_pinyin_translate
2021/06/22 20:40:19, 2021/06/22 20:40:19, 0.040005, 2_account_phone
2021/06/22 20:40:19, 2021/06/22 20:40:19, 0.000497, 3_url_detect
2021/06/22 20:40:19, 2021/06/22 20:40:28, 9.479768, 4_language_rulebase
2021/06/22 20:41:02, 2021/06/22 20:41:02, 0.338362, 5_segment_duplicate
2021/06/22 20:41:02, 2021/06/22 20:41:02, 0.008508, 6_classZHEN
2021/06/22 20:41:09, 2021/06/22 20:43:41, 152.221834, 7_predictZH
2021/06/22 20:43:48, 2021/06/22 20:45:25, 97.14799, 8_predictEN
2021/06/22 20:45:25, 2021/06/22 20:45:25, 0.124075, 9_result
2021/06/22 20:45:25, 2019_Oct_Data 0001
2021/06/22 20:45:25, 2021/06/22 20:45:25, 0.21634, 1_pinyin_translate
2021/06/22 20:45:25, 2021/06/22 20:45:25, 0.018332, 2_account_phone
2021/06/22 20:45:25, 2021/06/22 20:45:25, 0.000658, 3_url_detect
2021/06/22 20:45:25, 2021/06/22 20:45:33, 7.879221, 4_language_rulebase
2021/06/22 20:46:07, 2021/06/22 20:46:07, 0.342503, 5_segment_duplicate
2021/06/22 20:46:07, 2021/06/22 20:46:07, 0.00773, 6_classZHEN
2021/06/22 20:46:15, 2021/06/22 20:48:43, 148.295625, 7_predictZH
2021/06/22 20:48:50, 2021/06/22 20:50:24, 94.240869, 8_predictEN
2021/06/22 20:50:24, 2021/06/22 20:50:24, 0.134321, 9_result
2021/06/22 20:50:24, 2019_Oct_Data 0002
2021/06/22 20:50:24, 2021/06/22 20:50:24, 0.216787, 1_pinyin_translate
2021/06/22 20:50:24, 2021/06/22 20:50:24, 0.017886, 2_account_phone
2021/06/22 20:50:24, 2021/06/22 20:50:24, 0.00064, 3_url_detect
2021/06/22 20:50:24, 2021/06/22 20:50:33, 8.484351, 4_language_rulebase
2021/06/22 20:51:07, 2021/06/22 20:51:07, 0.336325, 5_segment_duplicate
2021/06/22 20:51:07, 2021/06/22 20:51:07, 0.007933, 6_classZHEN
2021/06/22 20:51:16, 2021/06/22 20:53:46, 149.921058, 7_predictZH
2021/06/22 20:53:53, 2021/06/22 20:55:28, 95.08858, 8_predictEN
2021/06/22 20:55:28, 2021/06/22 20:55:28, 0.134846, 9_result
2021/06/22 20:55:28, 2019_Oct_Data 0003
2021/06/22 20:55:28, 2021/06/22 20:55:28, 0.216355, 1_pinyin_translate
2021/06/22 20:55:28, 2021/06/22 20:55:28, 0.018108, 2_account_phone
2021/06/22 20:55:28, 2021/06/22 20:55:28, 0.000569, 3_url_detect
2021/06/22 20:55:28, 2021/06/22 20:55:37, 8.500501, 4_language_rulebase
2021/06/22 20:56:10, 2021/06/22 20:56:11, 0.336066, 5_segment_duplicate
2021/06/22 20:56:11, 2021/06/22 20:56:11, 0.007613, 6_classZHEN
2021/06/22 20:56:19, 2021/06/22 20:58:48, 148.918653, 7_predictZH
2021/06/22 20:58:55, 2021/06/22 21:00:31, 96.162245, 8_predictEN
2021/06/22 21:00:31, 2021/06/22 21:00:31, 0.135948, 9_result
Total second: 1230
```
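The timestamps in the log above also contain intervals that no logged step accounts for, for example between the end of `4_language_rulebase` and the start of `5_segment_duplicate`. The sketch below lists such gaps for batch 0000; `unlogged_gaps` is a hypothetical helper, and because the timestamps have one-second resolution the reported gaps are approximate.

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

# Step lines of batch 0000, copied from the Case 1 log.
BATCH_0000 = """\
2021/06/22 20:40:18, 2021/06/22 20:40:19, 0.224617, 1_pinyin_translate
2021/06/22 20:40:19, 2021/06/22 20:40:19, 0.040005, 2_account_phone
2021/06/22 20:40:19, 2021/06/22 20:40:19, 0.000497, 3_url_detect
2021/06/22 20:40:19, 2021/06/22 20:40:28, 9.479768, 4_language_rulebase
2021/06/22 20:41:02, 2021/06/22 20:41:02, 0.338362, 5_segment_duplicate
2021/06/22 20:41:02, 2021/06/22 20:41:02, 0.008508, 6_classZHEN
2021/06/22 20:41:09, 2021/06/22 20:43:41, 152.221834, 7_predictZH
2021/06/22 20:43:48, 2021/06/22 20:45:25, 97.14799, 8_predictEN
2021/06/22 20:45:25, 2021/06/22 20:45:25, 0.124075, 9_result
"""


def unlogged_gaps(text):
    """Return [(prev_step, next_step, gap_seconds)] for non-zero gaps between consecutive steps."""
    steps = []
    for line in text.splitlines():
        parts = line.split(", ")
        if len(parts) != 4:
            continue  # not a step line
        start = datetime.strptime(parts[0], TIME_FORMAT)
        end = datetime.strptime(parts[1], TIME_FORMAT)
        steps.append((start, end, parts[3]))
    gaps = []
    for (_, prev_end, prev_name), (next_start, _, next_name) in zip(steps, steps[1:]):
        gap = (next_start - prev_end).total_seconds()
        if gap > 0:
            gaps.append((prev_name, next_name, gap))
    return gaps


if __name__ == "__main__":
    for prev_name, next_name, gap in unlogged_gaps(BATCH_0000):
        print(f"{prev_name} -> {next_name}: {gap:.0f} s")
```

Work done between logged steps is included in the batch's wall-clock time but not in any step's `second` value, so it is worth checking before attributing all of the runtime to the prediction steps.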
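### Case 2. `url_detect` enabled, `account_detect` disabled, `remove_duplicate` not executed
+ `model.batch_analysis(csv_list, pinyin_mode=True, url_detect='urlextract', account_detect=False)`
+ The `remove_duplicate` code is commented out
Each call feeds 1000 records into `batch_analysis` to simulate continuous small-batch message analysis. The results of several batches are listed below.
==In every batch, `7_predictZH` and `8_predictEN` take somewhat longer in Case 2 than in Case 1 (about 157 to 162 s vs 148 to 152 s, and 101 to 105 s vs 94 to 97 s), and the total rises from 1230 s to 1292 s.==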
```
2021/06/22 21:08:40, 2019_Oct_Data 0000
2021/06/22 21:08:40, 2021/06/22 21:08:40, 0.226093, 1_pinyin_translate
2021/06/22 21:08:40, 2021/06/22 21:08:40, 0.000605, 2_account_phone
2021/06/22 21:08:40, 2021/06/22 21:08:40, 0.037508, 3_url_detect
2021/06/22 21:08:40, 2021/06/22 21:08:50, 9.685325, 4_language_rulebase
2021/06/22 21:09:24, 2021/06/22 21:09:25, 0.43166, 5_segment_duplicate
2021/06/22 21:09:25, 2021/06/22 21:09:25, 0.14668, 6_classZHEN
2021/06/22 21:09:31, 2021/06/22 21:12:14, 162.493483, 7_predictZH
2021/06/22 21:12:20, 2021/06/22 21:14:05, 104.640442, 8_predictEN
2021/06/22 21:14:05, 2021/06/22 21:14:05, 0.134122, 9_result
2021/06/22 21:14:05, 2019_Oct_Data 0001
2021/06/22 21:14:05, 2021/06/22 21:14:05, 0.216142, 1_pinyin_translate
2021/06/22 21:14:05, 2021/06/22 21:14:05, 0.000631, 2_account_phone
2021/06/22 21:14:05, 2021/06/22 21:14:05, 0.016253, 3_url_detect
2021/06/22 21:14:05, 2021/06/22 21:14:14, 8.188204, 4_language_rulebase
2021/06/22 21:14:47, 2021/06/22 21:14:47, 0.426519, 5_segment_duplicate
2021/06/22 21:14:47, 2021/06/22 21:14:48, 0.147275, 6_classZHEN
2021/06/22 21:14:57, 2021/06/22 21:17:33, 156.788268, 7_predictZH
2021/06/22 21:17:41, 2021/06/22 21:19:22, 100.853545, 8_predictEN
2021/06/22 21:19:22, 2021/06/22 21:19:22, 0.143096, 9_result
2021/06/22 21:19:22, 2019_Oct_Data 0002
2021/06/22 21:19:22, 2021/06/22 21:19:22, 0.216124, 1_pinyin_translate
2021/06/22 21:19:22, 2021/06/22 21:19:22, 0.000636, 2_account_phone
2021/06/22 21:19:22, 2021/06/22 21:19:22, 0.015805, 3_url_detect
2021/06/22 21:19:22, 2021/06/22 21:19:30, 7.798961, 4_language_rulebase
2021/06/22 21:20:03, 2021/06/22 21:20:04, 0.416748, 5_segment_duplicate
2021/06/22 21:20:04, 2021/06/22 21:20:04, 0.144886, 6_classZHEN
2021/06/22 21:20:12, 2021/06/22 21:22:50, 158.546694, 7_predictZH
2021/06/22 21:22:58, 2021/06/22 21:24:39, 101.023627, 8_predictEN
2021/06/22 21:24:39, 2021/06/22 21:24:39, 0.147923, 9_result
2021/06/22 21:24:39, 2019_Oct_Data 0003
2021/06/22 21:24:39, 2021/06/22 21:24:39, 0.215923, 1_pinyin_translate
2021/06/22 21:24:39, 2021/06/22 21:24:39, 0.000541, 2_account_phone
2021/06/22 21:24:39, 2021/06/22 21:24:39, 0.015432, 3_url_detect
2021/06/22 21:24:39, 2021/06/22 21:24:48, 8.265825, 4_language_rulebase
2021/06/22 21:25:21, 2021/06/22 21:25:21, 0.418908, 5_segment_duplicate
2021/06/22 21:25:21, 2021/06/22 21:25:22, 0.142114, 6_classZHEN
2021/06/22 21:25:29, 2021/06/22 21:28:07, 157.430169, 7_predictZH
2021/06/22 21:28:14, 2021/06/22 21:29:55, 101.167407, 8_predictEN
2021/06/22 21:29:55, 2021/06/22 21:29:55, 0.14544, 9_result
Total second: 1292
```
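The difference between the two cases can be quantified from the prediction-step timings. The sketch below uses the `7_predictZH` and `8_predictEN` values of the four batches in each log; `relative_change` is a hypothetical helper. With only four batches per case, the result should be read as indicative, since part of the difference may be run-to-run variance.

```python
from statistics import mean

# Seconds copied from the four batches of each case in the logs of this note.
CASE_1 = {
    "7_predictZH": [152.221834, 148.295625, 149.921058, 148.918653],
    "8_predictEN": [97.14799, 94.240869, 95.08858, 96.162245],
}
CASE_2 = {
    "7_predictZH": [162.493483, 156.788268, 158.546694, 157.430169],
    "8_predictEN": [104.640442, 100.853545, 101.023627, 101.167407],
}


def relative_change(baseline, candidate):
    """Return (mean(candidate) - mean(baseline)) / mean(baseline)."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


if __name__ == "__main__":
    for step in CASE_1:
        change = relative_change(CASE_1[step], CASE_2[step])
        print(f"{step}: {mean(CASE_1[step]):.1f} s -> {mean(CASE_2[step]):.1f} s ({change:+.1%})")
```

Both prediction steps come out roughly 6% slower in Case 2, which is in line with the increase in total time from 1230 s to 1292 s (about 5%).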