
210517 Memory Consumption Test Results

Test Procedure

Measure where the time goes, then examine the slowest parts in detail.

[1_pinyin_translate]
check pinyin available
translate to zh-cn and urldecode and convert special num

[2_account_phone]
detect account or phone number

[3_segment_duplicate]
create segment words list
remove duplicate words

[4_classZHEN]
classify zh and en words

[5_predictZH][6_predictEN]
predict words label

    [1_embedding_padding_concat]
        [1_1_embedding]
        embedding

        [1_2_padding]
        padding

        [1_3_concat]
        concat

    [2_try_predict]
    try_predict

    [3_predicted_label_corresponding]
    predicted label corresponding by input words

    [4_pinyin_mode]
    pinyin_mode
    
[7_result]
result list

Code

  • The main quantity measured is the time taken by model.batch_analysis(csv_list, pinyin_mode=True).
    from datetime import datetime

    # `model` and `preprocess_from_csv_to_list` come from the project under test.
    times = 0
    msg_file_list = ['2019_Oct_Data']
    window = 1000  # rows per batch

    for msg_file in msg_file_list:
        with open(msg_file + '.csv', 'r', encoding="utf-8") as f:
            csv_list = preprocess_from_csv_to_list(f)

        size = len(csv_list)
        i = 0
        start = 0
        while start < size:
            end = start + window
            print(datetime.now().strftime("%Y/%m/%d %H:%M:%S") + ', ' + msg_file + ' ' + f'{i:04d}')
            # Analyze one window of rows per call (the measured batch is 1,000 rows).
            csv_df = model.batch_analysis(csv_list[start:end], pinyin_mode=True)
            save_path = 'data/' + msg_file + f'{i:04d}' + '_result.csv'
            csv_df.to_csv(save_path, index=False)
            i += 1
            start = i * window
            times += 1

Test Machine

  • CPU: i7-6700
  • RAM: 16G
  • GPU: Null
  • OS: Ubuntu

Test Results

Log Format

start_time, end_time, second, description
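A log line in this format could be produced by a small timing helper such as the one below. This is only a sketch, not the code used for these measurements: the names `timed`, `format_log_line`, and `TIME_FORMAT` are hypothetical, and the zero-padded timestamp format is an assumption.

```python
import time
from contextlib import contextmanager
from datetime import datetime

# Assumed timestamp layout; the original logging code is not shown in this note.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def format_log_line(start, end, seconds, description):
    """Build one 'start_time, end_time, second, description' line."""
    return (
        f"{start.strftime(TIME_FORMAT)}, {end.strftime(TIME_FORMAT)}, "
        f"{seconds:.6f}, {description}"
    )


@contextmanager
def timed(description, sink=print):
    """Time the enclosed block and send one log line to `sink`."""
    start = datetime.now()
    t0 = time.perf_counter()  # monotonic clock for the duration
    try:
        yield
    finally:
        seconds = time.perf_counter() - t0
        sink(format_log_line(start, datetime.now(), seconds, description))


# Usage: wrap any stage of the pipeline (a stand-in computation is used here).
with timed("3_segment_duplicate"):
    words = sorted(set("a b a c".split()))
```

The duration is taken from `time.perf_counter()` rather than by subtracting the two wall-clock timestamps, so it is not affected by system clock adjustments.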

1. model.batch_analysis(csv_list, pinyin_mode=True)

2021/5/17 17:03:38, 2021/05/17 17:03:39, 0.507643, 1_pinyin_translate
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.016954, 2_account_phone
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.605382, 3_segment_duplicate
2021/5/17 17:03:39, 2021/05/17 17:03:39, 0.009971, 4_classZHEN
2021/5/17 17:03:39, 2021/05/17 17:06:38, 178.515002, 5_predictZH
2021/5/17 17:06:38, 2021/05/17 17:08:05, 87.027617, 6_predictEN
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.115689, 7_result
  • The log above is for one batch (1,000 rows) passed to model.batch_analysis; almost all of the time is spent in predictZH and predictEN.
  • Next, self.predict_words_label(batch_segment_words, pinyin_mode=pinyin_mode), which both predictZH and predictEN call, is profiled further.
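To put a number on how dominant the two predict stages are, the logged seconds can be turned into shares of the total. The sketch below uses only the seconds from the log above; the variable names are illustrative.

```python
# Seconds per stage, copied from the batch_analysis log.
stage_seconds = {
    "1_pinyin_translate": 0.507643,
    "2_account_phone": 0.016954,
    "3_segment_duplicate": 0.605382,
    "4_classZHEN": 0.009971,
    "5_predictZH": 178.515002,
    "6_predictEN": 87.027617,
    "7_result": 0.115689,
}

total = sum(stage_seconds.values())
shares = {name: seconds / total for name, seconds in stage_seconds.items()}

# Print stages from slowest to fastest with their share of the total.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22}{stage_seconds[name]:>12.6f} s {share:>8.2%}")

predict_share = shares["5_predictZH"] + shares["6_predictEN"]
print(f"predictZH + predictEN: {predict_share:.2%} of {total:.2f} s")
```

With these figures the two predict stages account for roughly 99.5% of the measured time, so the remaining stages are not worth optimizing first.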

2. self.predict_words_label(batch_segment_words, pinyin_mode=pinyin_mode)

ZH

, ,169.47, 1_embedding_padding_concat
2021/5/17 17:06:36, 2021/05/17 17:06:38, 1.472061, 2_try_predict
2021/5/17 17:06:38, 2021/05/17 17:06:38, 0.001995, 3_predicted_label_corresponding
2021/5/17 17:06:38, 2021/05/17 17:06:38, 0.001995, 4_pinyin_mode

EN

, , 79.07, 1_embedding_padding_concat
2021/5/17 17:08:04, 2021/05/17 17:08:05, 0.706111, 2_try_predict
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.000998, 3_predicted_label_corresponding
2021/5/17 17:08:05, 2021/05/17 17:08:05, 0.000998, 4_pinyin_mode
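As a quick cross-check, the four sub-steps can be summed and compared with the parent stage from the batch_analysis log. The sketch below uses only the logged seconds; it does not attempt to explain the remainder, which is simply time not covered by these four log lines.

```python
# Seconds for the sub-steps of predict_words_label, copied from the logs.
sub_steps = {
    "ZH": {
        "1_embedding_padding_concat": 169.47,
        "2_try_predict": 1.472061,
        "3_predicted_label_corresponding": 0.001995,
        "4_pinyin_mode": 0.001995,
    },
    "EN": {
        "1_embedding_padding_concat": 79.07,
        "2_try_predict": 0.706111,
        "3_predicted_label_corresponding": 0.000998,
        "4_pinyin_mode": 0.000998,
    },
}

# Parent stage totals (5_predictZH and 6_predictEN).
parent_seconds = {"ZH": 178.515002, "EN": 87.027617}

summary = {}
for lang, steps in sub_steps.items():
    measured = sum(steps.values())
    summary[lang] = {
        "measured": measured,
        "unaccounted": parent_seconds[lang] - measured,
        "prep_share": steps["1_embedding_padding_concat"] / parent_seconds[lang],
    }
    print(
        f"{lang}: sub-steps {measured:.2f} s of {parent_seconds[lang]:.2f} s, "
        f"unaccounted {summary[lang]['unaccounted']:.2f} s, "
        f"embedding/padding/concat {summary[lang]['prep_share']:.1%}"
    )
```

In both languages 1_embedding_padding_concat takes over 90% of the predict stage, while the model call itself (2_try_predict) takes under 2 seconds.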

3. embedding padding concat

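  • Because words differ in length, the time spent on embedding and concat varies from word to word.
  • The figures below are the results of embedding, padding, and concat over a number of words (seconds, with seconds per word in parentheses).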

ZH

130.57, 1_1_embedding (0.03953 s/word)
0.4425, 1_2_padding (0.0001 s/word)
38.45, 1_3_concat (0.011 s/word)

EN

67.87, 1_1_embedding (0.03841 s/word)
0.2322, 1_2_padding (0.0001 s/word)
10.95, 1_3_concat (0.0062 s/word)
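The totals and per-word averages above can be combined to see which step dominates and roughly how many words were processed. This is a sketch based only on the reported figures; the word counts are estimates derived from the rounded per-word averages, not values taken from the logs.

```python
# (total seconds, seconds per word) for each step, copied from the results above.
results = {
    "ZH": {
        "1_1_embedding": (130.57, 0.03953),
        "1_2_padding": (0.4425, 0.0001),
        "1_3_concat": (38.45, 0.011),
    },
    "EN": {
        "1_1_embedding": (67.87, 0.03841),
        "1_2_padding": (0.2322, 0.0001),
        "1_3_concat": (10.95, 0.0062),
    },
}

estimates = {}
for lang, steps in results.items():
    total = sum(seconds for seconds, _ in steps.values())
    embed_seconds, embed_per_word = steps["1_1_embedding"]
    estimates[lang] = {
        "total": total,
        "embedding_share": embed_seconds / total,
        # Estimate only: the per-word average is rounded in the report.
        "approx_words": embed_seconds / embed_per_word,
    }
    print(
        f"{lang}: total {total:.2f} s, embedding {estimates[lang]['embedding_share']:.1%}, "
        f"about {estimates[lang]['approx_words']:.0f} words"
    )
```

Embedding is the largest cost in both cases (about 77% for ZH and 86% for EN), at close to 0.04 s per word, so it is the first place to look for savings.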