# Final_version_with_pinyin_detct Speed

## 20210322

+ **OS:** Ubuntu
+ **CPU:** i7-6700k
+ **GPU:** Nvidia GTX 970
+ **RAM:** 16 GB
+ **Time:** 23:58:53~00:07:45

```
$ python3 batch_analysis.py
Preprocessing pinyin label file...
Pinyin preprocess: 0it [00:00, ?it/s]batch_analysis.py:144: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  to_pinyin['pinyin'][i] = str(pinyin(row['pinyin'], style=Style.TONE3, heteronym=False))
Pinyin preprocess: 42150it [00:11, 3773.32it/s]
dataset amount after removing official messages: 2231899
Decoding Message: 9987it [00:00, 31647.10it/s]
dataset need decode:1037/9987
Detecting account: 9987it [00:03, 2509.92it/s]
dataset containing account:424/9987
Determining language: 9987it [00:21, 463.36it/s]
device State: cuda:0
Loading jieba user dictionary...
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.445 seconds.
Prefix dict has been built successfully.
/home/d4bu/.local/lib/python3.8/site-packages/transformers/configuration_xlnet.py:205: FutureWarning: This config doesn't use attention memories, a core feature of XLNet. Consider setting `mem_len` to a non-zero value, for example `xlnet = XLNetLMHeadModel.from_pretrained('xlnet-base-cased'', mem_len=1024)`, for accurate training performance as well as an order of magnitude faster inference. Starting from version 3.5.0, the default parameter will be 1024, following the implementation in https://arxiv.org/abs/1906.08237
  warnings.warn(
Loading textcnn zh-model...
Predicting zh-message label, detecting rule-based word, detecting pinyin: 4480it [06:48, 10.96it/s]
Loading textcnn en-model...
Predicting en-message label, detecting rule-based word: 1416it [01:08, 20.74it/s]
dataset containing rule-based: 1169 / 9987
result save at: message_predict_result/test_2019_Nov_Data_210322_0007.csv
(pinyin fix label) result save at: message_predict_result/test_2019_Nov_Data_fix_labels_210322_0007.csv
```
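
The run above crosses midnight, so the total wall time is easy to misread. The sketch below works it out from the **Time** field and compares it with the per-stage durations shown by the progress bars. The helper `elapsed` and the `stage_seconds` labels are illustrative names, not part of `batch_analysis.py`. The stage durations are copied from the tqdm output and are rounded to whole seconds.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for HH:MM:SS clock times, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        # The end time is on the following day.
        t1 += timedelta(days=1)
    return t1 - t0


# Wall time from the "Time" field of this run.
total = elapsed("23:58:53", "00:07:45")
print(total)  # 0:08:52

# Per-stage durations as reported by the progress bars (seconds).
stage_seconds = {
    "pinyin preprocess": 11,
    "decoding message": 0,
    "detecting account": 3,
    "determining language": 21,
    "zh prediction + pinyin detection": 6 * 60 + 48,
    "en prediction": 1 * 60 + 8,
}

# Share of the total wall time spent in each stage.
for name, seconds in stage_seconds.items():
    print(f"{name}: {seconds}s ({seconds / total.total_seconds():.0%})")
```

On these numbers the zh prediction stage accounts for 408 s of the 532 s total, so it is the first place to look when optimizing. The remaining gap between the summed stages and the wall time presumably covers model loading and file I/O, which the progress bars do not measure.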
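
The `SettingWithCopyWarning` in the log comes from the chained assignment `to_pinyin['pinyin'][i] = ...` at `batch_analysis.py:144`. A minimal sketch of one way to avoid it is shown below, assuming `to_pinyin` is a slice of a larger DataFrame: take an explicit copy, then assign the whole column at once. The function `to_tone3` is a hypothetical stand-in for the `pypinyin` call in the script so that the example runs without that dependency, and the sample rows are made up.

```python
import pandas as pd


def to_tone3(text: str) -> str:
    """Hypothetical stand-in for
    str(pinyin(text, style=Style.TONE3, heteronym=False))."""
    return text.upper()


# Made-up data standing in for the pinyin label file.
labels = pd.DataFrame(
    {"pinyin": ["ni hao", "zai jian", "xie xie"], "keep": [True, True, False]}
)

# Take an explicit copy so later writes do not target a view of `labels`.
to_pinyin = labels[labels["keep"]].copy()

# Assign the whole column in one step instead of df['col'][i] = ... in a loop.
to_pinyin["pinyin"] = to_pinyin["pinyin"].map(to_tone3)

print(to_pinyin["pinyin"].tolist())  # ['NI HAO', 'ZAI JIAN']
```

If a row-by-row loop has to stay, a single indexer such as `to_pinyin.loc[i, 'pinyin'] = ...` avoids the chained assignment as well.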