
###### tags: `python`

# Character Encoding

## Installation

`pip install chardet`

## Example

```python
import chardet

txtfile = 'big5_string.txt'    # file to test
# txtfile = 'utf8_string.txt'  # alternative test file

# Open the file in binary mode and read its raw bytes
with open(txtfile, 'rb') as f:
    rawdata = f.read()

# Detect the encoding; the more text the file contains, the more accurate the guess
result = chardet.detect(rawdata)
charenc = result['encoding']   # may be None if detection fails
print(charenc)                 # print the most likely encoding

# Decode the bytes using the encoding that was just detected
s = rawdata.decode(charenc)
print(s)                       # print the file's text content
```

Sample text files (download them to test, or create your own text files):

1. [BIG5](https://drive.google.com/open?id=0B4bcDH-aFGlOc2hGU21xLTNUTVU)
2. [UTF8](https://drive.google.com/open?id=0B4bcDH-aFGlOclpPTnhFbDN5SXc)

## chardet's built-in test program

Once chardet is installed, you can also run the following command on the command line to detect the encoding of one or more files:

> chardetect file1 file2

## References

* [Using chardet to detect a string's encoding](https://ephrain.net/python-%E4%BD%BF%E7%94%A8-chardet-%E5%81%B5%E6%B8%AC%E5%AD%97%E4%B8%B2%E7%9A%84%E7%B7%A8%E7%A2%BC/)
* [Python and Chinese text processing (written for Python 2.x, but the concepts are explained very well)](http://web.ntnu.edu.tw/~samtseng/present/Python_and_Chinese.pdf)
* [Online file encoding detection](https://nlp.fi.muni.cz/projects/chared/)
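
## Handling uncertain detection results

The example in this note passes `result['encoding']` straight to the decoder. `chardet.detect` also returns a `confidence` value, and `encoding` can be `None` when nothing is recognized, so a slightly more defensive version may be useful. The following is only a minimal sketch: the helper name `decode_bytes` and the parameters `detect`, `fallback`, and `min_confidence` are assumptions made for illustration and are not part of chardet.

```python
def decode_bytes(raw, detect=None, fallback="utf-8", min_confidence=0.5):
    """Decode raw bytes using a detected encoding, with a fallback.

    `detect` is any callable that returns a dict with 'encoding' and
    'confidence' keys; by default chardet.detect is used (hypothetical
    wrapper, not part of chardet itself).
    """
    if detect is None:
        import chardet  # imported lazily so a custom detector can be supplied
        detect = chardet.detect

    result = detect(raw)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Fall back when detection failed or the guess is too uncertain
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    # errors="replace" avoids an exception when the guess is wrong;
    # undecodable bytes become U+FFFD instead
    return raw.decode(encoding, errors="replace"), encoding
```

With chardet installed, a call such as `text, enc = decode_bytes(open('big5_string.txt', 'rb').read())` would return the decoded text along with the encoding that was actually used. The threshold of 0.5 is arbitrary and would likely need tuning for short files.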