###### tags: `python`

# Read a PDF, translate it into Chinese, and save it as a Word document

## Install the required packages

```shell=
pip install python-dotenv
pip install pdfplumber
pip install python-docx
pip install google-cloud-translate
```

## Request credentials

1. Open https://console.cloud.google.com/
   ![](https://i.imgur.com/FHNOe2l.png)
   ![](https://i.imgur.com/wEkHNG7.png)
2. Find the API service you want to enable
   ![](https://i.imgur.com/kxecGse.png)
   ![](https://i.imgur.com/mr3JLtx.png)
3. Enable the service: click "Enable"
   ![](https://i.imgur.com/8dwdJjf.png)
4. Create the credentials
   ![](https://i.imgur.com/bmxBe5p.png)
   ![](https://i.imgur.com/8rPC70o.png)
   ![](https://i.imgur.com/f0eJmO5.png)
   ![](https://i.imgur.com/vWxopzH.png)
   ![](https://i.imgur.com/5qJtHeR.png)
   ![](https://i.imgur.com/kV7k1By.png)
   ![](https://i.imgur.com/52VYCjr.png)
   ![](https://i.imgur.com/hafcGPj.png)
   ![](https://i.imgur.com/FtN7Nic.png)
   ![](https://i.imgur.com/qyrEBw4.png)
5. Download the credentials
   ![](https://i.imgur.com/1hKwQHz.png)
   ![](https://i.imgur.com/4Z6wrCc.png)
   ![](https://i.imgur.com/OPoEvkB.png)
   ![](https://i.imgur.com/JazUI5R.png)
   ![](https://i.imgur.com/DHQqqrs.png)
6. Move the downloaded JSON file into the project directory
7. Point the contents of the `.env` file at this JSON file
   ![](https://i.imgur.com/IhsryJM.png)
8. The translation feature can then be used with the following program

```python=
from dotenv import load_dotenv
from google.cloud import translate_v2 as translate
import pdfplumber
import docx


def get_pdf_content(filepath, page_number):
    """Return the text of one page; page_number is a zero-based index."""
    # The context manager closes the PDF file when we are done with it.
    with pdfplumber.open(filepath) as pdf:
        page = pdf.pages[page_number]
        # extract_text() can return None for a page without a text layer.
        return page.extract_text() or ""


def translate_text_with_model(target, text, model="nmt"):
    """Translates text into the target language.

    Make sure your project is allowlisted.

    Target must be an ISO 639-1 language code.
    See https://g.co/cloud/translate/v2/translate-reference#supported_languages
    """
    translate_client = translate.Client()

    if isinstance(text, bytes):
        text = text.decode("utf-8")

    # Text can also be a sequence of strings, in which case this method
    # will return a sequence of results for each text.
    result = translate_client.translate(text, target_language=target, model=model)

    # print(u"Text: {}".format(result["input"]))
    # print(u"Translation: {}".format(result["translatedText"]))
    # print(u"Detected source language: {}".format(result["detectedSourceLanguage"]))
    return result["translatedText"]


def write_docx(filepath, text):
    """Write the text to a new Word document under a placeholder heading."""
    mydoc = docx.Document()
    # Placeholder chapter heading ("Chapter 1 aaaa"); replace with a real title.
    first_para = mydoc.add_paragraph("第一章 aaaa\n")
    first_para.add_run(text)
    mydoc.save(filepath)


if __name__ == '__main__':
    load_dotenv()  # load the credential settings from .env
    source_filepath = r"/home/amos/文件/kivy.pdf"  # path to the foreign-language PDF
    target_filepath = r"./test.docx"  # path for the translated Word file
    source = 'en'  # language of the original file (unused: the API detects it)
    target = 'zh-TW'  # language to translate into
    page_number = 14  # zero-based index of the PDF page to read
    text = get_pdf_content(source_filepath, page_number)
    target_text = translate_text_with_model(target, text)
    # print(target_text)
    write_docx(target_filepath, target_text)
```
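Steps 6 and 7 only work if the client library can find the key file. The `.env` screenshot is not reproduced as text, so the sketch below assumes it sets `GOOGLE_APPLICATION_CREDENTIALS`, the environment variable that Google Cloud client libraries normally read. `resolve_credentials_path` is a hypothetical helper name, not part of any library; it simply fails early with a readable message instead of letting `translate.Client()` fail later.

```python
import os
from pathlib import Path


def resolve_credentials_path(env=None):
    """Return the key-file path named by GOOGLE_APPLICATION_CREDENTIALS.

    Assumption: the .env file sets this variable to the downloaded JSON key.
    Raises RuntimeError if the variable is unset or empty, and
    FileNotFoundError if it points at a file that does not exist.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not raw:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set; check the .env file"
        )
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    return path
```

One possible use is to call it right after `load_dotenv()` in the `__main__` block, before the first call to `translate_text_with_model`, so a wrong path in `.env` is reported before any PDF is read.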