# Chapter9. A Must-Read for Content Creators: Generating High-Quality Subtitles Automatically with AI

:+1: The complete code is available at https://github.com/iamalex33329/chatgpt-develop-guide-zhtw

## Other Chapters

[Chapter1. Getting Started with the OpenAI API](https://hackmd.io/@U3f2IzHERbymAst2-lDdjA/S1cNMYi6T)
[Chapter2. Calling the API with Python](https://hackmd.io/@U3f2IzHERbymAst2-lDdjA/HyZBg5ia6)
[Chapter3. API Parameters and Error Handling](https://hackmd.io/@U3f2IzHERbymAst2-lDdjA/BJWNtsh6p)
[Chapter4. Building Your Own ChatGPT](https://hackmd.io/@112356044/Hk81U96Tp)
[Chapter5. Breaking the Limits of Time and Space - Integrating Search](https://hackmd.io/@112356044/HkbVM-ApT)
[Chapter6. Letting AI Help AI - Automatically Chained Workflows](https://hackmd.io/@112356044/r1Ke-GR6T)
[Chapter7. A Web Chat App and Text-to-Image with the Image API](https://hackmd.io/@112356044/Hyf-AvgAT)
[Chapter8. Designing a LINE AI Chatbot](https://hackmd.io/@112356044/r1d6HsgAa)
[Chapter10. Bringing AI to Discord](https://hackmd.io/@112356044/Sy_L-B40T)
[Chapter11. Hands-On: A Customized AI Investment and Finance Application](https://hackmd.io/@112356044/HkUE0rER6)
[Chapter12. Building an Automated Book-Promotion Editor with LangChain](https://hackmd.io/@112356044/SybvbdN0p)

## Table of Contents

[TOC]

## Downloading YouTube Files Easily with the PyTube Package

This section uses the PyTube package to fetch audio and video files from YouTube.

> Article 65 of the Copyright Act
>
> 1. Fair use of a work does not constitute infringement of the economic rights in that work.
> 2. In determining whether the exploitation of a work falls within the reasonable scope set out in Articles 44 through 63, or otherwise qualifies as fair use, all circumstances shall be taken into account, and the following factors in particular shall serve as the basis for the determination: (1) the purpose and nature of the exploitation, including whether it is for commercial or nonprofit educational purposes; (2) the nature of the work; (3) the amount and substantiality of the portion exploited in relation to the work as a whole; (4) the effect of the exploitation on the potential market for and current value of the work.
> 3. Where an organization of copyright owners and an organization of users have reached an agreement on the scope of fair use of a work, that agreement may be taken as a reference in the determination under the preceding paragraph.
> 4. In the course of reaching the agreement referred to in the preceding paragraph, the opinion of the competent copyright authority may be sought.

### Forking the Replit Project: Downloading YouTube Files Easily

Instead of downloading through Replit, I install the **pytube** package locally.

``` python=
# Install the package first: pip install pytube
# (inside a notebook cell: !pip install pytube)
from pytube import YouTube
from pytube.exceptions import RegexMatchError, VideoUnavailable
from os import path


def get_best_stream(yt, is_audio=True):
    if is_audio:
        # None: do not restrict the audio subtype (the default is "mp4")
        return yt.streams.get_audio_only(None)
    else:
        # Highest-resolution progressive (audio + video) MP4 stream
        return yt.streams.filter(progressive=True, file_extension='mp4') \
            .order_by('resolution') \
            .desc() \
            .first()


def download_file(stream):
    # Save into the current directory under the stream's default file name
    filename = path.basename(stream.get_file_path())
    print("Downloading file.....")
    stream.download(filename=filename)
    print("Download complete")
    return filename


if __name__ == "__main__":
    url = input("Please enter the YouTube video URL: ")
    is_audio_str = input("Download audio only? (Y/N): ")
    is_audio = is_audio_str.lower() == "y"

    try:
        print("Getting video information.......")
        yt = YouTube(url)
        stream = get_best_stream(yt, is_audio)
        if stream:
            download_file(stream)
    except RegexMatchError:
        print("Invalid URL!")
    except VideoUnavailable:
        print("Video unavailable. This might require logging in to view.")
```

1. `get_best_stream()`: takes a YouTube object and a boolean `is_audio`, which defaults to `True`. When `is_audio` is `True`, the function tries to get the video's best audio stream; when it is `False`, it gets the best video stream. The stream is selected with methods provided by the pytube package.
2. `download_file`: takes a stream object and downloads the video or audio that the stream represents.

## Producing a Subtitle File with the Whisper Speech-to-Text Model

Adding subtitles by hand is time-consuming and laborious. This section uses the [Whisper model developed by OpenAI](https://openai.com/research/whisper) to convert audio into a subtitle file. It currently supports dozens of languages and breaks the text into lines according to **speaking pace** and **sentence boundaries**.

``` python=
from pytube import YouTube
from pytube.exceptions import RegexMatchError, VideoUnavailable
from os import path
import openai  # legacy openai<1.0 interface (openai.Audio)
import apikey  # local module expected to define OPENAI_API_KEY

openai.api_key = apikey.OPENAI_API_KEY


def find_best_stream(url, is_audio=True):
    try:
        print("Fetching video information.......")
        yt = YouTube(url)
        if is_audio:
            return yt.streams.get_audio_only(None)
        else:
            return yt.streams.filter(progressive=True, file_extension='mp4') \
                .order_by('resolution') \
                .desc() \
                .first()
    except RegexMatchError:
        print("Invalid URL!")
        return None
    except VideoUnavailable:
        print("Video unavailable. This might require logging in to view.")
        return None


def download_file(stream):
    filename = path.basename(stream.get_file_path())
    print("Downloading file.....")
    stream.download(filename=filename)
    print("Download complete")
    return filename


def transcribe_audio_to_subtitles(audio_filename):
    print("Transcribing audio to subtitles....")
    with open(audio_filename, "rb") as f:
        caption = openai.Audio.transcribe('whisper-1', f, response_format='srt', prompt='')
    print("Transcription complete")

    print("Saving subtitles to file")
    srt_filename = path.splitext(audio_filename)[0] + ".srt"
    # Write as UTF-8 so non-ASCII subtitles are saved correctly on every platform
    with open(srt_filename, "w", encoding="utf-8") as f:
        f.write(caption)
    print("File saved")
    return caption


if __name__ == "__main__":
    audio_filename = None  # stays None when no usable source is given
    source_type = input("Please enter the source of the audio (1 or 2)\n1. YouTube URL\n2. Local file\n：")
    if source_type == "1":
        url = input("Please enter the YouTube video URL: ")
        stream = find_best_stream(url, is_audio=True)
        if stream:
            audio_filename = download_file(stream)
    elif source_type == "2":
        audio_filename = input("Please enter the uploaded file name (including extension): ")
    else:
        print("Incorrect input. Please try again.")

    if audio_filename:
        transcribe_audio_to_subtitles(audio_filename)
```

This combines the pytube download from the previous section with the Audio endpoint of the OpenAI API to turn speech into text. The `if __name__ == "__main__":` block at the end of the listing is the usage example.

``` python=
openai.Audio.transcribe('whisper-1', f, response_format='srt', prompt='')
```

1. `whisper-1`: the model name.
2. `f`: the audio file, passed as a file object opened in binary mode (not a path string).
3. `response_format`: the output format. Besides `srt`, options include `text`, `vtt`, and `verbose_json`.
4. `prompt`: lets us supply a prompt that constrains the format of the generated text, for example when the model keeps recognizing the wrong words or inserts punctuation that should not be there. The effect of this feature is unstable, though!

Two YouTube videos are used as tests here: one with entirely English dialogue, and one that mixes Chinese and English.

#### 1. [Can you solve the prisoner hat riddle? - Alex Gendler](https://www.youtube.com/watch?v=N5vJSNXPEwA)

![image](https://hackmd.io/_uploads/BJrAsEE0T.png)

The resulting subtitles (the full subtitle file is available in the [GitHub project](https://github.com/iamalex33329/chatgpt-develop-guide-zhtw/blob/master/Chapter9/Can%20you%20solve%20the%20prisoner%20hat%20riddle%20-%20Alex%20Gendler.srt)): `Can you solve the prisoner hat riddle - Alex Gendler.srt`

![Screenshot 2024-03-17 at 5.46.31 PM](https://hackmd.io/_uploads/S1GZ24E0p.png)

#### 2. [別再說 very good! 來學學「厲害」的其他8個說法！](https://www.youtube.com/watch?v=oyaiZFGWcW8)

![image](https://hackmd.io/_uploads/H1bw3VE06.png)

The resulting subtitles (the full subtitle file is available in the [GitHub project](https://github.com/iamalex33329/chatgpt-develop-guide-zhtw/blob/master/Chapter9/%E5%88%A5%E5%86%8D%E8%AA%AA%20very%20good!%20%E4%BE%86%E5%AD%B8%E5%AD%B8%E3%80%8C%E5%8E%B2%E5%AE%B3%E3%80%8D%E7%9A%84%E5%85%B6%E4%BB%968%E5%80%8B%E8%AA%AA%E6%B3%95%EF%BC%81.srt)): `別再說 very good! 來學學「厲害」的其他8個說法！.srt`

![Screenshot 2024-03-17 at 5.49.10 PM](https://hackmd.io/_uploads/HkWshE40T.png)

> On close inspection, a small part of the transcription is still wrong, but the vast majority is correct and accurately segmented.

## Video Not in Chinese? Let AI Produce Chinese Subtitles!

The original subtitles of the video in example 1 are in **English**. How can we turn them into Chinese subtitles? We can chain ChatGPT in once more and have it translate other languages into Chinese!

``` python=
from pytube import YouTube
from pytube.exceptions import RegexMatchError, VideoUnavailable
from os import path
import openai  # legacy openai<1.0 interface (openai.Audio, openai.ChatCompletion)
import apikey  # local module expected to define OPENAI_API_KEY

openai.api_key = apikey.OPENAI_API_KEY

MAX_TOKEN = 1000


def find_best_stream(url, is_audio=True):
    try:
        print("Fetching video information.......")
        yt = YouTube(url)
        if is_audio:
            audio_best = yt.streams.get_audio_only(None)
            return audio_best
        else:
            video_best = yt.streams \
                .filter(progressive=True, file_extension='mp4') \
                .order_by('resolution') \
                .desc() \
                .first()
            return video_best
    except RegexMatchError:
        print("Invalid URL!")
        return None
    except VideoUnavailable:
        print("Video unavailable. This might require logging in to view.")
        return None


def download_file(stream):
    # Cap the base name at 32 characters, then re-attach the extension once
    name, ext = path.splitext(path.basename(stream.get_file_path()))
    filename = name[:32] + ext
    print("Downloading file.....")
    stream.download(filename=filename)
    print("Download complete")
    return filename


def transcribe_audio_to_subtitles(audio_filename):
    print("Transcribing audio to subtitles....")
    with open(audio_filename, "rb") as f:
        caption = openai.Audio.transcribe('whisper-1', f, response_format='srt', prompt='')
    print("Transcription complete")

    print("Saving subtitles to file")
    srt_filename = path.splitext(audio_filename)[0] + ".srt"
    with open(srt_filename, "w", encoding="utf-8") as f:
        f.write(caption)
    print("File saved")
    return caption


def get_reply(messages):
    try:
        response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        reply = response["choices"][0]["message"]["content"]
    except openai.OpenAIError as err:
        reply = f"Error occurred: {err.error.type}\n{err.error.message}"
    return reply


def check_cht(caption_list):
    # Sample the text lines of the first three cues (indices 2, 6, 10);
    # this assumes each cue is 4 lines: index, timestamp, text, blank
    sample_text = "\n".join(caption_list[2:11:4])
    print(sample_text)
    reply = get_reply([{
        'role': 'user',
        'content': f'''
        Please determine if the following content is in Chinese?
        ```
        {sample_text}
        ```
        Reply 'Y' if it's Chinese, 'N' if it's not. Do not add any other text.
        '''
    }])
    print(reply)
    return reply.strip() == 'Y'


def translate_to_cht(caption_list, audio_filename):
    srt_filename = path.splitext(audio_filename)[0] + "_cht.srt"
    print(srt_filename + '\n\n')
    hist = []
    backtrace = 2  # number of earlier exchanges kept as context
    # One request per subtitle text line (indices 2, 6, 10, ...)
    for i in range(2, len(caption_list), 4):
        while len(hist) > 2 * backtrace:
            hist.pop(0)
        hist.append({
            'role': 'user',
            'content': f'''
            Please translate the following content into Traditional Chinese, do not add any explanation:
            {caption_list[i]}
            '''
        })
        reply = get_reply(hist)
        print(reply)
        while len(hist) > 2 * backtrace:
            hist.pop(0)
        hist.append({'role': 'assistant', 'content': reply})
        caption_list[i] = reply
    with open(srt_filename, "w", encoding="utf-8") as f:
        f.write("\n".join(caption_list))


if __name__ == "__main__":
    audio_filename = None  # stays None when no usable source is given
    source_type = input("1. YouTube URL 2. Local file\nPlease enter the source of the audio (1 or 2): ")
    if source_type == "1":
        url = input("Please enter the YouTube video URL: ")
        stream = find_best_stream(url, is_audio=True)
        if stream:
            audio_filename = download_file(stream)
    elif source_type == "2":
        audio_filename = input("Please enter the uploaded file name (including extension): ")
    else:
        print("Incorrect input. Please try again.")

    if audio_filename:
        caption = transcribe_audio_to_subtitles(audio_filename)
        caption_list = caption.splitlines()
        if check_cht(caption_list):
            print('This video is in Chinese.')
        else:
            print('This video is not in Chinese. Translating the subtitles into Chinese for you.')
            translate_to_cht(caption_list, audio_filename)
            print('Subtitles have been translated into Chinese.')
```

> There is one problem here: asking OpenAI to translate the subtitle lines one by one generates too many requests, which in turn triggers a server error. So **try to avoid using the API for translation**.

```
Rate limit reached for gpt-3.5-turbo in organization org-arMXkgkeFtWUYjoI9CrvPUiB on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.
```
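
The rate-limit error comes from sending one request per subtitle line. If you still want to translate through the API, two common ways to stay under a requests-per-minute cap are to wait and retry when a call fails, and to group several lines into one request. The following is only a minimal sketch under stated assumptions: `call_with_retry` and `batch_lines` are hypothetical helpers (they are not part of the openai package), the 20-second wait simply mirrors the message above, and the lambda in the usage part is a stand-in for a real request function.

```python
import time


def call_with_retry(func, *args, max_retries=3, wait_seconds=20, sleep=time.sleep):
    """Call func(*args); if it raises, wait and try again.

    Hypothetical helper. A real program should catch only the rate-limit
    exception of the client library it uses rather than every Exception.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(wait_seconds)


def batch_lines(lines, batch_size):
    """Split a list of subtitle text lines into chunks of at most batch_size."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


if __name__ == "__main__":
    # Text lines sit at indices 2, 6, 10, ... when every cue has one text line
    caption_list = ["1", "00:00:00,000 --> 00:00:02,000", "Hello", "",
                    "2", "00:00:02,000 --> 00:00:04,000", "World", ""]
    texts = caption_list[2::4]
    for batch in batch_lines(texts, batch_size=10):
        # Placeholder "translation": swap in a real request function here
        result = call_with_retry(lambda b: "\n".join(b).upper(), batch, wait_seconds=0)
        print(result)
```

Grouping ten lines per request cuts the number of requests to roughly a tenth. The reply then has to be split back into individual lines and written to the matching cues, which this sketch does not cover.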