Generate podcast transcript using whisper.cpp

# Project - Generate podcast transcript using whisper.cpp

## What's whisper.cpp

```whisper.cpp``` is a C/C++ port of OpenAI’s Whisper speech-to-text model, optimized for speed and for running locally on CPUs, even without a GPU. It lets you transcribe audio files to text with Whisper models in a lightweight and portable way.

### ✅ Key Features

- Runs offline: no internet or cloud API needed
- No Python required: great for embedding into C/C++ or other native apps
- Multi-language support: supports transcription in many languages
- Real-time/streaming: possible with microphone or streaming input
- Small binary: lightweight and portable

### 📦 Model Support

- It uses Whisper models from OpenAI, such as:
  - tiny, base, small, medium, large

## How to generate transcript

### Install FFmpeg

- FFmpeg is needed to convert the audio file into a .wav file
- For mac: run ```brew install ffmpeg```
- Verify installation: ```ffmpeg -version```
- Convert audio into .wav format (```-ar 16000``` sets a 16 kHz sample rate, ```-ac 1``` downmixes to mono):

  ```
  ffmpeg -y -i [input_file.mp3] -ar 16000 -ac 1 [output].wav
  ```

### Install whisper.cpp

- Github link: https://github.com/ggml-org/whisper.cpp
- Follow the quick start
- Clone the repo: ```git clone https://github.com/ggml-org/whisper.cpp.git```
- Navigate into the directory: ```cd whisper.cpp```
- Download the model in ggml format: ```sh ./models/download-ggml-model.sh medium```
- Build the project:

  ```
  cmake -B build
  cmake --build build --config Release
  ```

- Run whisper-cli in default mode and redirect the output to transcript.txt:

  ```
  ./build/bin/whisper-cli -m ./models/ggml-medium.bin [path_of_target_audio].wav > transcript.txt
  ```

- Transcribe Chinese audio: add ```-l zh```

  ```
  ./build/bin/whisper-cli -m ./models/ggml-medium.bin [path_of_target_audio].wav -l zh > transcript.txt
  ```

- Flag usage: run ```./build/bin/whisper-cli -h```

### Automate the process with shell script

- Command line input: audio file name and an optional language code, e.g. ```audio.mp3 zh``` or ```audio.m4a en```
- Convert audio into .wav format
- Transcribe the audio into .txt file

```bash
#!/bin/bash

# === USAGE CHECK ===
# Accept the audio file name plus an optional language code.
if [ $# -lt 1 ] || [ $# -gt 2 ]; then
  echo "Usage: $0 <audio_filename> [language]"
  echo "Example: ./transcribe.sh audio.mp3 zh"
  exit 1
fi

FILENAME="$1"
LANGUAGE="${2:-en}"   # defaults to English when no language is given

# === CONFIGURATION ===
# Paths are relative: run the script from the directory that contains whisper.cpp/.
INPUT_DIR="${HOME}/Downloads"
OUTPUT_DIR="${HOME}/Downloads"
MODEL_PATH="./whisper.cpp/models/ggml-medium.bin"
WHISPER_BIN="./whisper.cpp/build/bin/whisper-cli"

# === DERIVED NAMES ===
INPUT_FILE="${INPUT_DIR}/${FILENAME}"
BASENAME="${FILENAME%.*}"          # file name without its extension
WAV_FILE="${OUTPUT_DIR}/${BASENAME}.wav"
TXT_FILE="${OUTPUT_DIR}/${BASENAME}.txt"

# === CHECK FILE EXISTS ===
if [ ! -f "$INPUT_FILE" ]; then
  echo "Error: File not found -> $INPUT_FILE"
  exit 1
fi

# === MAKE OUTPUT DIRECTORY ===
mkdir -p "$OUTPUT_DIR"

# === CONVERT TO WAV ===
echo ">> Converting $FILENAME to WAV..."
ffmpeg -y -i "$INPUT_FILE" -ar 16000 -ac 1 "$WAV_FILE" || exit 1

# === TRANSCRIBE WITH WHISPER (default language: en) ===
# -otxt with -of makes whisper-cli write "$TXT_FILE" itself, so stdout is not
# redirected to the same file (both would otherwise write to it at once).
echo ">> Transcribing $BASENAME.wav..."
"$WHISPER_BIN" -m "$MODEL_PATH" -otxt -nt -l "$LANGUAGE" -of "$OUTPUT_DIR/$BASENAME" "$WAV_FILE" || exit 1

echo ">> Done. Transcript saved to: $TXT_FILE"
```
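
To process several episodes in one go, the same two steps (convert, then transcribe) can be wrapped in a loop. The following is a minimal sketch under stated assumptions, not part of whisper.cpp: the function names `run` and `transcribe_dir`, the `DRY_RUN` switch, and the choice of `.mp3`/`.m4a` extensions are hypothetical and only for illustration. It reuses the ffmpeg and whisper-cli flags already shown above and writes each `.wav` and `.txt` next to its source file.

```bash
#!/bin/bash
# Sketch: batch-transcribe every audio file in a directory.
# Assumptions (illustrative only): function names, the DRY_RUN switch,
# and the list of file extensions are not from whisper.cpp.

MODEL_PATH="${MODEL_PATH:-./whisper.cpp/models/ggml-medium.bin}"
WHISPER_BIN="${WHISPER_BIN:-./whisper.cpp/build/bin/whisper-cli}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# transcribe_dir <directory> [language]
transcribe_dir() {
  local dir="$1"
  local lang="${2:-en}"
  local file base

  for file in "$dir"/*.mp3 "$dir"/*.m4a; do
    # Skip glob patterns that matched nothing.
    [ -f "$file" ] || continue
    base="${file%.*}"   # path without the extension
    echo ">> Processing $file"
    run ffmpeg -y -i "$file" -ar 16000 -ac 1 "$base.wav" || return 1
    run "$WHISPER_BIN" -m "$MODEL_PATH" -otxt -nt -l "$lang" -of "$base" "$base.wav" || return 1
  done
}

# Only run when arguments are given, e.g. ./batch_transcribe.sh ~/Downloads zh
if [ $# -ge 1 ]; then
  transcribe_dir "$@"
fi
```

Saved under a name such as `batch_transcribe.sh` (a hypothetical file name), it could be tried first with `DRY_RUN=1 ./batch_transcribe.sh ~/Downloads zh`, which only prints the commands it would run, before running it for real without `DRY_RUN`.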