# o3-concept-for-live-transcripts-with-timecode
Below is a **battle-tested pattern** we use when we need “smart chunks” (≥ 60 s & ≤ 120 s, cut on silence) that keep their original SMPTE time-code so Whisper’s word-level stamps line up with the camera master.
---
## 0️⃣ High-level flow
```
HyperDeck → growing MOV
                ▼ (fs.watch / inotifywait)
          Node daemon ─╴spawns ffmpeg
                ├─┬─ pulls 1–2 min WAVs cut at silence
                │ │
                │ └→ stores {path, tcStart}
                └──▶ OpenAI Whisper
                       ↳ word-level JSON ⤏ add tcStart ⤏ DB / SRT / captions
```
---
## 1️⃣ Cutting *variable-length* chunks with FFmpeg
FFmpeg’s segment muxer + `silencedetect` is a two-pass trick:
```bash
# pass 1 – emit silence events
ffmpeg -i cam1.mov -af silencedetect=noise=-35dB:d=0.35 -f null - \
2> silence.log
```
Parse the `silence_start` / `silence_end` lines; whenever the elapsed
time since the current *chunkStart* exceeds 60 s **and** a silence
≥ 350 ms shows up, **or** the 120 s hard cap is reached, append that
timestamp to `cutPoints[]`.
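
A minimal TypeScript sketch of that parsing step, assuming the standard `silencedetect` log lines (`silence_start: …` / `silence_end: … | silence_duration: …`); `findCutPoints` and the cut-at-midpoint choice are illustrative, not anything FFmpeg provides:
```ts
import { readFileSync } from "node:fs";

const MIN_LEN = 60;   // seconds – never cut before this
const MAX_LEN = 120;  // seconds – hard ceiling

// Pull [start, end] pairs out of the silencedetect log.
function parseSilences(log: string): Array<[number, number]> {
  const starts = [...log.matchAll(/silence_start: ([\d.]+)/g)].map(m => Number(m[1]));
  const ends = [...log.matchAll(/silence_end: ([\d.]+)/g)].map(m => Number(m[1]));
  return starts.map((s, i) => [s, ends[i] ?? s]);
}

// Walk the silences and emit cut points that respect the 60–120 s window.
// Pass 1 already dropped silences shorter than 350 ms (d=0.35), so every pair qualifies.
function findCutPoints(log: string, totalDuration: number): number[] {
  const cuts: number[] = [];
  let chunkStart = 0;
  for (const [s, e] of parseSilences(log)) {
    while (s - chunkStart > MAX_LEN) {        // speech ran past the cap → force splits
      chunkStart += MAX_LEN;
      cuts.push(chunkStart);
    }
    if (s - chunkStart >= MIN_LEN) {          // past the floor and inside a pause
      const cut = (s + e) / 2;                // cut in the middle of the silence
      cuts.push(cut);
      chunkStart = cut;
    }
  }
  while (totalDuration - chunkStart > MAX_LEN) { // tail with no further pauses
    chunkStart += MAX_LEN;
    cuts.push(chunkStart);
  }
  return cuts;
}

const cutPoints = findCutPoints(readFileSync("silence.log", "utf8"), 3600); // 3600 = clip length in s
```
The returned array is exactly what pass 2 below receives via `-segment_times`.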
```bash
# pass 2 – one ffmpeg does all the slicing
cuts="0 ${cutPoints[*]}"
ffmpeg -i cam1.mov -f segment \
-segment_times "$cuts" \
-map 0:a -c pcm_s16le wav_out/%05d.wav
```
*Why two passes?* The segment muxer can’t listen for silence in
real time; feeding it explicit cut times is rock-solid and keeps every
chunk inside the 1–2-minute window. (Stack Overflow discussion of the
same idea with `silencedetect` events ([stackoverflow.com][1]).)
---
### One-liner for quick tests
```bash
ffmpeg -i cam1.mov -vn \
  -af "silenceremove=start_periods=1:start_threshold=-35dB:start_duration=0.3,\
silenceremove=stop_periods=-1:stop_threshold=-35dB:stop_duration=0.3" \
  -f segment -segment_time 120 \
  -c:a pcm_s16le wav_out/%05d.wav
```
`silenceremove` strips the leading and interior silence so each chunk is
dense speech, and `-segment_time 120` caps every chunk at two minutes.
Treat it as a quick-test approximation only: the segment muxer still cuts
on elapsed time, so boundaries can land mid-word, and the ≥ 60 s floor has
to come from the two-pass cut-point logic above. (`silenceremove` options
discussion ([stackoverflow.com][1]).)
---
## 2️⃣ Node.js daemon skeleton
```ts
import fs from "node:fs";
import { spawn } from "node:child_process"; // used to launch the ffmpeg passes (not shown)
import OpenAI from "openai";
import SMPTE from "smpte-timecode"; // tiny helper lib

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });
const TC_FPS = 30; // or 29.97 DF, etc.

// Filled in by the code that emits each chunk (see below):
const indexOffset: number[] = [];          // cumulative start offset of chunk i, in seconds
const hyperdeckStartTc = "01:00:00:00";    // placeholder – read from ffprobe format_tags=timecode

function secondsToTc(sec: number, startTc: string) {
  const base = new SMPTE(startTc, TC_FPS);
  base.add(Math.round(sec * TC_FPS)); // add() counts frames, so convert seconds → frames
  return base.toString();
}

// 1. watch output dir for freshly-closed WAVs
//    (NB: fs.watch fires on create/rename – in production wait until the file stops growing)
fs.watch("wav_out", (_, file) => {
  if (file?.endsWith(".wav")) processChunk(`wav_out/${file}`);
});

async function processChunk(path: string) {
  /* -------- keep absolute start tc in a sidecar JSON ---------- */
  const match = /(\d{5})\.wav$/.exec(path); // matches ffmpeg's %05d.wav pattern
  const index = Number(match![1]);
  const tcStart = secondsToTc(indexOffset[index], hyperdeckStartTc);

  /* -------- Whisper call with word-level stamps --------------- */
  const res = await openai.audio.transcriptions.create({
    file: fs.createReadStream(path),
    model: "whisper-1",
    response_format: "verbose_json",
    timestamp_granularities: ["word"], // 👈 word stamps
  });

  /* -------- re-base word timestamps --------------------------- */
  const words = (res.words ?? []).map(w => ({
    ...w,
    start: w.start + indexOffset[index],
    end: w.end + indexOffset[index],
    tc: secondsToTc(w.start + indexOffset[index], hyperdeckStartTc), // handy
  }));

  await saveToDB({ ...res, tcStart, words }); // or emit WebSocket → overlay captions
}
```
`indexOffset[]` is the cumulative second-offset **you** keep when you
emit each chunk; `hyperdeckStartTc` comes from `ffprobe -show_entries
format_tags=timecode` on the original MOV.
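
If you want the daemon to read that start TC itself, here is a minimal sketch; the ffprobe flags mirror the table below, and `readStartTc` is just an illustrative name:
```ts
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Read the start timecode the HyperDeck wrote into the MOV container.
async function readStartTc(moviePath: string): Promise<string> {
  const { stdout } = await run("ffprobe", [
    "-v", "quiet",
    "-show_entries", "format_tags=timecode",
    "-of", "default=noprint_wrappers=1:nokey=1",
    moviePath,
  ]);
  return stdout.trim(); // e.g. "01:00:00:00"
}
```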
*Tip:* a complete open-source Python reference that does the same
offset math (chunks + Whisper + merge) is in Matija Žiberna’s tutorial
([buildwithmatija.com][2]).
---
## 3️⃣ Keeping, converting and exporting time-code
| Step | Method |
| ----------------------------------- | ------------------------------------------------------------------------------------ |
| **Get deck start TC** | `ffprobe -v quiet -show_entries format_tags=timecode -of default=noprint_wrappers=1` |
| **Store chunk start offset** | push `secondsFromStart` into `indexOffset[]` when each file closes |
| **Convert Whisper seconds → SMPTE** | tiny helper shown above (`smpte-timecode`, `tc-string`, etc.) |
| **Export captions** | build `WebVTT` or `SRT` with the re-based word stamps |
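
For the caption-export row, a sketch that turns the re-based word stamps from the daemon above into SRT cues; the `Word` shape mirrors Whisper's word objects, and the seven-words-per-cue grouping is an arbitrary choice:
```ts
// Shape of one re-based word stamp produced by the daemon above.
type Word = { word: string; start: number; end: number };

// 12.345 s → "00:00:12,345"
function toSrtTime(sec: number): string {
  const ms = Math.round(sec * 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(Math.floor(ms / 3_600_000))}:${pad(Math.floor(ms / 60_000) % 60)}:` +
         `${pad(Math.floor(ms / 1000) % 60)},${pad(ms % 1000, 3)}`;
}

function wordsToSrt(words: Word[], wordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const group = words.slice(i, i + wordsPerCue);
    cues.push(
      `${cues.length + 1}\n` +
      `${toSrtTime(group[0].start)} --> ${toSrtTime(group[group.length - 1].end)}\n` +
      group.map(w => w.word).join(" ").trim()
    );
  }
  return cues.join("\n\n") + "\n";
}
```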
---
## 4️⃣ Why this beats `record spill`
| Metric | Record Spill | FFmpeg chunker |
| -------------- | ------------------------ | ------------------------ |
| HyperDeck wear | pauses deck every minute | continuous recording |
| File chaos | thousands of .mov | tidy WAV dir |
| Latency | depends on poller | \~ 1 s after chunk close |
| TC accuracy | perfect | perfect (offset math) |
If you later *also* want picture replays, just link the final JSON back
to Resolve or your HTML overlay: the time-codes will match the original
ISO frame-accurately.
---
### Next steps
* Drop-in `fluent-ffmpeg` wrapper if you prefer pure JS over spawning.
* Add **on-the-fly** silence detection via `arecord | ffmpeg -i - …`
to avoid reading from disk at all.
* Wrap the whole thing in a PM2 service and expose a minimal REST
“/chunks” endpoint so other studio apps can subscribe.
Let me know which piece you’d like fleshed out (e.g., WebVTT exporter,
Companion macro to start/stop the pipeline, etc.) and I’ll dig deeper!
[1]: https://stackoverflow.com/questions/25697596/using-ffmpeg-with-silencedetect-to-remove-audio-silence "using FFMPEG with silencedetect to remove audio silence - Stack Overflow"
[2]: https://www.buildwithmatija.com/blog/building-a-long-audio-transcription-tool-with-openai-s-whisper-api "Building a Long Audio Transcription Tool with OpenAI’s Whisper API | Build with Matija"