# Airtable Schema for Transcription System
## Overview
This document describes the Airtable database schema for storing transcription sessions, video segments, transcription segments, and speaker references from the v5 speaker identification system.
## Database Structure
### Four Tables with Relationships
```
Sessions (1) → VideoSegments (many) → TranscriptionSegments (many)
Sessions (1) → SpeakerReferences (many)
```
---
## Table 1: Sessions
**Description:** One record per recording session
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `SessionId` | Single line text | Unique session identifier (Primary) | Primary field |
| `StartTime` | Date | Session start timestamp | Include time |
| `EndTime` | Date | Session end timestamp | Include time |
| `Model` | Single line text | OpenAI model used | e.g., "gpt-4o-transcribe-diarize" |
| `Language` | Single line text | Language code | e.g., "en", "es" |
| `Prompt` | Long text | Custom prompt for transcription | |
| `AutoSplitMinutes` | Number | Minutes between auto-splits | Integer |
| `FilePrefix` | Single line text | Prefix for saved files | e.g., "recording" |
| `Status` | Single select | Session status | Options: Active, Completed, Error |
| `TotalSegments` | Number | Count of video segments | Rollup/Count from VideoSegments |
| `TotalSpeakers` | Number | Count of identified speakers | Integer |
| `Created` | Created time | Auto-generated creation time | |
| `VideoSegments` | Link to another record | Links to VideoSegments table | Allow linking to multiple records |
| `SpeakerReferences` | Link to another record | Links to SpeakerReferences table | Allow linking to multiple records |
---
## Table 2: VideoSegments
**Description:** One record per MediaRecorder segment (e.g., 10-minute chunks)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `SegmentId` | Single line text | Unique segment identifier (Primary) | Primary field, e.g., "seg_001" |
| `Session` | Link to another record | Links to Sessions table | Single link |
| `SegmentNumber` | Number | Sequential number within session | Integer |
| `StartTime` | Date | Segment start timestamp | Include time |
| `EndTime` | Date | Segment end timestamp | Include time |
| `DurationSeconds` | Number | Duration in seconds | Decimal |
| `AudioFilePath` | Single line text | Relative path to audio file | e.g., "_output/session123/recordings/seg_001_audio.webm" |
| `VideoFilePath` | Single line text | Relative path to video file | e.g., "_output/session123/recordings/seg_001_video.webm" |
| `TranscriptionStatus` | Single select | Processing status | Options: Pending, Processing, Completed, Error |
| `ErrorMessage` | Long text | Error details if failed | |
| `FullTranscriptionText` | Long text | Complete text from all transcription segments | Rollup/concatenation from TranscriptionSegments |
| `Created` | Created time | Auto-generated creation time | |
| `TranscriptionSegments` | Link to another record | Links to TranscriptionSegments table | Allow linking to multiple records |
---
## Table 3: TranscriptionSegments
**Description:** One record per diarized speech segment from OpenAI (many per video segment)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `Id` | Autonumber | Auto-incrementing ID (Primary) | Primary field |
| `VideoSegment` | Link to another record | Links to VideoSegments table | Single link |
| `Session` | Link to another record | Links to Sessions table | Single link (via VideoSegment lookup) |
| `Speaker` | Single line text | Speaker label or name | e.g., "A", "B", "Anna", "Bob" |
| `Text` | Long text | Transcribed text content | |
| `StartSeconds` | Number | Start time in seconds (from segment start) | Decimal |
| `EndSeconds` | Number | End time in seconds (from segment start) | Decimal |
| `DurationSeconds` | Formula | Calculated duration | Formula: `{EndSeconds} - {StartSeconds}` |
| `IsNamed` | Checkbox | Whether speaker has assigned name | True if Speaker is not just A/B/C |
| `Confidence` | Number | Confidence score if available | Decimal (0-1) |
| `Created` | Created time | Auto-generated creation time | |
---
## Table 4: SpeakerReferences
**Description:** One record per identified speaker voice sample (a 5-second audio clip)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `Id` | Autonumber | Auto-incrementing ID (Primary) | Primary field |
| `Session` | Link to another record | Links to Sessions table | Single link |
| `OriginalLabel` | Single line text | Original diarization label | e.g., "A", "B", "C" |
| `AssignedName` | Single line text | User-assigned speaker name | e.g., "Anna", "Bob" |
| `AudioSamplePath` | Single line text | Relative path to 5-second WAV file | e.g., "_output/session123/speaker-samples/Anna_audio_sample.wav" |
| `SourceVideoSegment` | Link to another record | Links to VideoSegments table | Single link |
| `SampleStartSeconds` | Number | Start time of sample in source segment | Decimal |
| `SampleEndSeconds` | Number | End time of sample in source segment | Decimal |
| `SampleDurationSeconds` | Formula | Calculated sample duration | Formula: `{SampleEndSeconds} - {SampleStartSeconds}` |
| `Created` | Created time | Auto-generated creation time | |
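If application code reads or writes these tables, mirroring the schema in types helps catch field-name typos early. A minimal TypeScript sketch (the interface names are my own; `Created`, formula, and rollup fields are omitted because Airtable computes them):
```ts
// Field shapes mirroring the four tables above. Linked-record cells are
// arrays of Airtable record IDs (e.g., ["recXXXXXXXXXXXXXX"]), even for
// fields configured as a single link.
interface SessionFields {
  SessionId: string;
  StartTime?: string; // ISO 8601 date-time
  EndTime?: string;
  Model?: string;
  Language?: string;
  Prompt?: string;
  AutoSplitMinutes?: number;
  FilePrefix?: string;
  Status?: "Active" | "Completed" | "Error";
  TotalSpeakers?: number;
  VideoSegments?: string[]; // linked record IDs
  SpeakerReferences?: string[];
}

interface VideoSegmentFields {
  SegmentId: string;
  Session?: string[]; // single link: an array holding one ID
  SegmentNumber?: number;
  StartTime?: string;
  EndTime?: string;
  DurationSeconds?: number;
  AudioFilePath?: string;
  VideoFilePath?: string;
  TranscriptionStatus?: "Pending" | "Processing" | "Completed" | "Error";
  ErrorMessage?: string;
  TranscriptionSegments?: string[];
}

interface TranscriptionSegmentFields {
  VideoSegment?: string[];
  Session?: string[];
  Speaker?: string; // raw label ("A") or assigned name ("Anna")
  Text?: string;
  StartSeconds?: number;
  EndSeconds?: number;
  IsNamed?: boolean;
  Confidence?: number; // 0-1 when available
}

interface SpeakerReferenceFields {
  Session?: string[];
  OriginalLabel?: string; // raw diarization label, e.g., "A"
  AssignedName?: string; // e.g., "Anna"
  AudioSamplePath?: string;
  SourceVideoSegment?: string[];
  SampleStartSeconds?: number;
  SampleEndSeconds?: number;
}
```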
---
## Relationships
### Sessions → VideoSegments (One-to-Many)
- Each session can have multiple video segments
- Each video segment belongs to exactly one session
### VideoSegments → TranscriptionSegments (One-to-Many)
- Each video segment can have multiple transcription segments
- Each transcription segment belongs to exactly one video segment
### Sessions → SpeakerReferences (One-to-Many)
- Each session can have multiple speaker references
- Each speaker reference belongs to exactly one session
### VideoSegments → SpeakerReferences (One-to-Many)
- Each video segment can be the source for multiple speaker samples
- Each speaker reference has exactly one source video segment
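At the API level, each of these links is stored as an array of record IDs; "single link" simply means the array holds exactly one ID, and Airtable keeps the paired field on the other table in sync automatically. A small illustration (the `rec...` IDs are placeholders):
```ts
// How linked cells appear in API payloads.
const videoSegmentFields = {
  SegmentId: "seg_001",
  Session: ["recAAAAAAAAAAAAAA"], // single link: an array of exactly one ID
};

const sessionFields = {
  SessionId: "session_20250118_143022",
  // The "many" side; Airtable populates this automatically when segments
  // link back to the session.
  VideoSegments: ["recBBBBBBBBBBBBBB", "recCCCCCCCCCCCCCC"],
};
```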
---
## Example Data Flow
### 1. Session Created
```
Sessions table:
- SessionId: "session_20250118_143022"
- StartTime: 2025-01-18 14:30:22
- Model: "gpt-4o-transcribe-diarize"
- Status: "Active"
```
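One way to write this record, assuming the official `airtable` npm client (the doc's env vars imply API access but not a specific library):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

const [session] = await base("Sessions").create([
  {
    fields: {
      SessionId: "session_20250118_143022",
      StartTime: "2025-01-18T14:30:22.000Z", // ISO 8601
      Model: "gpt-4o-transcribe-diarize",
      Status: "Active",
    },
  },
]);

// Keep the Airtable record ID; link fields in the next steps point at it.
const sessionRecordId = session.getId();
```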
### 2. Video Segment Recorded
```
VideoSegments table:
- SegmentId: "seg_001"
- Session: → session_20250118_143022
- SegmentNumber: 1
- DurationSeconds: 600
- AudioFilePath: "_output/session_20250118_143022/recordings/recording_001_audio.webm"
- TranscriptionStatus: "Processing"
```
### 3. Transcription Completed
```
TranscriptionSegments table (multiple records):
- VideoSegment: → seg_001
- Speaker: "A"
- Text: "Hello, how are you today?"
- StartSeconds: 0.5
- EndSeconds: 2.3
- VideoSegment: → seg_001
- Speaker: "B"
- Text: "I'm doing great, thanks for asking!"
- StartSeconds: 2.8
- EndSeconds: 5.1
```
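When writing these rows through the API, Airtable's batch limit matters: the create endpoint accepts at most 10 records per call, so longer transcriptions go in chunks. A sketch with the same assumed client (`recSEGMENTxxxxxxx` is a placeholder for seg_001's record ID):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

const rows = [
  { Speaker: "A", Text: "Hello, how are you today?", StartSeconds: 0.5, EndSeconds: 2.3 },
  { Speaker: "B", Text: "I'm doing great, thanks for asking!", StartSeconds: 2.8, EndSeconds: 5.1 },
];

// Airtable accepts at most 10 records per create call.
for (let i = 0; i < rows.length; i += 10) {
  await base("TranscriptionSegments").create(
    rows.slice(i, i + 10).map((fields) => ({
      fields: { ...fields, VideoSegment: ["recSEGMENTxxxxxxx"] },
    }))
  );
}
```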
### 4. Speaker Identified
```
SpeakerReferences table:
- Session: → session_20250118_143022
- OriginalLabel: "A"
- AssignedName: "Anna"
- AudioSamplePath: "_output/session_20250118_143022/speaker-samples/Anna_audio_sample.wav"
- SourceVideoSegment: → seg_001
- SampleStartSeconds: 0.5
- SampleEndSeconds: 5.5
```
All subsequent TranscriptionSegments with Speaker "A" are now identified as "Anna".
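The renaming applies going forward; if you also want earlier rows back-filled, an update sketch (this relies on the `{Session}` linked cell comparing against the linked record's primary field, `SessionId`):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

// Earlier rows still carrying the raw diarization label for this session.
const records = await base("TranscriptionSegments")
  .select({
    filterByFormula: `AND({Session} = "session_20250118_143022", {Speaker} = "A")`,
  })
  .all();

// Updates, like creates, are limited to 10 records per request.
for (let i = 0; i < records.length; i += 10) {
  await base("TranscriptionSegments").update(
    records.slice(i, i + 10).map((r) => ({
      id: r.getId(),
      fields: { Speaker: "Anna", IsNamed: true },
    }))
  );
}
```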
---
## Setup Instructions
### 1. Create Base
1. Go to Airtable and create a new base
2. Name it "Transcription System" or similar
### 2. Create Tables
Create four tables with the exact names:
- `Sessions`
- `VideoSegments`
- `TranscriptionSegments`
- `SpeakerReferences`
### 3. Add Fields
For each table, add the fields listed above in the exact order with PascalCase names.
**Important Notes:**
- Use **PascalCase** for all field names (e.g., `SessionId`, `StartTime`, `VideoSegment`)
- Set the Primary field as indicated
- Configure Single select options exactly as shown
- Set up formulas for calculated fields
- Configure linked records to allow appropriate linking (single vs multiple)
### 4. Configure Relationships
1. In Sessions table, add linked record fields:
   - `VideoSegments` → links to VideoSegments table (allow multiple)
   - `SpeakerReferences` → links to SpeakerReferences table (allow multiple)
2. In VideoSegments table, add linked record fields:
   - `Session` → links to Sessions table (single link)
   - `TranscriptionSegments` → links to TranscriptionSegments table (allow multiple)
3. In TranscriptionSegments table, add linked record and lookup fields:
   - `VideoSegment` → links to VideoSegments table (single link)
   - `Session` → lookup from VideoSegment
4. In SpeakerReferences table, add linked record fields:
   - `Session` → links to Sessions table (single link)
   - `SourceVideoSegment` → links to VideoSegments table (single link)
### 5. Get API Credentials
1. Go to Account settings → API
2. Generate a Personal Access Token with full access to your base
3. Copy your Base ID from the base URL or API documentation
### 6. Environment Variables
Add to your `.env.local`:
```bash
AIRTABLE_API_KEY=your_personal_access_token
AIRTABLE_BASE_ID=your_base_id
AIRTABLE_SESSIONS_TABLE=Sessions
AIRTABLE_VIDEO_SEGMENTS_TABLE=VideoSegments
AIRTABLE_TRANSCRIPTION_SEGMENTS_TABLE=TranscriptionSegments
AIRTABLE_SPEAKER_REFERENCES_TABLE=SpeakerReferences
```
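One way to consume these variables, again assuming the official `airtable` npm package (a sketch of a shared client module, e.g., `airtable.ts`):
```ts
import Airtable from "airtable";

// The table-name variables let the Airtable tables be renamed without
// touching code.
const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

export const sessionsTable =
  base(process.env.AIRTABLE_SESSIONS_TABLE ?? "Sessions");
export const videoSegmentsTable =
  base(process.env.AIRTABLE_VIDEO_SEGMENTS_TABLE ?? "VideoSegments");
export const transcriptionSegmentsTable =
  base(process.env.AIRTABLE_TRANSCRIPTION_SEGMENTS_TABLE ?? "TranscriptionSegments");
export const speakerReferencesTable =
  base(process.env.AIRTABLE_SPEAKER_REFERENCES_TABLE ?? "SpeakerReferences");
```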
---
## Query Examples
### Get all transcription segments for a session
```
Filter by: Session = "session_20250118_143022"
From: TranscriptionSegments table
Sort by: StartSeconds (ascending)
```
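The same query through the `airtable` client might look like this (a sketch reusing the hypothetical `./airtable` module from the Environment Variables section):
```ts
import { transcriptionSegmentsTable } from "./airtable";

// {Session} is a linked field; comparing it to a string matches the linked
// record's primary field (SessionId).
const segments = await transcriptionSegmentsTable
  .select({
    filterByFormula: `{Session} = "session_20250118_143022"`,
    sort: [{ field: "StartSeconds", direction: "asc" }],
  })
  .all();

for (const s of segments) {
  console.log(s.get("Speaker"), s.get("Text"));
}
```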
### Find all instances where "Anna" spoke
```
Filter by: Speaker = "Anna"
From: TranscriptionSegments table
Group by: Session
```
### Get session statistics
```
From: Sessions table
Fields: SessionId, TotalSegments, TotalSpeakers, Status
View: Group by Status
```
### Find incomplete segments
```
Filter by: TranscriptionStatus = "Processing" OR TranscriptionStatus = "Error"
From: VideoSegments table
Sort by: StartTime (descending)
```
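And the incomplete-segments query as a formula-based sketch (same assumptions as above):
```ts
import { videoSegmentsTable } from "./airtable";

const incomplete = await videoSegmentsTable
  .select({
    filterByFormula: `OR({TranscriptionStatus} = "Processing", {TranscriptionStatus} = "Error")`,
    sort: [{ field: "StartTime", direction: "desc" }],
  })
  .all();
```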
---
## Notes
- All file paths are relative to the project root
- Audio/video files are stored in `_output` directory (gitignored)
- Timestamps use the local timezone of the recording
- Speaker labels from OpenAI diarization are typically A, B, C, etc.
- Once a speaker is assigned a name, that name appears in future transcriptions
- The 5-second audio samples are used by OpenAI for speaker identification