# Airtable Schema for Transcription System
## Overview
This document describes the Airtable database schema for storing transcription sessions, video segments, transcription segments, and speaker references from the v5 speaker identification system.
## Database Structure
### Four Tables with Relationships
```
Sessions (1) → VideoSegments (many) → TranscriptionSegments (many)
Sessions (1) → SpeakerReferences (many)
```
---
## Table 1: Sessions
**Description:** One record per recording session
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `SessionId` | Single line text | Unique session identifier (Primary) | Primary field |
| `StartTime` | Date | Session start timestamp | Include time |
| `EndTime` | Date | Session end timestamp | Include time |
| `Model` | Single line text | OpenAI model used | e.g., "gpt-4o-transcribe-diarize" |
| `Language` | Single line text | Language code | e.g., "en", "es" |
| `Prompt` | Long text | Custom prompt for transcription | |
| `AutoSplitMinutes` | Number | Minutes between auto-splits | Integer |
| `FilePrefix` | Single line text | Prefix for saved files | e.g., "recording" |
| `Status` | Single select | Session status | Options: Active, Completed, Error |
| `TotalSegments` | Number | Count of video segments | Rollup/Count from VideoSegments |
| `TotalSpeakers` | Number | Count of identified speakers | Integer |
| `Created` | Created time | Auto-generated creation time | |
| `VideoSegments` | Link to another record | Links to VideoSegments table | Allow linking to multiple records |
| `SpeakerReferences` | Link to another record | Links to SpeakerReferences table | Allow linking to multiple records |
---
## Table 2: VideoSegments
**Description:** One record per MediaRecorder segment (e.g., 10-minute chunks)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `SegmentId` | Single line text | Unique segment identifier (Primary) | Primary field, e.g., "seg_001" |
| `Session` | Link to another record | Links to Sessions table | Single link |
| `SegmentNumber` | Number | Sequential number within session | Integer |
| `StartTime` | Date | Segment start timestamp | Include time |
| `EndTime` | Date | Segment end timestamp | Include time |
| `DurationSeconds` | Number | Duration in seconds | Decimal |
| `AudioFilePath` | Single line text | Relative path to audio file | e.g., "_output/session123/recordings/seg_001_audio.webm" |
| `VideoFilePath` | Single line text | Relative path to video file | e.g., "_output/session123/recordings/seg_001_video.webm" |
| `TranscriptionStatus` | Single select | Processing status | Options: Pending, Processing, Completed, Error |
| `ErrorMessage` | Long text | Error details if failed | |
| `FullTranscriptionText` | Long text | Complete text from all transcription segments | Rollup/concatenation from TranscriptionSegments |
| `Created` | Created time | Auto-generated creation time | |
| `TranscriptionSegments` | Link to another record | Links to TranscriptionSegments table | Allow linking to multiple records |
---
## Table 3: TranscriptionSegments
**Description:** One record per diarized speech segment from OpenAI (many per video segment)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `Id` | Autonumber | Auto-incrementing ID (Primary) | Primary field |
| `VideoSegment` | Link to another record | Links to VideoSegments table | Single link |
| `Session` | Link to another record | Links to Sessions table | Single link (via VideoSegment lookup) |
| `Speaker` | Single line text | Speaker label or name | e.g., "A", "B", "Anna", "Bob" |
| `Text` | Long text | Transcribed text content | |
| `StartSeconds` | Number | Start time in seconds (from segment start) | Decimal |
| `EndSeconds` | Number | End time in seconds (from segment start) | Decimal |
| `DurationSeconds` | Formula | Calculated duration | Formula: `{EndSeconds} - {StartSeconds}` |
| `IsNamed` | Checkbox | Whether speaker has assigned name | True if Speaker is not just A/B/C |
| `Confidence` | Number | Confidence score if available | Decimal (0-1) |
| `Created` | Created time | Auto-generated creation time | |
---
## Table 4: SpeakerReferences
**Description:** One record per identified speaker voice sample (a 5-second audio clip)
### Fields
| Field Name | Type | Description | Options |
|------------|------|-------------|---------|
| `Id` | Autonumber | Auto-incrementing ID (Primary) | Primary field |
| `Session` | Link to another record | Links to Sessions table | Single link |
| `OriginalLabel` | Single line text | Original diarization label | e.g., "A", "B", "C" |
| `AssignedName` | Single line text | User-assigned speaker name | e.g., "Anna", "Bob" |
| `AudioSamplePath` | Single line text | Relative path to 5-second WAV file | e.g., "_output/session123/speaker-samples/Anna_audio_sample.wav" |
| `SourceVideoSegment` | Link to another record | Links to VideoSegments table | Single link |
| `SampleStartSeconds` | Number | Start time of sample in source segment | Decimal |
| `SampleEndSeconds` | Number | End time of sample in source segment | Decimal |
| `SampleDurationSeconds` | Formula | Calculated sample duration | Formula: `{SampleEndSeconds} - {SampleStartSeconds}` |
| `Created` | Created time | Auto-generated creation time | |
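If application code reads or writes these tables, mirroring the schema in types helps catch field-name typos early. A minimal TypeScript sketch (the interface names are my own; `Created`, formula, and rollup fields are omitted because Airtable computes them):
```ts
// Field shapes mirroring the four tables above. Linked-record cells are
// arrays of Airtable record IDs (e.g., ["recXXXXXXXXXXXXXX"]), even for
// fields configured as a single link.
interface SessionFields {
  SessionId: string;
  StartTime?: string; // ISO 8601 date-time
  EndTime?: string;
  Model?: string;
  Language?: string;
  Prompt?: string;
  AutoSplitMinutes?: number;
  FilePrefix?: string;
  Status?: "Active" | "Completed" | "Error";
  TotalSpeakers?: number;
  VideoSegments?: string[]; // linked record IDs
  SpeakerReferences?: string[];
}

interface VideoSegmentFields {
  SegmentId: string;
  Session?: string[]; // single link: an array holding one ID
  SegmentNumber?: number;
  StartTime?: string;
  EndTime?: string;
  DurationSeconds?: number;
  AudioFilePath?: string;
  VideoFilePath?: string;
  TranscriptionStatus?: "Pending" | "Processing" | "Completed" | "Error";
  ErrorMessage?: string;
  TranscriptionSegments?: string[];
}

interface TranscriptionSegmentFields {
  VideoSegment?: string[];
  Session?: string[];
  Speaker?: string; // raw label ("A") or assigned name ("Anna")
  Text?: string;
  StartSeconds?: number;
  EndSeconds?: number;
  IsNamed?: boolean;
  Confidence?: number; // 0-1 when available
}

interface SpeakerReferenceFields {
  Session?: string[];
  OriginalLabel?: string; // raw diarization label, e.g., "A"
  AssignedName?: string; // e.g., "Anna"
  AudioSamplePath?: string;
  SourceVideoSegment?: string[];
  SampleStartSeconds?: number;
  SampleEndSeconds?: number;
}
```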
---
## Relationships
### Sessions → VideoSegments (One-to-Many)
- Each session can have multiple video segments
- Each video segment belongs to exactly one session
### VideoSegments → TranscriptionSegments (One-to-Many)
- Each video segment can have multiple transcription segments
- Each transcription segment belongs to exactly one video segment
### Sessions → SpeakerReferences (One-to-Many)
- Each session can have multiple speaker references
- Each speaker reference belongs to exactly one session
### VideoSegments → SpeakerReferences (One-to-Many)
- Each video segment can be the source for multiple speaker samples
- Each speaker reference has exactly one source video segment
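At the API level, each of these links is stored as an array of record IDs; "single link" simply means the array holds exactly one ID, and Airtable keeps the paired field on the other table in sync automatically. A small illustration (the `rec...` IDs are placeholders):
```ts
// How linked cells appear in API payloads.
const videoSegmentFields = {
  SegmentId: "seg_001",
  Session: ["recAAAAAAAAAAAAAA"], // single link: an array of exactly one ID
};

const sessionFields = {
  SessionId: "session_20250118_143022",
  // The "many" side; Airtable populates this automatically when segments
  // link back to the session.
  VideoSegments: ["recBBBBBBBBBBBBBB", "recCCCCCCCCCCCCCC"],
};
```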
---
## Example Data Flow
### 1. Session Created
```
Sessions table:
- SessionId: "session_20250118_143022"
- StartTime: 2025-01-18 14:30:22
- Model: "gpt-4o-transcribe-diarize"
- Status: "Active"
```
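One way to write this record, assuming the official `airtable` npm client (the doc's env vars imply API access but not a specific library):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

const [session] = await base("Sessions").create([
  {
    fields: {
      SessionId: "session_20250118_143022",
      StartTime: "2025-01-18T14:30:22.000Z", // ISO 8601
      Model: "gpt-4o-transcribe-diarize",
      Status: "Active",
    },
  },
]);

// Keep the Airtable record ID; link fields in the next steps point at it.
const sessionRecordId = session.getId();
```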
### 2. Video Segment Recorded
```
VideoSegments table:
- SegmentId: "seg_001"
- Session: → session_20250118_143022
- SegmentNumber: 1
- DurationSeconds: 600
- AudioFilePath: "_output/session_20250118_143022/recordings/recording_001_audio.webm"
- TranscriptionStatus: "Processing"
```
### 3. Transcription Completed
```
TranscriptionSegments table (multiple records):
- VideoSegment: → seg_001
- Speaker: "A"
- Text: "Hello, how are you today?"
- StartSeconds: 0.5
- EndSeconds: 2.3
- VideoSegment: → seg_001
- Speaker: "B"
- Text: "I'm doing great, thanks for asking!"
- StartSeconds: 2.8
- EndSeconds: 5.1
```
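When writing these rows through the API, Airtable's batch limit matters: the create endpoint accepts at most 10 records per call, so longer transcriptions go in chunks. A sketch with the same assumed client (`recSEGMENTxxxxxxx` is a placeholder for seg_001's record ID):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

const rows = [
  { Speaker: "A", Text: "Hello, how are you today?", StartSeconds: 0.5, EndSeconds: 2.3 },
  { Speaker: "B", Text: "I'm doing great, thanks for asking!", StartSeconds: 2.8, EndSeconds: 5.1 },
];

// Airtable accepts at most 10 records per create call.
for (let i = 0; i < rows.length; i += 10) {
  await base("TranscriptionSegments").create(
    rows.slice(i, i + 10).map((fields) => ({
      fields: { ...fields, VideoSegment: ["recSEGMENTxxxxxxx"] },
    }))
  );
}
```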
### 4. Speaker Identified
```
SpeakerReferences table:
- Session: → session_20250118_143022
- OriginalLabel: "A"
- AssignedName: "Anna"
- AudioSamplePath: "_output/session_20250118_143022/speaker-samples/Anna_audio_sample.wav"
- SourceVideoSegment: → seg_001
- SampleStartSeconds: 0.5
- SampleEndSeconds: 5.5
```
All subsequent TranscriptionSegments with Speaker "A" are now identified as "Anna".
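The renaming applies going forward; if you also want earlier rows back-filled, an update sketch (this relies on the `{Session}` linked cell comparing against the linked record's primary field, `SessionId`):
```ts
import Airtable from "airtable";

const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

// Earlier rows still carrying the raw diarization label for this session.
const records = await base("TranscriptionSegments")
  .select({
    filterByFormula: `AND({Session} = "session_20250118_143022", {Speaker} = "A")`,
  })
  .all();

// Updates, like creates, are limited to 10 records per request.
for (let i = 0; i < records.length; i += 10) {
  await base("TranscriptionSegments").update(
    records.slice(i, i + 10).map((r) => ({
      id: r.getId(),
      fields: { Speaker: "Anna", IsNamed: true },
    }))
  );
}
```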
---
## Setup Instructions
### 1. Create Base
1. Go to Airtable and create a new base
2. Name it "Transcription System" or similar
### 2. Create Tables
Create four tables with the exact names:
- `Sessions`
- `VideoSegments`
- `TranscriptionSegments`
- `SpeakerReferences`
### 3. Add Fields
For each table, add the fields listed above in the exact order with PascalCase names.
**Important Notes:**
- Use **PascalCase** for all field names (e.g., `SessionId`, `StartTime`, `VideoSegment`)
- Set the Primary field as indicated
- Configure Single select options exactly as shown
- Set up formulas for calculated fields
- Configure linked records to allow appropriate linking (single vs multiple)
### 4. Configure Relationships
1. In Sessions table, add linked record fields:
   - `VideoSegments` → links to VideoSegments table (allow multiple)
   - `SpeakerReferences` → links to SpeakerReferences table (allow multiple)
2. In VideoSegments table, add linked record fields:
   - `Session` → links to Sessions table (single link)
   - `TranscriptionSegments` → links to TranscriptionSegments table (allow multiple)
3. In TranscriptionSegments table, add linked record and lookup fields:
   - `VideoSegment` → links to VideoSegments table (single link)
   - `Session` → lookup from VideoSegment
4. In SpeakerReferences table, add linked record fields:
   - `Session` → links to Sessions table (single link)
   - `SourceVideoSegment` → links to VideoSegments table (single link)
### 5. Get API Credentials
1. Go to Account settings → API
2. Generate a Personal Access Token with full access to your base
3. Copy your Base ID from the base URL or API documentation
### 6. Environment Variables
Add to your `.env.local`:
```bash
AIRTABLE_API_KEY=your_personal_access_token
AIRTABLE_BASE_ID=your_base_id
AIRTABLE_SESSIONS_TABLE=Sessions
AIRTABLE_VIDEO_SEGMENTS_TABLE=VideoSegments
AIRTABLE_TRANSCRIPTION_SEGMENTS_TABLE=TranscriptionSegments
AIRTABLE_SPEAKER_REFERENCES_TABLE=SpeakerReferences
```
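One way to consume these variables, again assuming the official `airtable` npm package (a sketch of a shared client module, e.g., `airtable.ts`):
```ts
import Airtable from "airtable";

// The table-name variables let the Airtable tables be renamed without
// touching code.
const base = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
  .base(process.env.AIRTABLE_BASE_ID!);

export const sessionsTable =
  base(process.env.AIRTABLE_SESSIONS_TABLE ?? "Sessions");
export const videoSegmentsTable =
  base(process.env.AIRTABLE_VIDEO_SEGMENTS_TABLE ?? "VideoSegments");
export const transcriptionSegmentsTable =
  base(process.env.AIRTABLE_TRANSCRIPTION_SEGMENTS_TABLE ?? "TranscriptionSegments");
export const speakerReferencesTable =
  base(process.env.AIRTABLE_SPEAKER_REFERENCES_TABLE ?? "SpeakerReferences");
```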
---
## Query Examples
### Get all transcription segments for a session
```
Filter by: Session = "session_20250118_143022"
From: TranscriptionSegments table
Sort by: StartSeconds (ascending)
```
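The same query through the `airtable` client might look like this (a sketch reusing the hypothetical `./airtable` module from the Environment Variables section):
```ts
import { transcriptionSegmentsTable } from "./airtable";

// {Session} is a linked field; comparing it to a string matches the linked
// record's primary field (SessionId).
const segments = await transcriptionSegmentsTable
  .select({
    filterByFormula: `{Session} = "session_20250118_143022"`,
    sort: [{ field: "StartSeconds", direction: "asc" }],
  })
  .all();

for (const s of segments) {
  console.log(s.get("Speaker"), s.get("Text"));
}
```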
### Find all instances where "Anna" spoke
```
Filter by: Speaker = "Anna"
From: TranscriptionSegments table
Group by: Session
```
### Get session statistics
```
From: Sessions table
Fields: SessionId, TotalSegments, TotalSpeakers, Status
View: Group by Status
```
### Find incomplete segments
```
Filter by: TranscriptionStatus = "Processing" OR TranscriptionStatus = "Error"
From: VideoSegments table
Sort by: StartTime (descending)
```
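And the incomplete-segments query as a formula-based sketch (same assumptions as above):
```ts
import { videoSegmentsTable } from "./airtable";

const incomplete = await videoSegmentsTable
  .select({
    filterByFormula: `OR({TranscriptionStatus} = "Processing", {TranscriptionStatus} = "Error")`,
    sort: [{ field: "StartTime", direction: "desc" }],
  })
  .all();
```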
---
## Notes
- All file paths are relative to the project root
- Audio/video files are stored in `_output` directory (gitignored)
- Timestamps use the local timezone of the recording
- Speaker labels from OpenAI diarization are typically A, B, C, etc.
- Once a speaker is assigned a name, that name appears in future transcriptions
- The 5-second audio samples are used by OpenAI for speaker identification