# ai-hackathon-group-1-ai-capture-agents

## The Capture Lab

**Lead:** Jordan

**Studio Agents: AI-Enhanced Oral Exams.** Real-time studio agents that augment oral exams and presentations with transcription, tagging, and vision. These agents detect and describe visual materials (cards, slides, images) as they appear, automatically generating interleaved Markdown summaries or transcripts. A step toward an intelligent LL Studio that documents and enhances performance-based learning.

# 3-Hour Hackathon Plan - The Capture Lab

## Goal: AI Interviewer with Vision Demo

**What We're Building**: A web app with these specific features:

### Core Features:

1. **Webcam View** - Live video feed from webcam
2. **Interviewer Bot Toggle** - Start/stop AI interviewer
3. **Smart Question Flow**:
   - Phase 1: Bot asks questions until it gets your **name**
   - Phase 2: Bot asks questions until it gets your **project name**
   - Phase 3: Once name + project are confirmed, bot asks intelligent **follow-up questions**
4. **Image Capture + Vision** (sketched below):
   - Button to capture a still from the webcam
   - Button to describe the captured image using GPT-4 Vision
5. **Live Transcript** - See the conversation in real time

### Tech Stack:

- **Whisper API** - Speech-to-text transcription
- **GPT-4 API** - Question generation + name/project extraction
- **GPT-4 Vision API** - Image description
- **MediaRecorder API** - Browser audio recording
- **getUserMedia API** - Webcam access

**What We're NOT Building** (use existing studio gear):

- ❌ Video capture server (use OBS or existing recording setup)
- ❌ Multi-camera switching (manual or existing workflow)
- ❌ File management (record to wherever you normally do)
- ❌ Database (session data lives in memory for demo)
- ❌ Beautiful UI (functional > pretty)
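The webcam view and still capture (features 1 and 4) are never spelled out in the code later in this plan, so here is a minimal client-side sketch of one way to do it. The component, ref, and state names are illustrative only and are not necessarily what the finished `03-Interviewer` page uses:

```typescript
'use client';

import { useRef, useState } from 'react';

export function WebcamCapture() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const [still, setStill] = useState<string | null>(null);

  // Start the webcam and pipe it into the <video> element
  async function startCamera() {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    if (videoRef.current) {
      videoRef.current.srcObject = stream;
      await videoRef.current.play();
    }
  }

  // Grab the current frame by drawing the video onto an offscreen canvas
  function captureStill() {
    const video = videoRef.current;
    if (!video) return;
    const canvas = document.createElement('canvas');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d')?.drawImage(video, 0, 0);
    // Base64 data URL - small enough to POST straight to a vision endpoint
    setStill(canvas.toDataURL('image/jpeg', 0.8));
  }

  return (
    <div>
      <video ref={videoRef} muted playsInline style={{ width: '100%' }} />
      <button onClick={startCamera}>Start Camera</button>
      <button onClick={captureStill}>Capture Still</button>
      {still && <img src={still} alt="Captured still" style={{ width: 240 }} />}
    </div>
  );
}
```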
---

## 🚀 Quick Start (Testing What We Built)

**Everything is already built! Here's how to test it:**

1. **Start dev server** (if not running):

   ```bash
   cd /Users/metal/Development/the-capture-lab/web
   pnpm dev
   ```

2. **Open in browser**:

   ```
   http://localhost:3000/recording-ui-options/03-Interviewer
   ```

3. **Allow permissions** when prompted:
   - Camera access
   - Microphone access

4. **Test the flow**:
   - Click "Start Interviewer Bot"
   - Answer questions by clicking "Start Answer" → speak → "Stop Answer"
   - Watch your name and project get extracted automatically
   - Try "Capture Still" and "Describe Image"

**If you see errors**: The dev server might need a restart after creating new files.

---

## Pre-Hackathon Setup (Do Now - 15 minutes) [COMPLETED ✅]

### 1. Get API Keys

- [ ] OpenAI API key: https://platform.openai.com/api-keys
- [ ] Test it works: `curl https://api.openai.com/v1/models -H "Authorization: Bearer YOUR_KEY"`

### 2. Verify Next.js App Works

```bash
cd /Users/metal/Development/the-capture-lab/web
pnpm install
pnpm dev
# Open http://localhost:3000 - should see app
```

### 3. Create .env.local

```bash
cd web
echo "OPENAI_API_KEY=your_key_here" > .env.local
```

### 4. Have Recording Studio Ready

- Camera(s) pointed at subject area
- Mic working
- OBS or recording software configured
- Test recording to verify it works

---

## Implementation Plan (What We Actually Built)

### ✅ COMPLETED - All APIs and UI are ready!

**What's Already Done**:

1. ✅ **API Routes Created**:
   - `/api/transcribe` - Whisper speech-to-text
   - `/api/interviewer` - Smart question bot with name/project extraction
   - `/api/describe-image` - GPT-4 Vision image descriptions
2. ✅ **Utils Created**:
   - `AudioRecorder` class in `/app/lib/audioRecorder.ts`
3. ✅ **Complete UI Built**:
   - `/app/recording-ui-options/03-Interviewer/page.tsx`
   - Split-screen: Webcam (left) + Controls (right)
   - All features implemented

### Demo Flow (How It Works):

**Phase 1: Getting Name**

1. Click "Start Interviewer Bot"
2. Bot asks: "Hi! What's your name?"
3. Click "Start Answer" → speak your name → "Stop Answer"
4. Bot transcribes with Whisper
5. Bot extracts your name (appears in green box)

**Phase 2: Getting Project**

1. Bot asks: "What project are you working on?"
2. Answer the same way
3. Bot extracts project name (appears in blue box)
4. Click "Confirm & Continue to Follow-ups"

**Phase 3: Follow-up Questions**

1. Bot asks contextual questions about YOUR project
2. Questions adapt based on your previous answers
3. Full transcript appears on right side

**Bonus: Image Capture**

1. Click "Capture Still" anytime
2. Click "Describe Image"
3. GPT-4 Vision analyzes and describes what it sees
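The `/api/interviewer` and `/api/describe-image` routes are listed as done above but aren't reproduced anywhere in this plan (Hour 1 below only shows `/api/transcribe` and the simpler `/api/question`). If you need to rebuild or customize them, here are minimal sketches of one way they could work. The request/response field names, the phase values, and the `gpt-4o` model choice are assumptions, not necessarily what the finished routes use:

```typescript
// /web/app/api/interviewer/route.ts (sketch - field names are illustrative)
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(request: NextRequest) {
  // phase: 'name' | 'project' | 'followup'; answer: latest transcribed reply
  const { phase, answer, name, project, transcript } = await request.json();

  if (phase === 'name' || phase === 'project') {
    // Ask the model to pull the field out of the answer, or say UNKNOWN
    const extraction = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        {
          role: 'system',
          content: `Extract the person's ${phase === 'name' ? 'name' : 'project name'} from their answer. Reply with only the value, or UNKNOWN if it is not there.`,
        },
        { role: 'user', content: answer ?? '' },
      ],
      temperature: 0,
    });
    const value = extraction.choices[0].message.content?.trim();
    const found = value && value !== 'UNKNOWN';
    return NextResponse.json({
      extracted: found ? value : null,
      // Re-ask if extraction failed, otherwise move on to the next phase
      question: found
        ? phase === 'name'
          ? 'Great - what project are you working on?'
          : 'Thanks! Ready for some follow-up questions?'
        : `Sorry, I didn't catch your ${phase}. Could you repeat it?`,
    });
  }

  // Follow-up phase: contextual question grounded in name, project, and transcript
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are interviewing someone about their project. Ask one short, open-ended follow-up question. Output only the question.',
      },
      {
        role: 'user',
        content: `Name: ${name}\nProject: ${project}\nTranscript so far:\n${transcript}`,
      },
    ],
    temperature: 0.8,
    max_tokens: 100,
  });
  return NextResponse.json({ question: completion.choices[0].message.content?.trim() });
}
```

```typescript
// /web/app/api/describe-image/route.ts (sketch - expects a base64 data URL from the client)
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(request: NextRequest) {
  try {
    const { image } = await request.json(); // e.g. "data:image/jpeg;base64,..."

    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: 'Describe what you see in this image in 2-3 sentences.' },
            { type: 'image_url', image_url: { url: image } },
          ],
        },
      ],
      max_tokens: 300,
    });

    return NextResponse.json({ description: completion.choices[0].message.content });
  } catch (error) {
    console.error('Describe image error:', error);
    return NextResponse.json({ error: 'Image description failed' }, { status: 500 });
  }
}
```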
---

## Hour 1: Basic AI Interview Flow (Core Logic) [COMPLETED ✅]

**Goal**: Get AI asking questions and responding to answers

### Minutes 0-20: Set Up API Routes [DONE]

**Create `/web/app/api/transcribe/route.ts`**:

```typescript
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(request: NextRequest) {
  try {
    const formData = await request.formData();
    const audioFile = formData.get('audio') as File;

    if (!audioFile) {
      return NextResponse.json({ error: 'No audio file' }, { status: 400 });
    }

    const transcription = await openai.audio.transcriptions.create({
      file: audioFile,
      model: 'whisper-1',
    });

    return NextResponse.json({ text: transcription.text });
  } catch (error) {
    console.error('Transcription error:', error);
    return NextResponse.json({ error: 'Transcription failed' }, { status: 500 });
  }
}
```

**Create `/web/app/api/question/route.ts`**:

```typescript
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(request: NextRequest) {
  try {
    const { transcript, questionNumber } = await request.json();

    const prompt =
      questionNumber === 1
        ? "Generate a warm, engaging opening question for an interview. Keep it under 25 words."
        : `Based on this transcript: "${transcript}"\n\nGenerate a thoughtful follow-up question that explores their answer deeper. Keep it under 25 words.`;

    const completion = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content:
            'You are a thoughtful interviewer. Ask engaging, open-ended questions. Output ONLY the question, no preamble.',
        },
        {
          role: 'user',
          content: prompt,
        },
      ],
      temperature: 0.8,
      max_tokens: 100,
    });

    const question = completion.choices[0].message.content?.trim() || 'Tell me more.';

    return NextResponse.json({ question });
  } catch (error) {
    console.error('Question generation error:', error);
    return NextResponse.json({ error: 'Question generation failed' }, { status: 500 });
  }
}
```

**Install OpenAI SDK**:

```bash
cd web
pnpm add openai
```

### Minutes 20-40: Copy Audio Recorder from Reference

**Copy `/web/app/lib/audioRecorder.ts`** from the reference repo or create a simple version:

```typescript
export class AudioRecorder {
  private mediaRecorder: MediaRecorder | null = null;
  private chunks: Blob[] = [];
  private stream: MediaStream | null = null;

  async initialize(stream: MediaStream) {
    this.stream = stream;
    this.mediaRecorder = new MediaRecorder(stream, {
      mimeType: 'audio/webm',
    });

    this.mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        this.chunks.push(event.data);
      }
    };
  }

  start() {
    this.chunks = [];
    this.mediaRecorder?.start();
  }

  async stop(): Promise<{ blob: Blob; duration: number }> {
    return new Promise((resolve) => {
      if (!this.mediaRecorder) {
        resolve({ blob: new Blob(), duration: 0 });
        return;
      }

      this.mediaRecorder.onstop = () => {
        const blob = new Blob(this.chunks, { type: 'audio/webm' });
        resolve({ blob, duration: 0 });
      };

      this.mediaRecorder.stop();
    });
  }

  cleanup() {
    this.stream?.getTracks().forEach((track) => track.stop());
  }
}
```

### Minutes 40-60: Build Basic Interview UI

**Create `/web/app/studio/page.tsx`**:

```typescript
'use client';

import { useState, useRef } from 'react';
import { AudioRecorder } from '@/lib/audioRecorder';

type Stage = 'ready' | 'recording' | 'processing' | 'complete';

export default function StudioPage() {
  const [stage, setStage] = useState<Stage>('ready');
  const [question, setQuestion] = useState<string>('');
  const [transcript, setTranscript] = useState<string>('');
  const [questionNumber, setQuestionNumber] = useState(0);
  const [isRecordingAnswer, setIsRecordingAnswer] = useState(false);

  const audioRecorderRef = useRef<AudioRecorder | null>(null);

  async function startSession() {
    // Request mic permission
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    audioRecorderRef.current = new AudioRecorder();
    await audioRecorderRef.current.initialize(stream);

    // Get first question
    setStage('recording');
    setQuestionNumber(1);

    const res = await fetch('/api/question', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ transcript: '', questionNumber: 1 }),
    });
    const data = await res.json();
    setQuestion(data.question);
  }

  async function startAnswer() {
    setIsRecordingAnswer(true);
    audioRecorderRef.current?.start();
  }

  async function stopAnswer() {
    setIsRecordingAnswer(false);
    setStage('processing');

    // Stop recording and get audio
    const { blob } = await audioRecorderRef.current!.stop();

    // Transcribe
    const formData = new FormData();
    formData.append('audio', blob, 'audio.webm');

    const transcribeRes = await fetch('/api/transcribe', {
      method: 'POST',
      body: formData,
    });
    const { text } = await transcribeRes.json();

    const newTranscript = transcript + '\n\nQ: ' + question + '\nA: ' + text;
    setTranscript(newTranscript);

    // Check if done (5 questions)
    if (questionNumber >= 5) {
      setStage('complete');
      return;
    }

    // Get next question
    const questionRes = await fetch('/api/question', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        transcript: newTranscript,
        questionNumber: questionNumber + 1,
      }),
    });
    const { question: nextQuestion } = await questionRes.json();

    setQuestion(nextQuestion);
    setQuestionNumber(questionNumber + 1);
    setStage('recording');
  }

  function endSession() {
    audioRecorderRef.current?.cleanup();
    setStage('complete');
  }

  return (
    <div style={{ padding: '40px', maxWidth: '800px', margin: '0 auto' }}>
      <h1 style={{ fontSize: '32px', marginBottom: '20px' }}>The Capture Lab</h1>

      {stage === 'ready' && (
        <div>
          <p style={{ marginBottom: '20px' }}>
            Ready to start your AI-guided recording session?
          </p>
          <button
            onClick={startSession}
            style={{
              padding: '12px 24px',
              fontSize: '18px',
              background: '#0070f3',
              color: 'white',
              border: 'none',
              borderRadius: '8px',
              cursor: 'pointer',
            }}
          >
            Start Session
          </button>
        </div>
      )}

      {stage === 'recording' && (
        <div>
          <div style={{ marginBottom: '30px' }}>
            <div style={{ fontSize: '14px', color: '#666', marginBottom: '10px' }}>
              Question {questionNumber} of 5
            </div>
            <div style={{ fontSize: '24px', fontWeight: 'bold', marginBottom: '20px' }}>
              {question}
            </div>
          </div>

          {!isRecordingAnswer ? (
            <button
              onClick={startAnswer}
              style={{
                padding: '12px 24px',
                fontSize: '18px',
                background: '#0070f3',
                color: 'white',
                border: 'none',
                borderRadius: '8px',
                cursor: 'pointer',
                marginRight: '10px',
              }}
            >
              🎤 Start Answer
            </button>
          ) : (
            <button
              onClick={stopAnswer}
              style={{
                padding: '12px 24px',
                fontSize: '18px',
                background: '#ff4444',
                color: 'white',
                border: 'none',
                borderRadius: '8px',
                cursor: 'pointer',
                marginRight: '10px',
              }}
            >
              ⏹ Stop Answer
            </button>
          )}

          <button
            onClick={endSession}
            style={{
              padding: '12px 24px',
              fontSize: '18px',
              background: '#666',
              color: 'white',
              border: 'none',
              borderRadius: '8px',
              cursor: 'pointer',
            }}
          >
            End Session
          </button>
        </div>
      )}

      {stage === 'processing' && (
        <div style={{ fontSize: '18px' }}>
          Processing your answer...
        </div>
      )}

      {stage === 'complete' && (
        <div>
          <h2 style={{ fontSize: '24px', marginBottom: '20px' }}>Session Complete!</h2>
          <div style={{ background: '#f5f5f5', padding: '20px', borderRadius: '8px' }}>
            <h3>Transcript:</h3>
            <pre style={{ whiteSpace: 'pre-wrap', fontSize: '14px' }}>
              {transcript}
            </pre>
          </div>
        </div>
      )}
    </div>
  );
}
```

**Test it**: Open http://localhost:3000/studio

---

## Hour 2: Integrate with Recording Studio

**Goal**: Coordinate the web app with your existing recording setup

### Minutes 60-80: Manual Coordination (Simplest)

**Option A: Just use it as a teleprompter**

- Open `/studio` on a monitor/tablet
- Subject sees questions
- You manually start/stop OBS recording
- No integration needed - the web app is just the "interviewer"

**Option B: Add a simple recording indicator**

Update the studio page to show recording status:

```typescript
// Add this state
const [isStudioRecording, setIsStudioRecording] = useState(false);

// Add keyboard shortcut (remember to import useEffect from 'react')
useEffect(() => {
  function handleKeyPress(e: KeyboardEvent) {
    if (e.key === 'r' && e.metaKey) {
      // Cmd+R - preventDefault so the browser doesn't reload the page
      e.preventDefault();
      setIsStudioRecording(!isStudioRecording);
    }
  }
  window.addEventListener('keydown', handleKeyPress);
  return () => window.removeEventListener('keydown', handleKeyPress);
}, [isStudioRecording]);

// Add visual indicator
{isStudioRecording && (
  <div style={{
    position: 'fixed',
    top: 20,
    right: 20,
    background: 'red',
    color: 'white',
    padding: '10px 20px',
    borderRadius: '8px',
    fontSize: '18px',
  }}>
    🔴 RECORDING
  </div>
)}
```

Press **Cmd+R** to toggle the recording indicator (you control OBS separately).
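If you later want the Cmd+R toggle to actually start/stop OBS instead of only flipping the indicator (see "Integrate with OBS" under What's Next), one option is OBS's built-in WebSocket server. A minimal sketch, assuming OBS 28+ with the WebSocket server enabled and `obs-websocket-js` v5 added to the project (`pnpm add obs-websocket-js`); the port and password placeholders are whatever you configured in OBS:

```typescript
// Hypothetical /web/app/lib/obsClient.ts - assumes obs-websocket-js v5 and
// OBS -> Tools -> WebSocket Server Settings enabled on the default port 4455
import OBSWebSocket from 'obs-websocket-js';

const obs = new OBSWebSocket();

export async function connectOBS(password: string) {
  await obs.connect('ws://127.0.0.1:4455', password);
}

export async function setStudioRecording(recording: boolean) {
  // StartRecord / StopRecord are obs-websocket v5 request types
  await obs.call(recording ? 'StartRecord' : 'StopRecord');
}
```

Call `setStudioRecording(...)` from the Cmd+R handler so the on-screen indicator and OBS stay in sync. This is strictly optional - the manual workflow above is fine for the demo.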
### Minutes 80-100: Add Session Timing

```typescript
const [sessionStartTime, setSessionStartTime] = useState<Date | null>(null);
const [elapsedSeconds, setElapsedSeconds] = useState(0);

// Start timer when session begins
useEffect(() => {
  if (stage === 'recording' && !sessionStartTime) {
    setSessionStartTime(new Date());
  }
}, [stage]);

// Update elapsed time
useEffect(() => {
  if (stage !== 'recording') return;

  const interval = setInterval(() => {
    if (sessionStartTime) {
      const elapsed = Math.floor((Date.now() - sessionStartTime.getTime()) / 1000);
      setElapsedSeconds(elapsed);
    }
  }, 1000);

  return () => clearInterval(interval);
}, [stage, sessionStartTime]);

// Display timer
function formatTime(seconds: number) {
  const mins = Math.floor(seconds / 60);
  const secs = seconds % 60;
  return `${mins}:${secs.toString().padStart(2, '0')}`;
}

// Add to UI
<div style={{ fontSize: '24px', marginBottom: '20px' }}>
  ⏱️ {formatTime(elapsedSeconds)}
</div>
```

This gives you timestamps you can match to your studio recordings later.

---

## Hour 3: Polish & Demo Prep

**Goal**: Make it demo-ready

### Minutes 100-120: Improve Question Quality

**Update question prompt** in `/web/app/api/question/route.ts`:

```typescript
const systemPrompt = `You are an engaging interviewer conducting a video interview.
Your goal is to help the subject tell their story naturally and authentically.

Question style:
- Ask open-ended questions that invite storytelling
- Use simple, conversational language
- Keep questions under 25 words
- Build on what they just said
- Show genuine curiosity

Avoid:
- Yes/no questions
- Multiple questions at once
- Overly formal language
- Generic questions`;

const userPrompt =
  questionNumber === 1
    ? "Ask a warm opening question that helps them introduce themselves. Something like 'Tell me about yourself' but more specific and interesting."
    : `They just said: "${transcript.split('\n').slice(-2).join(' ')}"\n\nAsk a follow-up question that explores this deeper or takes the conversation in an interesting direction.`;
```

### Minutes 120-140: Add Summary Generation

**Create `/web/app/api/summary/route.ts`**:

```typescript
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(request: NextRequest) {
  try {
    const { transcript } = await request.json();

    const completion = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content:
            'You are a video editor reviewing interview footage. Create a concise summary that highlights the key points and interesting moments.',
        },
        {
          role: 'user',
          content: `Summarize this interview transcript in 3-5 bullet points:\n\n${transcript}`,
        },
      ],
      temperature: 0.7,
    });

    return NextResponse.json({ summary: completion.choices[0].message.content });
  } catch (error) {
    console.error('Summary error:', error);
    return NextResponse.json({ error: 'Summary failed' }, { status: 500 });
  }
}
```

**Use in complete stage**:

```typescript
// When session completes
const [summary, setSummary] = useState<string>('');

async function endSession() {
  audioRecorderRef.current?.cleanup();
  setStage('complete');

  // Generate summary (renamed locally to avoid shadowing the `summary` state)
  const res = await fetch('/api/summary', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ transcript }),
  });
  const { summary: generatedSummary } = await res.json();
  setSummary(generatedSummary);
}

// Display in complete stage
{stage === 'complete' && (
  <div>
    <h2>Session Complete!</h2>

    <div style={{ marginBottom: '30px' }}>
      <h3>Key Points:</h3>
      <div style={{ whiteSpace: 'pre-wrap' }}>{summary}</div>
    </div>

    <div>
      <h3>Full Transcript:</h3>
      <pre style={{ whiteSpace: 'pre-wrap', fontSize: '14px' }}>
        {transcript}
      </pre>
    </div>
  </div>
)}
```

### Minutes 140-160: Save Sessions Locally

**Create `/web/app/api/session/route.ts`**:

```typescript
import { NextRequest, NextResponse } from 'next/server';
import { writeFile } from 'fs/promises';
import path from 'path';

export async function POST(request: NextRequest) {
  try {
    const { transcript, summary, startTime } = await request.json();

    const sessionId = `session-${Date.now()}`;
    const data = {
      sessionId,
      startTime,
      endTime: new Date().toISOString(),
      transcript,
      summary,
    };

    // Save to storage directory
    const filePath = path.join(process.cwd(), '..', 'storage', 'sessions', `${sessionId}.json`);
    await writeFile(filePath, JSON.stringify(data, null, 2));

    return NextResponse.json({ sessionId });
  } catch (error) {
    console.error('Save error:', error);
    return NextResponse.json({ error: 'Save failed' }, { status: 500 });
  }
}
```

Make sure the `storage/sessions/` directory exists:

```bash
mkdir -p /Users/metal/Development/the-capture-lab/storage/sessions
```

### Minutes 160-180: Final Testing & Demo Prep

**Demo Script**:

1. Open browser to http://localhost:3000/studio
2. Click "Start Session"
3. Read first question aloud
4. Click "Start Answer"
5. Answer for 20-30 seconds
6. Click "Stop Answer"
7. Wait for next question (~3 seconds)
8. Repeat for 2-3 questions
9. Click "End Session"
10. Show transcript and summary

**Quick improvements**:

- Add loading spinners during "Processing"
- Test with different answer lengths
- Have backup questions ready in case the API is slow
- Clear browser console of errors
- Test that audio input works

---

## Fallback Plan (If APIs Are Slow)

If the OpenAI API is taking too long during the hackathon, **mock the APIs** for the demo:

```typescript
// In /web/app/api/question/route.ts
const mockQuestions = [
  "Tell me about yourself and what you're working on.",
  "What inspired you to pursue this direction?",
  "What's been the biggest challenge so far?",
  "Where do you see this going in the future?",
  "What advice would you give to someone starting out?",
];

export async function POST(request: NextRequest) {
  const { questionNumber } = await request.json();

  // Return a mock question instantly
  return NextResponse.json({
    question: mockQuestions[questionNumber - 1] || "Tell me more.",
  });
}
```

You can always enable the real AI after the demo.
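One small gap in the snippets above: nothing on the client ever calls the `/api/session` save route from Minutes 140-160. A minimal sketch, assuming you append it to the end of `endSession` once the summary has come back (`generatedSummary` and `sessionStartTime` are the variables from the earlier snippets):

```typescript
// At the end of endSession(), after setSummary(generatedSummary)
await fetch('/api/session', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    transcript,
    summary: generatedSummary,
    startTime: sessionStartTime?.toISOString() ?? null,
  }),
});
```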
---

## What You'll Have After 3 Hours

✅ **Working web app** where:

- Subject sees AI-generated questions
- Records answers via browser mic
- Gets contextual follow-up questions
- Sees transcript and summary at end

✅ **Manual studio workflow**:

- You start/stop OBS recording separately
- Web app provides the "AI interviewer"
- Timestamps help sync later

✅ **Demo-ready**:

- Works reliably for 5-10 minute sessions
- Shows AI intelligence (questions adapt to answers)
- Produces usable output (transcript + summary)

---

## Pre-Hackathon Checklist (Do This First!)

- [ ] `cd web && pnpm install`
- [ ] Create `.env.local` with OpenAI key
- [ ] Test: `pnpm dev` and open http://localhost:3000
- [ ] Test mic permission in browser
- [ ] Verify studio recording setup works (OBS or whatever you use)
- [ ] Have 1-2 practice questions ready to test with

---

## Hour-by-Hour Breakdown

| Time | Focus | Deliverable |
|------|-------|-------------|
| 0:00-0:20 | API routes (transcribe + question) | APIs work via curl/Postman |
| 0:20-0:40 | Audio recorder utility | Can record browser audio |
| 0:40-1:00 | Basic studio UI | Can click through workflow |
| 1:00-1:20 | Studio integration (manual) | Recording indicator |
| 1:20-1:40 | Session timing | Timestamps for sync |
| 1:40-2:00 | Test full workflow | Can complete 5-question interview |
| 2:00-2:20 | Better prompts | Questions are interesting |
| 2:20-2:40 | Summary generation | Get summary at end |
| 2:40-3:00 | Save sessions + polish | Demo ready |

---

## Success Metrics for Hackathon

**Minimum Viable Demo**:

- [ ] Can start a session
- [ ] AI generates 5 questions
- [ ] Questions adapt to answers
- [ ] Shows transcript at end

**Nice to Have**:

- [ ] Summary generation works
- [ ] Sessions save to JSON
- [ ] UI doesn't break on errors
- [ ] Recording indicator syncs with OBS

**Stretch Goals** (if time):

- [ ] Voice command to advance questions
- [ ] Real-time transcript display
- [ ] Export transcript as PDF

---

## Tips for Fast Hacking

1. **Copy-paste liberally** from the reference repo (`_reference/gened1196-mw`)
2. **Don't worry about styling** - inline styles are fine
3. **Skip error handling** initially - add `try/catch` later
4. **Use console.log everywhere** for debugging
5. **Test incrementally** - don't write 100 lines without testing
6. **Have ChatGPT/Claude open** for quick code generation
7. **Use hot reload** - save files and see changes instantly
8. **Don't restart servers** unless necessary

---

## Emergency Debugging

**If APIs don't work**:

```bash
# Check the key is actually set (note: plain `node` does not load .env.local,
# so inspect the file directly rather than printing process.env)
cd web
cat .env.local

# Test OpenAI directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer YOUR_KEY"
```

**If audio recording fails**:

- Check browser console for errors
- Try Chrome (best MediaRecorder support)
- Allow mic permission
- Test with `navigator.mediaDevices.getUserMedia({ audio: true })`

**If Next.js breaks**:

```bash
# Nuclear option - restart fresh
rm -rf .next node_modules
pnpm install
pnpm dev
```

---

**GOOD LUCK! You got this! 🚀**

Keep it simple, ship it fast, demo it proud.

---

## Troubleshooting

### Module Not Found Error

If you see `Module not found: Can't resolve '@/lib/audioRecorder'`:

1. **Stop your dev server** (Ctrl+C)
2. **Clear Next.js cache**:

   ```bash
   cd /Users/metal/Development/the-capture-lab/web
   rm -rf .next
   ```

3. **Restart dev server**:

   ```bash
   pnpm dev
   ```
4. **Hard refresh browser**: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows)

### API Errors

If transcription or question generation fails:

- Check `.env.local` has a valid `OPENAI_API_KEY`
- Check browser console for specific error messages
- Verify API key has credits: https://platform.openai.com/usage

### Camera/Mic Not Working

- Make sure you allowed permissions in the browser
- Try Chrome (best support for MediaRecorder)
- Check System Preferences → Security & Privacy → Camera/Microphone
- Restart the browser if permissions changed

### Image Description Not Working

- Make sure you captured an image first (click "Capture Still")
- Check that GPT-4 Vision (gpt-4o model) is available on your API key
- Image must be visible before clicking "Describe Image"

---

## File Structure Reference

```
web/
├── app/
│   ├── api/
│   │   ├── transcribe/
│   │   │   └── route.ts          ✅ Whisper transcription
│   │   ├── interviewer/
│   │   │   └── route.ts          ✅ Smart question bot
│   │   └── describe-image/
│   │       └── route.ts          ✅ GPT-4 Vision
│   ├── lib/
│   │   └── audioRecorder.ts      ✅ Audio recording utility
│   └── recording-ui-options/
│       └── 03-Interviewer/
│           └── page.tsx          ✅ Main UI component
└── .env.local                    ✅ OpenAI API key
```

---

## What's Next?

Now that the demo is working, you can:

1. **Customize the questions** - Edit prompts in `/api/interviewer/route.ts`
2. **Add more phases** - Extend beyond name/project to collect other info
3. **Save sessions** - Add a `/api/session/save` endpoint to store transcripts
4. **Improve UI** - Add styling, animations, better layouts
5. **Integrate with OBS** - Add keyboard shortcuts to trigger OBS recording
6. **Export transcript** - Add a download button for the transcript as PDF/text

---

## Success Metrics

Your demo is successful if:

- ✅ Webcam shows live video
- ✅ Bot asks for name and extracts it
- ✅ Bot asks for project and extracts it
- ✅ Bot asks contextual follow-up questions
- ✅ Image capture works and gets AI description
- ✅ Full transcript is visible
- ✅ No crashes or API errors

**You're ready for the hackathon! 🎉**