# mk-agents-planning-20251010

## Agent Builder nodes — quick guide for your “Live Oral Exam” workflow

Below is a concise, practical reference for **every node** in Agent Builder, with (1) a brief definition and (2) exactly how you could use it for your **live, AI-assisted oral exam** that transcribes students, checks course materials, and asks adaptive follow-ups.

---

## Core nodes

### Start

**Definition:** Entry point. Captures user input (e.g., chat turn, event), appends it to conversation history, and exposes `input_as_text`. You can also define initial **state variables**.

**Use in your oral exam:**

* Accept the **live transcript** chunk from your Realtime front-end and pass it as `input_as_text`.
* Initialize session state: `{ student_id, course_id, topic_targets[], time_limit_min, rubric, scoring={}, covered_topics=[], turn=0 }`.
* Optionally branch on a **mode** variable (practice vs graded).

---

### Agent

**Definition:** A configured model step (instructions + tools + model settings), optionally with evaluations. Often you’ll have several, each with a narrow job.

**Use in your oral exam (recommended agents):**

* **Classifier Agent:** Classifies the latest utterance: `{intent: "answer"|"question"|"meta"|"off_topic", confidence}` and detects which **topic** it references.
* **Scoring Agent:** Given the student utterance + rubric criteria + passages retrieved (via File Search), outputs **structured scores** (e.g., 1–5) and **evidence citations**.
* **Follow-up Generator Agent:** Generates the **next question** based on progress, knowledge gaps, and time remaining.
* **Summarizer Agent:** Periodically distills the session into short notes for the examiner; at the end, drafts feedback and an exam report.
* **Hallucination Checker Agent (optional):** Cross-checks critical claims against retrieved sources to flag unsupported assertions.

> Tip: Keep instructions **scoped**: “You only classify; you only score; you only draft follow-ups.”

---

### Note

**Definition:** A sticky comment on the canvas; it has no runtime behavior.

**Use in your oral exam:**

* Document rubric versions, topic maps, and session timing logic for collaborators.
* Leave “why” notes near tricky branches (e.g., why we cap follow-ups at 3 per topic).

---

## Tool nodes

### File search

**Definition:** Queries your **OpenAI vector store(s)** and returns relevant chunks for the model to use.

**Use in your oral exam:**

* Index **syllabi, readings, lecture slides, glossaries**, exemplar answers, and grading rubrics.
* On each turn, retrieve **3–5 passages** keyed to the current topic (from the Classifier Agent) and pass them to the Scoring / Follow-up agents as trusted context.
* Use variables in the query:
  * Query template: `"{topic} {subtopic} definition examples" + {student_utterance_keywords}`
  * Store ID by course: `vector_store_id = course_id_to_store[course_id]`

---

### Guardrails

**Definition:** Checks inputs/outputs for PII, jailbreaks, hallucinations, and misuse; returns pass/fail with branchable outcomes.

**Use in your oral exam:**

* **Input guard:** If a student or observer says something unsafe (harassment, self-harm, doxxing), route to **end** or **human escalation** with a safety reminder.
* **Output guard:** Before the follow-up question goes to the room, check for **bias, stereotype triggers, or revealing answer keys**; if it fails, loop back to regenerate.
* **Hallucination rail:** If the Scoring Agent cites **no** retrieved evidence, fail → re-retrieve with a stricter query or downgrade confidence.
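A minimal sketch of that hallucination rail as a branch decision, written in plain Python for illustration; the `ScoringResult` shape, the 0.6 threshold, and the retry cap are assumptions, not Agent Builder's built-in behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ScoringResult:
    """Assumed shape of the Scoring Agent's structured output for one criterion."""
    criterion: str
    score: int                                          # 1-5 rubric score
    evidence: list[str] = field(default_factory=list)   # source_ids from File Search
    evidence_confidence: float = 0.0                    # how well the evidence backs the score


def hallucination_rail(result: ScoringResult, retries_used: int, max_retries: int = 2) -> str:
    """Return the branch the guardrail should take: 'pass', 'retry_retrieval', or 'downgrade'."""
    if result.evidence and result.evidence_confidence >= 0.6:
        return "pass"              # citations present and confident: continue the flow
    if retries_used < max_retries:
        return "retry_retrieval"   # tighten the File Search query and rescore
    return "downgrade"             # out of retries: keep the score but flag low confidence


def stricter_query(topic: str, key_terms: list[str]) -> str:
    """One way to tighten retrieval on retry: quote the topic and append key terms."""
    return f'"{topic}" ' + " ".join(key_terms[:5])
```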
---

### MCP (Model Context Protocol)

**Definition:** Connects to external tools/services (OpenAI connectors or your own MCP servers).

**Use in your oral exam:**

* **Google Drive / Box:** Pull a specific **lecture deck** or **PDF** on demand if referenced by title.
* **Airtable / Supabase:** Log **per-turn scores**, timestamps, topics covered, and examiner notes to your assessment DB.
* **Slack / Email:** Post the **exam summary** to a private channel or email the student an automated feedback sheet (gated by Human approval).
* **Timers / Room Tech:** Trigger a **countdown** or light cue in the studio; advance a projector scene when moving to a new section.

---

## Logic nodes

### If/else

**Definition:** Branch with a **CEL** expression over variables/context.

**Use in your oral exam (examples):**

* **Intent routing:**

  ```cel
  latest.intent == "answer"
  ```

  → go to Scoring; `latest.intent == "question"` → the Follow-up Agent decides whether to answer or redirect; `latest.intent == "off_topic"` → gentle nudge back to scope.

* **Mastery check:**

  ```cel
  avg(scores[topic].recent) >= 4.0 && size(followups[topic]) >= 2
  ```

  → mark the topic complete and branch to the next target.

* **Time guard:**

  ```cel
  now() - session_start >= duration(string(time_limit_min) + "m")
  ```

  → end the sequence and generate the final summary.

---

### While

**Definition:** Loop while a **CEL** condition is true.

**Use in your oral exam:**

* **Adaptive questioning loop** per topic:
  * While `topic_mastered == false && time_remaining > 0`, do: retrieve → generate follow-up → guard → deliver → score → update mastery.
* **Evidence retry loop:**
  * While `scoring.evidence_confidence < 0.6`, tighten the File Search query and rescore (cap at 2 retries).

---

### Human approval

**Definition:** Pause and ask a human to approve/deny before continuing.

**Use in your oral exam:**

* **Gate high-stakes actions:** “Approve this **final grade recommendation** and written feedback?” If **Approve**, MCP → Airtable/Canvas; if **Deny**, loop to the Summarizer Agent to revise tone/content.
* **On-the-fly question vetting:** For sensitive topics, show the **next follow-up** in a widget to the examiner for a quick “Send / Regenerate.”

---

## Data nodes

### Transform

**Definition:** Reshape/validate data structures (e.g., object → array) and enforce a schema for downstream steps.

**Use in your oral exam:**

* Convert File Search results into a **uniform citation array**:

  ```json
  [{ "title": "...", "snippet": "...", "page": 3, "source_id": "..." }]
  ```

* Normalize the Scoring Agent’s output to a **rubric schema**:

  ```json
  { "criterion": "accuracy", "score": 4, "evidence": [source_id, ...] }
  ```

* Collapse per-turn scores into **rolling topic mastery** features for the If/else and While conditions.

---

### Set state

**Definition:** Create/update **global variables** for use anywhere downstream.

**Use in your oral exam:**

* After each turn:
  * `covered_topics += [topic_detected]`
  * `scores[topic].push(new_score)`
  * `time_remaining = time_limit_min - elapsed_min`
  * `turn += 1`
* At milestones:
  * `topic_mastery[topic] = "mastered" | "developing" | "needs work"`
  * `next_topic = pick_next_topic(topic_targets, topic_mastery)`
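To make the Set state updates above concrete, here is a small Python sketch of the per-turn update and the mastery rollup that feeds the If/else and While conditions. Variable names mirror the ones used in this guide; the mastery thresholds and the `pick_next_topic` policy are assumptions.

```python
import time

# Session state as initialized at Start (same variable names as above).
state = {
    "topic_targets": ["photosynthesis", "cell_respiration"],
    "time_limit_min": 20,
    "session_start": time.time(),
    "scores": {},           # topic -> list of per-turn rubric scores (1-5)
    "covered_topics": [],
    "topic_mastery": {},    # topic -> "mastered" | "developing" | "needs work"
    "turn": 0,
}


def rollup_mastery(topic_scores: list[int]) -> str:
    """Rolling mastery signal; thresholds follow the CEL mastery check below."""
    avg = sum(topic_scores) / len(topic_scores)
    if avg >= 4.0 and len(topic_scores) >= 3:
        return "mastered"
    return "developing" if avg >= 3.0 else "needs work"   # 3.0 cutoff is illustrative


def update_after_turn(state: dict, topic_detected: str, new_score: int) -> None:
    """Apply the per-turn Set state updates listed above."""
    state["scores"].setdefault(topic_detected, []).append(new_score)
    if topic_detected not in state["covered_topics"]:
        state["covered_topics"].append(topic_detected)
    elapsed_min = (time.time() - state["session_start"]) / 60
    state["time_remaining"] = state["time_limit_min"] - elapsed_min
    state["turn"] += 1
    state["topic_mastery"][topic_detected] = rollup_mastery(state["scores"][topic_detected])


def pick_next_topic(topic_targets: list[str], topic_mastery: dict) -> str | None:
    """First target topic not yet mastered, or None when every target is done."""
    for t in topic_targets:
        if topic_mastery.get(t) != "mastered":
            return t
    return None
```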
---

## Putting it together (suggested canvas outline)

1. **Start** → init state; accept transcript chunk as `input_as_text`.
2. **Agent: Classifier** → `{ intent, topic, subtopic, key_terms }`.
3. **If/else** → route by `intent`:
   * **answer** → (A)
   * **question/meta** → pass to **Follow-up Agent** (may answer or redirect)
   * **off_topic** → gentle nudge → back to loop
4. **(A) File search** with `{topic, key_terms}` → passages.
5. **Transform** passages → normalized citations array.
6. **Agent: Scoring** with `{utterance, rubric, citations}` → structured scores + evidence.
7. **Guardrails** (hallucination/evidence) → on failure, **While** retry retrieve/rescore (max 2).
8. **Set state** update: scores, mastery signals, timing.
9. **If/else** mastery/time:
   * If topic mastered → **Agent: Follow-up** to propose the **next topic question**.
   * Else → **Agent: Follow-up** to probe the same topic.
10. **Guardrails** (sensitivity/tone) on the proposed follow-up.
11. **(Optional) Human approval** for the next question in sensitive contexts.
12. **While** (time remaining && topics left) repeat the loop.
13. On exit: **Agent: Summarizer** → draft feedback & report.
14. **Human approval** of final report/grade.
15. **MCP**: log to Airtable/Canvas/Slack; archive the transcript and citations; email the student a summary if approved.

---

## Practical tips

* **Small, specialized agents > one mega-agent.** It’s easier to debug and to add evaluations.
* **State first.** Decide what you’ll track every turn (scores, topics, evidence) and **name variables consistently**.
* **Schemas everywhere.** Use **Transform** to keep outputs predictable; your If/else & While logic will stay simple.
* **Guardrails early + late.** Catch problematic inputs and prevent risky outputs (answers that leak test content, biased prompts, etc.).
* **MCP as glue.** Don’t overload the model—use MCP to **persist data**, trigger studio cues, and deliver finalized artifacts.

---

## Minimal example snippets

**CEL examples**

```cel
// Is time up? (time_limit_min is a whole number of minutes)
now() - session_start >= duration(string(time_limit_min) + "m")

// Is the current topic mastered?
avg(map(scores[current_topic], s, s.score)) >= 4.0 && size(scores[current_topic]) >= 3

// Should we move on? (time_remaining is tracked in minutes)
topic_mastery[current_topic] == "mastered" || time_remaining <= 2.0
```

**Follow-up Agent (instruction sketch)**

> “Given: (1) the student’s last utterance, (2) topic mastery signals, (3) citations, and (4) time remaining, propose **one** concise, pedagogically sound follow-up that either deepens the current subtopic or transitions to the next unmastered topic. Avoid yes/no questions; require evidence or definitions when appropriate.”

---

If you’d like, I can turn this into a **starter canvas** (node names, variables, sample CEL, and prompt stubs) that mirrors your studio setup (Drive/Airtable/Slack connectors, projector cues).
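As a head start on those prompt stubs, here is the Follow-up Agent's context assembly sketched in plain Python, assuming the state and citation shapes used above; the field names and wording are illustrative, not a required format.

```python
FOLLOW_UP_INSTRUCTIONS = (
    "Given the student's last utterance, topic mastery signals, citations, and time remaining, "
    "propose ONE concise, pedagogically sound follow-up question that either deepens the current "
    "subtopic or transitions to the next unmastered topic. Avoid yes/no questions; require "
    "evidence or definitions when appropriate."
)


def build_follow_up_context(state: dict, utterance: str, citations: list[dict]) -> str:
    """Assemble the per-turn context block the Follow-up Agent sees."""
    sources = "\n".join(
        f"- [{c['source_id']}] {c['title']} (p. {c.get('page', '?')}): {c['snippet']}"
        for c in citations
    )
    mastery = ", ".join(f"{t}: {m}" for t, m in state.get("topic_mastery", {}).items()) or "none yet"
    return (
        f"Current topic: {state.get('current_topic', 'unknown')}\n"
        f"Topic mastery so far: {mastery}\n"
        f"Time remaining (min): {round(state.get('time_remaining', 0), 1)}\n"
        f"Student's last utterance: {utterance}\n"
        f"Trusted passages:\n{sources or '- none retrieved this turn'}\n"
    )
```

Keeping `FOLLOW_UP_INSTRUCTIONS` as the agent's fixed instructions and passing `build_follow_up_context(...)` as its per-turn input keeps the prompt stable while only the facts change each turn.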