# TA Directions for Red/Blue Wargame

## Whole Class

First 10m of class: Students share their jailbreak techniques. Share any you can think of, but hold yours until the end so students have room to offer their own ideas.

Second 10m of class: Help the class come up with "bad things the AI should not do" by writing your own GitHub gist. We'll get these gists shared in the channel; grab all the links.

## Split Into Teams

Liz will manually split people in the room.

Roan's Team (first 15m of exercise): Get the "bad prompts" from the conversation above. As a group, get GPT to perform the bad prompts via the techniques [listed here](https://publish.obsidian.md/themultiverseschool/Curriculum/Autonomous+Agents/Prompt+Engineering/Workshop+2+-+Jailbreaks#Jailbreak+Exercises).

Roan's Team (second 15m of exercise): After Gene's Team publishes their GPTs, begin trying to capture the secret words. There will be one word per GPT.

Gene's Team (first 15m of exercise): Create several GPTs by splitting everyone into sub-teams of two and having each sub-team produce one custom GPT that defends its "secret", a word selected from your document. Each team of two gets one word.

Gene's Team (second 15m of exercise): Observe the red teams as they work on your GPT, and capture the shared chat links for both successful and unsuccessful attempts.

Swap at the 30m mark.