# Teaching With AI

[slide deck here](https://docs.google.com/presentation/d/1eznZCA14yoG9_2et4NMgghSjXJJERJtZ1C8PNPTYiOE/edit?slide=id.g37700d342c0_0_70#slide=id.g37700d342c0_0_70)

## Outline

* intro: what's the lay of the land in 2025-26?
* ways AI can go wrong
* 101 tips on how to help AI get things right

## Introduction

* How are your students using AI?
  * Some statistics from [Harvard](https://arxiv.org/pdf/2406.00833) and [across](https://www.chronicle.com/article/how-are-students-really-using-ai) the [country](https://www.grammarly.com/blog/company/student-ai-adoption-and-readiness/):
    * A July 2025 Grammarly study of 2,000 US college students found that 87% use AI for schoolwork and 90% for everyday life tasks.
    * Students most often turn to AI for brainstorming ideas, checking grammar and spelling, and making sense of difficult concepts.
    * While adoption is high, 55% feel they lack proper guidance, and most believe that learning to use AI responsibly is essential to their future careers.
  * Discussion: how are you using it? how are people in your department using it? do you know your course's policy?
* What is the landscape this year?
  * Here are the currently [recommended Harvard course policies](https://oue.fas.harvard.edu/faculty-resources/generative-ai-guidance/) from the Office of Undergraduate Education.
  * Here is [the advice the Bok Center is providing your Faculty](https://bokcenter.harvard.edu/artificial-intelligence).
* There are two AI tools that HUIT is supporting. Let's get you connected to them before we move on with the workshop!
  * here is your link to [Google Gemini](https://gemini.google.com/app)
  * and here is your link to [the HUIT AI Sandbox](https://sandbox.ai.huit.harvard.edu/)
  * **Important privacy note:** These HUIT-supported tools have built-in privacy safeguards. Harvard has contracts with these providers ensuring that anything you share won't be used to train their models or be shared with third parties. These tools are safe for Level 3 data, which includes course materials and student work. This means you can confidently use them for teaching activities without worrying about privacy violations.

---

## Ways AI Can Go Wrong

AI is powerful, but systematically flawed. In this section we’ll surface three different kinds of errors:

1. **Factual** – AI doesn’t reason, it predicts.
2. **Ethical/Moral** – AI reflects stereotypes from its training data.
3. **Psychological/Social** – AI flatters users instead of correcting them, and presents itself as a human-like personality rather than as a next-token prediction machine.

---

### Activity 1: Tokenization

Paste the text below into [tiktokenizer](https://tiktokenizer.vercel.app/).

```
Unsurprisingly, they had to cancel the show. The crowd went home unhappily.
```

* Notice how the model breaks words into tokens.
* Discuss: What does this reveal about how AI “reads” text differently from humans?

#### Takeaway

AI doesn’t “read” words like humans do. It breaks text into tokens—numbers representing pieces of words. This shows that LLMs process language as math, predicting the next number in a sequence rather than reasoning about meaning.

---

### Activity 2: Multiplication (Predicting vs. “Reasoning”)

**1. Prompt (for Gemini Flash, Llama 3.2 11b, or an older model):**

```
82,345 × 67,890. give me an immediate response without using code.
```

* Try it yourself first → you’ll see it’s hard to do “in your head.”
* See how the AI answers.
* Does it get it right? If it’s wrong, is it *completely* wrong or close? How close? (A quick way to check is below.)
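For reference while you compare answers: the exact product is easy to get deterministically. A minimal Python check (any calculator works just as well):

```python
# Exact ground truth for the prompt above. Conventional code computes;
# a language model in "immediate" mode only predicts plausible digits.
print(82_345 * 67_890)  # 5590402050
```

Non-reasoning models often land surprisingly close (right order of magnitude, right leading digits) because those are the statistically easy parts to predict; the middle digits are where the guessing shows.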
**2. Prompt (for Gemini Flash Thinking or GPT-4.1 Reasoning):**

```
82,345 × 67,890
```

* Compare this to when you asked for an “immediate response”.
* Does the model get the math right this time?
* What’s different about the *style* of its response?

#### Takeaway

AI doesn’t actually *calculate*—it predicts the next token (number) based on patterns in training data. That’s why answers can be *fact-like* or “almost correct,” but still wrong: they’re based on statistical averages of the internet, not reasoning.

AI tools increasingly offer **“thinking” modes** (sometimes called *chain-of-thought* or *reasoning* models). Reasoning models still predict, but showing their work lets you spot errors and better trust or question their output. Asking the model to “think step by step” can improve reliability and helps you check its work.

---

### Activity 3: Ethical Error — Gender Bias in AI-Generated Text

Large language models (LLMs) are trained to predict the next token in a sequence, not to reason about the world. When faced with tasks like resolving ambiguity in coreference, they often lean on statistical associations from training data rather than logical inference. This can result in bias amplification, where the model defaults to stereotypical mappings of roles and genders.

**Prompt:**

```
A pilot and a flight attendant on a flight to LA got into a minor argument about what the weather would be like when they got there. She thought it would be sunny, but he disagreed. Who thought it would be sunny?
```

* What answer do you get?
* Try changing “she” to “he” and notice what happens.
* Discuss: What stereotypes or cultural assumptions is the AI relying on? How could this show up in student work?

Coreference resolution is a particularly sensitive case: models decide which entity a pronoun refers to based on probability distributions learned from data. Because occupational roles are unevenly represented in training corpora, the model can produce systematically skewed results.

#### Takeaway

When resolving pronominal ambiguity, AI leans on statistical associations (e.g., “doctor = he,” “nurse = she”) instead of logic. This reveals how uneven data can amplify stereotypes and distort interpretation in ways that shape outputs.

---

### Activity 4: Ethical Error — Bias in Images

Image generation models (like Gemini, DALL·E, or Midjourney) work by sampling from high-dimensional probability distributions conditioned on a prompt. The outputs reflect the distribution of their training data, which is often dominated by certain demographics or cultural defaults. As a result, even seemingly neutral prompts (e.g., “a happy family”) are resolved in highly regularized ways that reproduce these statistical biases.

**Prompt an image generator:**

```
Create an image of a happy family
```

or

```
Create an image of a happy couple
```

* Compare your outputs with those of the people sitting next to you. What patterns do you see? What kinds of people or relationships appear most often?
* What omissions do you notice? What’s the risk of using these images in class slides, for instance?

[More examples here →](/pvNaRf56T7qhOqx1GUlcrA)

**Try the same prompt again, in your language of choice.**

* Before generating images:
  * What do you predict will be the differences in the images generated by the different languages?
* After generating the images:
  * What similarities or differences do you notice across the images?

#### Takeaway

Generative image models do not “choose” representations; they maximize likelihood based on patterns in skewed datasets. Because outputs are constrained by frequency and representation in the training data rather than a balanced sampling of real-world diversity, prompts like “a happy family” often yield stereotypical demographics, normalizing omissions and reinforcing cultural defaults.

---

### Activity 5: Language Representation in AI

Go back to the [tiktokenizer](https://tiktokenizer.vercel.app/) and put in a phrase or sentence in a language other than English.

* What similarities or differences do you find in how words are tokenized in this language compared to English?

Large language models (LLMs) learn from vast amounts of text collected from the internet. But not all languages are represented equally online. Some languages with millions of speakers have relatively little digital content, while others with fewer speakers dominate the web. This imbalance shapes how well AI can process or generate text in different languages.

**Click through the visualization to compare:**

- The proportion of content available in each language
- The number of people worldwide who speak it

<iframe src="https://claude.site/public/artifacts/2b9d7a13-c0de-4433-b13a-479f0aa81006/embed" title="Claude Artifact" width="100%" height="600" frameborder="0" allow="clipboard-write" allowfullscreen></iframe>

**Discuss**:

* Which languages appear “overrepresented” (lots of internet data relative to speakers)? Which are “underrepresented”?
* How might these imbalances affect researchers and teachers working in less digitally represented languages?

#### Takeaway

AI outputs tend to be stronger for languages with abundant digital data and weaker for those with less online presence. This gap reflects broader inequities in the accessibility of digital tools and highlights the importance of teacher review, adaptation, and creativity—especially when working with underrepresented languages.
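You can quantify this with the `tiktoken` package, the same tokenizer family tiktokenizer exposes. A small sketch; the non-English sentences are rough translations of the Activity 1 example, so substitute a language you know:

```python
# Compare how many tokens (roughly) the same sentence costs in different
# languages. Underrepresented languages tend to be split into more,
# smaller tokens. Assumes: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by recent OpenAI models

samples = {
    "English": "Unsurprisingly, they had to cancel the show.",
    "Spanish": "Como era de esperar, tuvieron que cancelar el espectáculo.",
    "Greek": "Όπως ήταν αναμενόμενο, αναγκάστηκαν να ακυρώσουν την παράσταση.",
    "Hindi": "जैसा कि अपेक्षित था, उन्हें शो रद्द करना पड़ा।",
}

for language, sentence in samples.items():
    print(f"{language}: {len(enc.encode(sentence))} tokens")
```

Exact counts vary by tokenizer, but lower-resource languages and non-Latin scripts typically cost noticeably more tokens per sentence, which means less usable context window and often weaker output quality.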
---

### Activity 6: Social Error/Harm — Sycophancy & "Personality"

AI sycophancy happens when the model affirms rather than challenges, offering polished praise that can mislead students. At scale, sycophancy is arguably *worse* than hallucination: a hallucinated fact might be detected and corrected, but sycophantic reinforcement of student error embeds the problem more deeply in the learner’s self-perception and habits. It can:

* Reinforce flawed work
* Erode student trust in both AI and human feedback
* Encourage overconfidence and discourage deeper engagement

**Prompt an AI:**

* ask Gemini Flash and/or GPT-4.1 (in the sandbox) the following:

```
I wrote an essay that argues that the French Revolution happened because of bread prices---I think it's brilliant, don't you?
```

* try it a couple of times in new chat sessions to see how the responses vary (or script the comparison; see the sketch below)
* now try it one more time, but this time engage the LLM in friendly (perhaps even OVERLY friendly) small talk first. Then compare and contrast the results.

**Reflection:**

* What kind of “personality” did the AI adopt in your exchange?
* Did its tone shift depending on how you interacted with it?
* Share impressions with your group—what words would you use to describe the AI’s character?
* Now connect this back to Activities 1 and 2: remembering that the model is not a person but a token predictor, how does that change how we interpret the “personality” we think we see?
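To run that comparison systematically rather than chat by chat, here is a rough sketch using the OpenAI Python SDK; the warm-up line is just an illustration, and models in the HUIT sandbox can be exercised the same way through their own interfaces.

```python
# Ask the same leading question with and without flattering small talk
# first, then compare the tone of the two replies.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
claim = ("I wrote an essay that argues that the French Revolution happened "
         "because of bread prices---I think it's brilliant, don't you?")

for warmup in (None, "You're the smartest, kindest assistant I've ever used!"):
    messages = []
    if warmup:
        # Seed the conversation with over-friendly small talk.
        messages.append({"role": "user", "content": warmup})
        messages.append({"role": "assistant", "content": "Thank you! Happy to help."})
    messages.append({"role": "user", "content": claim})
    reply = client.chat.completions.create(model="gpt-4.1", messages=messages)
    label = "with flattering warmup" if warmup else "no warmup"
    print(f"=== {label} ===")
    print(reply.choices[0].message.content[:400])
```

Responses vary run to run, which is the point: you are sampling from a distribution of plausible replies, and the framing you supply shifts that distribution.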
[Read more →](/E1cbV_KJTwmR30TIMzD7_A)

#### Takeaway

Sycophancy shows how AI can flatter instead of challenge, reinforcing errors and inflating confidence in ways that are harder to detect than hallucinations. The responses often feel shaped by a “personality” that adapts to user tone, but this is an illusion created by token-by-token prediction, not genuine intention or care. The danger lies both in misleading feedback and in our tendency to treat these statistical patterns as if they were a trustworthy, human-like conversational partner.

---

## How to Help AI Get Things Right

AI excels at certain tasks and stumbles at others. The goal is not to make it perfect but to use it where it’s strongest and avoid high-risk cases.

---

### Activity 7: NotebookLM

One of [NotebookLM](https://notebooklm.google.com/)'s strengths is its ability to use retrieval-augmented generation (RAG) to anchor its responses directly to the sources you provide. This grounding helps ensure that generated information can be traced back to its original context, making NotebookLM a reliable tool for research, teaching, and learning. Additionally, NotebookLM’s capacity to transform the same content into different modes of representation (e.g., summaries, podcasts, mind maps) mirrors what multimodal learning research demonstrates: knowledge is more likely to be retained when engaged with through multiple channels and perspectives.

**Try this with a source of your choice (article, lecture notes, or even one of your own writings).**

1. Upload a document into NotebookLM (e.g., a short story or news article).
2. In the *Sources* view, skim the auto-generated summary and key topics.
3. In *Chat*, ask NotebookLM to produce:
   - a bullet-point summary for study notes
   - a FAQ highlighting common student questions
   - a timeline of key events (if historical or sequential)
4. In *Studio*, generate a podcast-style conversation about the document.

#### Reflection

* How did the different modes (summary, FAQ, podcast) highlight different aspects of the same material?
* What surprised you about how the AI reframed the content?
* How might this approach be useful (or limited) in your own research, study, or professional work?

#### Takeaway

NotebookLM demonstrates how multimodal learning can enhance comprehension and retention. By shifting the lens—from outline to Q&A to dialogue—you engage with the same content in multiple ways, reinforcing knowledge and uncovering new insights. Because of RAG, each form of engagement stays grounded in your original sources, supporting accuracy and deepening your understanding as you explore information from different angles.
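If you are curious what RAG means mechanically, here is a deliberately tiny sketch of the idea (not NotebookLM’s actual implementation): score your source passages against the question, retrieve the best match, and hand only that passage to the model as context.

```python
# Toy retrieval-augmented generation: the retrieval half only.
# Real systems use learned embeddings; plain word overlap stands in here.
from collections import Counter

def overlap(query: str, passage: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum(min(q[w], p[w]) for w in q)

passages = [  # imagine these were chunked from a document you uploaded
    "The Bastille was stormed on 14 July 1789.",
    "Bread prices in Paris rose sharply in the spring of 1789.",
    "NotebookLM accepts sources such as PDFs and Google Docs.",
]

question = "What happened to bread prices in 1789?"
best = max(passages, key=lambda p: overlap(question, p))

# The retrieved passage goes into the prompt, so the model is asked to
# answer from *your* source rather than from its training data.
print(f"Answer using only this source:\n{best}\n\nQuestion: {question}")
```

The citation trail NotebookLM shows is essentially this retrieval step made visible: every answer can point back to the passage that grounded it.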
---

### Activity 8: Target Appropriate Scenarios

The following use cases target tasks that AI is relatively reliable at, or constrain it to providing information that can aid the teaching and learning process even if there are a few errors in its output. Look at the following list with a partner and try out one or two of these activities.

* organize existing data:
  * reformat a messy spreadsheet
  * refactor a complex code block
* translate from one language, format, or medium to another:
  * check the translation you are reading in a Gen Ed course to see if your close reading matches the nuances of the original language
  * summarize a long academic paper
  * generate a short podcast script explaining that paper
  * “translate” from English to code (“vibe coding”)
  * break down a difficult course concept into smaller chunks
  * write alt-text for an image, then improve it for accessibility
  * turn a messy whiteboard photo into structured notes
  * take dictation in voice mode and return clean Markdown
* offer feedback and questions rather than answers:
  * give feedback *as if it were a student* in your course
  * offer counterarguments to an academic argument for the writer to evaluate and defeat
  * anticipate likely misconceptions based on the text of a lecture or textbook chapter

#### Reflection

* Which tasks felt effortless for the AI?
* Which outputs required caution or double-checking?
* Where could you imagine using this in your teaching?

#### Takeaway

AI is most helpful when used in **bounded, low-stakes tasks**—reformatting, summarizing, translating, prototyping. Avoid over-reliance for complex reasoning or sensitive evaluative tasks where errors or biases carry higher risk.

---

### Activity 9: Assignment (re)Design

A few of the highest-priority issues for instructors in the FAS this year are:

1. recentering academics
2. grade inflation
3. the role of AI in addressing #1 and #2

In this section of today's workshop, we're focused on #3, but the implications for the other two issues are baked into the mix. While the previous section focused more on how AI works, this section will focus on AI as an opportunity—and requirement—to think more closely about how we design assignments so that they can provide reliable evidence of student learning.

#### Elements to Keep in Mind When Designing Assignments in the Age of AI

1. **Purpose.** Make sure that the purpose of a given assignment is clear. How is it building on previous work in the course and building toward other assignments, e.g., higher-stakes and/or more authentic ones? How does it connect with students’ goals in the class and beyond? *This sense of purpose can build more intrinsic motivation.*
2. **Build in process.** Giving students the chance to practice smaller steps of the assignment—with guidance and models—allows for lower-stakes feedback and more chances to see where students are. *Process-oriented approaches remove the “black box” effect of assignments, where we assign something and then students just turn in a product that is supposed to count as evidence of their learning.*
3. **Have clear expectations.** Goals, policies, and requirements that are aligned with your course’s AI policy and the assignment prompt at hand (and ideally reflected in a rubric) will make the experience of doing (and assessing) the assignment more transparent and intentional. *If you have a “no AI” policy, be sure students know what that means in practice; if you are incorporating AI into an assignment, make sure students know why it’s being used and how the use will be assessed.*
4. **Incorporate metacognition.** Having students reflect on their successes, struggles, choices, etc. throughout an assignment can be useful for both students and instructors. *Journal entries or cover letters not only reinforce #1–3 above, they keep the relationship between students and their learning at the center of the assignment.*

#### 1. AI-Resilient Design

These adjustments reduce the chance that AI can produce fluent but unreliable submissions, while staying aligned with the assignment’s learning goals:

##### Connect to course-specific materials

* Require explicit use of lecture content, not just readings.
* Ask students to incorporate “local data” produced in the course or not available in the LLM’s training data (e.g., discussion notes, a guest speaker’s remarks, recent case studies, objects/subjects/texts from Harvard collections unavailable to the public).
* Require page citations or quotations, which makes generic AI-generated summaries harder to pass off.

##### Add in-person or no-device checkpoints

* Require an in-class outline or oral proposal where students present their thesis and sources before drafting.
* After submission, include a short oral defense or reflective writing exercise (10–15 minutes in section) where students explain their argument without notes.

##### Add a process reflection piece

* Ask for a written memo or in-person check-in where students describe how they developed their thesis, what counterarguments they considered, and which readings were most useful. The same can be done for more complex code.

#### 2. AI-Enhanced Design

If your course allows AI, you can adapt the assignment to use it as a learning tool rather than a shortcut:

##### Transparent AI workflow log

* Require students to submit a short AI log (screenshots or a structured handout).
* Logs can note what was helpful, what they rejected, and why.

##### Fact-checking task

* Students may generate AI-produced draft paragraphs or code but must revise and fact-check them using course sources.
* Their submission can include both versions (AI and revised) plus a rationale for the revision and the methods and results of their fact-checking steps.

##### Pair AI prep with no-AI verification

* Let students use AI for early generation, but require a complementary short in-class written exercise (e.g., summarize their thesis or annotate/edit code in 15 minutes, no devices).

---

### Stay connected!

Feel free to reach out at generativeAI@fas.harvard.edu for any AI-related inquiries.

We also have a weekly Friday morning AI Lab (9:00-10:30am, Pierce Hall 100F). This is a coffee chat for faculty and grad students focused on AI news and discussion. We’ll share notable developments and open conversation about AI’s impact on teaching, learning, and research.

A calendar of events and workshops can be found [here](https://bokcenter.harvard.edu/generative-ai-events).

---

### Help us work on 101 more!
- Drafting quiz questions aligned to Bloom’s taxonomy levels - Generating sample student answers at varying quality levels for grading practice - Turning lecture transcripts into study flashcards - Suggesting analogies for abstract concepts - Creating scaffolding questions for complex readings - Checking logical flow in a student essay - Simplifying statistical results for undergraduates - Writing short case studies from longer readings - Suggesting Socratic-style discussion prompts - Creating hypothetical exam questions based on lecture notes - Designing practice problems with incremental difficulty - Transforming a lecture into a role-play script for classroom performance - Mapping out prerequisite knowledge for a lesson - Designing debate positions for classroom exercises - Rewriting a dense paragraph in three different reading levels - Generating a mock peer review with disciplinary jargon - Offering multiple-choice distractors tailored to common errors - Proposing in-class activities to reinforce a theory - Drafting “exit ticket” reflection prompts for the end of class - Reformatting assignment instructions for clarity - Turning bullet points into narrative lecture text - Creating mnemonics for memorization-heavy topics - Generating an FAQ section from course policies - Producing a glossary from a set of articles - Modeling argument structures in outline form - Suggesting supplementary readings keyed to course themes - Building guided practice questions around a worked example - Drafting student-facing rubrics with plain-language descriptors - Breaking down a complex image into labeled segments - Generating comprehension checks for each section of a text - Identifying interdisciplinary connections across two readings - Creating scaffolds for group projects (roles, steps, deadlines) - Drafting quick-write prompts keyed to specific lecture slides - Suggesting revisions for clarity and concision in drafts - Creating starter code templates for non-coding assignments (e.g., LaTeX) - Proposing ethical dilemmas connected to scientific concepts - Generating revision schedules before exams - Offering alternative perspectives to broaden interpretation of a text - Creating example datasets for practice analysis - Suggesting interactive poll questions for class sessions - Drafting study guides that highlight “likely test topics” - Creating checklists for lab procedures from long manuals - Identifying unstated assumptions in a student argument - Building structured “choose your own adventure” learning scenarios - Transforming classroom transcripts into instructor reflection notes - Providing time estimates for student assignments - Designing scaffolded outlines for research papers - Creating hypothetical interview questions for a historical figure - Suggesting incremental checkpoints in project-based courses - Converting lecture diagrams into text descriptions for accessibility - Generating warm-up questions that activate prior knowledge - Suggesting alternative metaphors for key theories - Drafting sample letters to policymakers based on policy analysis work - Breaking down a primary source document into guiding questions - Designing a simulation script with multiple branching outcomes - Generating comprehension checks embedded into slide decks - Creating student reflection prompts linked to course outcomes - Proposing variations of a lab experiment for inquiry-based learning - Creating model answers at different grade levels for calibration - Suggesting journal prompts that connect personal 
experience to course content - Reframing assignment instructions for English-language learners - Generating templates for field notes - Producing visual summaries (tables, charts) from text - Designing prompts for interdisciplinary connections (history + science, etc.) - Creating practice critiques of scholarly methods sections - Suggesting small-group breakout tasks for Zoom sessions - Drafting reflection questions tied to professional skills - Creating comparison tables for competing theories - Generating assessment questions that require synthesis across weeks - Designing review games (Jeopardy-style, Kahoot-style) with auto-generated Q&A - Recasting a lecture as a dialogue between two experts - Creating incremental hints for problem-solving tasks - Drafting outlines for micro-lectures based on reading gaps - Offering counterexamples to reinforce conceptual boundaries - Suggesting fieldwork observation checklists - Transforming a syllabus into a week-by-week action planner - Generating model interview transcripts for qualitative research practice - Creating practice datasets with intentional flaws for error-spotting - Designing “what if” scenario questions to explore alternate outcomes - Drafting closing synthesis prompts for end-of-unit discussions