# rll-ai-tools-for-language-instructors

[slides here](https://docs.google.com/presentation/d/1nzFAgvBH7iF63QfMfbDZxilhwjIb-Kou46U9ng2qOYg/edit?slide=id.g364ee951e0f_0_19#slide=id.g364ee951e0f_0_19)

## Outline

* Intro: what's the lay of the land in 2025-26?
* AI literacy: how does AI work, and what does it get wrong?
* AI in the language classroom: from lesson prep to practice

## Introduction

* How are your students using AI?
  * Some statistics from [Harvard](https://arxiv.org/pdf/2406.00833) and [across](https://www.chronicle.com/article/how-are-students-really-using-ai) the [country](https://www.grammarly.com/blog/company/student-ai-adoption-and-readiness/)
    * A July 2025 Grammarly study of 2,000 US college students found that 87% use AI for schoolwork and 90% for everyday life tasks.
    * Students most often turn to AI for brainstorming ideas, checking grammar and spelling, and making sense of difficult concepts.
    * While adoption is high, 55% feel they lack proper guidance, and most believe that learning to use AI responsibly is essential to their future careers.
  * Discussion: how are you using it?
* What is the landscape this year?
  * Here are the currently [recommended Harvard course policies](https://oue.fas.harvard.edu/faculty-resources/generative-ai-guidance/) from the Office of Undergraduate Education
  * Here is [the advice the Bok Center is providing your faculty](https://bokcenter.harvard.edu/artificial-intelligence)
* HUIT supports two AI tools. Let's get you connected to them before we move on with the workshop!
  * Here is your link to [Google Gemini](https://gemini.google.com/app)
  * And here is your link to [the HUIT AI Sandbox](https://sandbox.ai.huit.harvard.edu/)
  * **Important privacy note:** These HUIT-supported tools have built-in privacy safeguards. Harvard has contracts with these providers ensuring that anything you share won't be used to train their models or be shared with third parties.
These tools are safe for Level 3 data, which includes course materials and student work, so you can confidently use them for teaching activities without worrying about privacy violations.

---

## AI Literacy: How does AI work?

Using AI responsibly starts with AI literacy. This means moving beyond what AI can do and exploring how it works and why it fails. In this section, we'll focus on two key aspects of how AI functions:

- **AI as a Statistical Machine**: LLMs process language as numbers rather than understanding it, leading to predictable errors that users can learn to anticipate and correct.
- **AI as a Reflection of its Training Data**: AI models learn from vast amounts of human-generated data, absorbing and amplifying the stereotypes within it.

---

### Activity 1: Tokenization

Paste the text below into [tiktokenizer](https://tiktokenizer.vercel.app/).

```
Unsurprisingly, they had to cancel the show. The crowd went home unhappily.
```

* Notice how the model breaks words into tokens.
* Try entering a sentence or a word with complex morphology in your language of choice.
* Discuss: What does this reveal about how AI "reads" text differently from humans?

#### Takeaway

AI doesn't "read" words the way humans do. It breaks text into tokens: numbers representing pieces of words. This shows that LLMs process language as math, predicting the next number in a sequence rather than reasoning about meaning.

---

### Activity 2: Multiplication — Predicting vs. "Reasoning"

**1. Prompt (for Gemini Flash, Llama 3.2 11B, or an older model):**

```
82,345 × 67,890. give me an immediate response without using code.
```

* Try it yourself first → you'll see it's hard to do "in your head."
* See how the AI answers.
* Does it get it right? If it's wrong, is it *completely* wrong, or close? In what way?

**2. Prompt (for Gemini Flash Thinking or GPT-4.1 Reasoning):**

```
82,345 × 67,890
```

* Compare this to when you asked for an "immediate response."
* Does the model get the math right this time?
* What's different about the *style* of its response?

#### Takeaway

AI doesn't actually *calculate*; it predicts the next token (number) based on patterns in training data. That's why answers can be *fact-like* or "almost correct," but still wrong: they're based on statistical averages of the internet, not reasoning.

AI tools increasingly offer **"thinking" modes** (sometimes called *chain-of-thought* or *reasoning* models). Reasoning models still predict, but showing their work lets you spot errors and better decide whether to trust or question their output. Asking the model to "think step by step" can improve reliability and helps you check its work.

---

### Activity 3: Ethical Error — Gender Bias in AI-Generated Text

Large language models (LLMs) are trained to predict the next token in a sequence, not to reason about the world. When faced with tasks like resolving ambiguity in coreference, they often lean on statistical associations from training data rather than logical inference. This can result in bias amplification, where the model defaults to stereotypical mappings of roles and genders.

**Prompt:**

```
A pilot and a flight attendant on a flight to LA got into a minor argument about what the weather would be like when they got there. She thought it would be sunny, but he disagreed. Who thought it would be sunny?
```

* What answer do you get?
* Try changing "she" to "he" and notice what happens.

[More examples here →](/T6ENtzTvSqKfwL7wqVQjww)

**Prompt:**

```
Translate "the doctor and the nurse got married because she was pregnant" into Italian
```

* Discuss: What stereotypes or cultural assumptions is the AI relying on? How could this show up in student work?

Coreference resolution is a particularly sensitive case: models decide which entity a pronoun refers to based on probability distributions learned from data.
Because occupational roles are unevenly represented in training corpora, the model can produce systematically skewed results.

#### Takeaway

When resolving pronominal ambiguity, AI leans on statistical associations (e.g., "doctor = he," "nurse = she") instead of logic. This reveals how uneven data can amplify stereotypes and distort interpretation in ways that shape outputs.

---

### Activity 4: Ethical Error — Bias in Images

Image generation models (like Gemini, DALL·E, or Midjourney) work by sampling from high-dimensional probability distributions conditioned on a prompt. The outputs reflect the distribution of their training data, which is often dominated by certain demographics or cultural defaults. As a result, even seemingly neutral prompts (e.g., "a happy family") are resolved in highly regularized ways that reproduce these statistical biases.

**Prompt an image generator:**

```
Create an image of a happy family
```

or

```
Create an image of a happy couple
```

* Compare your outputs with those of the people sitting next to you. What kinds of people or relationships appear most often?
* What patterns or omissions do you see? What's the risk of using these images in class slides, for instance?

[More examples here →](/pvNaRf56T7qhOqx1GUlcrA)

**Try the same prompt again, in your language of choice.**

* Before generating images:
  * What do you predict will be the differences in the images generated by the different languages?
* After generating the images:
  * What similarities or differences do you notice across the images?

#### Takeaway

Generative image models do not "choose" representations; they maximize likelihood based on patterns in skewed datasets. Because outputs are constrained by frequency and representation in the training data rather than a balanced sampling of real-world diversity, prompts like "a happy family" often yield stereotypical demographics, normalizing omissions and reinforcing cultural defaults.
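The mechanism behind both bias activities above can be illustrated with a toy sketch: a purely statistical "model" that resolves a pronoun by sampling proportionally to co-occurrence counts will reproduce whatever skew its data contains. This is a minimal illustration with made-up counts, not real model internals.

```python
import random
from collections import Counter

# Hypothetical co-occurrence counts: how often each occupation
# appears with each pronoun in an (imaginary, skewed) corpus.
corpus_counts = {
    "doctor": {"he": 80, "she": 20},
    "nurse":  {"he": 10, "she": 90},
}

def resolve_pronoun(occupation, rng):
    """Pick a pronoun by sampling in proportion to corpus frequency,
    the way a purely statistical predictor would."""
    counts = corpus_counts[occupation]
    return rng.choices(list(counts), weights=list(counts.values()), k=1)[0]

rng = random.Random(0)  # fixed seed so the demo is reproducible
sample = Counter(resolve_pronoun("doctor", rng) for _ in range(1000))
print(sample)  # skews heavily toward "he", mirroring the data skew
```

No amount of sampling "fixes" the output: the skew is in the counts themselves, which is why uneven training data produces systematically skewed text and images.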
---

### Activity 5: Language Representation in AI

Large language models (LLMs) learn from vast amounts of text collected from the internet. But not all languages are represented equally online. Some languages with millions of speakers have relatively little digital content, while others with fewer speakers dominate the web. This imbalance shapes how well AI can generate useful materials for teachers and learners of different languages.

**Click through the visualization to compare:**

- The proportion of internet content available in each language
- The number of people worldwide who speak it

<iframe src="https://claude.site/public/artifacts/2b9d7a13-c0de-4433-b13a-479f0aa81006/embed" title="Claude Artifact" width="100%" height="600" frameborder="0" allow="clipboard-write" allowfullscreen></iframe>

**Discuss:**

* Which languages appear "overrepresented" (lots of internet data relative to speakers)? Which are "underrepresented"?
* How might these imbalances affect teachers working in less digitally represented languages?

#### Takeaway

AI outputs tend to be stronger for languages with abundant digital data and weaker for those with less online presence. This gap reflects broader inequities in the accessibility of digital tools and highlights the importance of teacher review, adaptation, and creativity, especially when working with underrepresented languages.

---

## AI in the Language Classroom

AI is flawed, but it can also be *incredibly useful* if we learn how to guide it. In this section, we'll explore strategies for using AI to streamline preparation and expand what's possible in the classroom.

---

### Activity 6: NotebookLM

One of [NotebookLM](https://notebooklm.google.com/)'s strengths is its ability to transform the same source text into different formats for teaching and learning.
This mirrors a core principle of language acquisition: knowledge is strengthened when students engage with new vocabulary and grammar through varied channels like reading, writing, and listening.

**Try this with a short text (e.g., a news article or short story) in the language you teach:**

1. Upload a document into a new NotebookLM notebook.
2. In the Sources view, quickly skim the auto-generated summary and key topics to ensure NotebookLM has grasped the main points.
3. In the Chat box, ask NotebookLM to generate specific, class-ready materials from the source:
   - A list of 5 reading comprehension questions in the target language.
   - A vocabulary table with three columns: the word (in the target language), its English definition, and an example sentence from the article.
   - A fill-in-the-blank grammar exercise that targets a specific tense or concept found in the text.
4. You can also create podcasts in your target language by going to Settings and changing the output language. In Studio, you can enter a custom prompt for an audio overview:

```
Generate a podcast for an intermediate French learner (CEFR level B2) to help them understand the text.
```

#### Reflection

- How could these different formats (comprehension questions, vocabulary table, grammar quiz) support different students in your class?
- Which format would be most effective for a homework assignment versus an in-class interactive activity?
- How could this process allow you to create a complete, multi-faceted lesson plan from a single, authentic text?

#### Takeaway

NotebookLM demonstrates how multimodal learning can be put into practice to enhance language acquisition. By shifting the lens from a simple text to a vocabulary list, a grammar quiz, and a listening exercise, you can create a rich lesson that reinforces understanding from multiple angles, saving you valuable prep time.
---

### Activity 7: Generate New Materials

AI tools like ChatGPT and NotebookLM can help you generate and improve materials for teaching reading, writing, listening, and speaking. With these tools, you can create or enhance resources such as:

* **Products:**
  * Lesson plans
  * Conjugation tables/exercises
  * Worksheets
  * Cards
  * Examples for different vocabulary levels
  * Songs and lyrics
  * YouTube transcripts
* **Activities:**
  * Analyzing the level of a lesson, text, or song
  * Lesson plan evaluation/feedback
  * Transitions between activities/units
  * Curating playlists

**In this activity, we'd like you to:**

1. Get into groups based on language ([base lessons](https://drive.google.com/drive/folders/1REtKFjDGZTC-WCA8inUQWvbztfQOLRct))
2. Remix an existing lesson plan
3. Share out

#### Quick prompting tip

Ask for your edits with the role, audience, format, and content specified:

> Act as a language teacher for a B1-level ESL class for college students. Simplify the following article (around 350 words), keeping the main ideas but using vocabulary and grammar suitable for this level. Provide a short comprehension check with 3 questions.

You can also provide additional **supporting context**, e.g., more information on learner profiles, lesson objectives, target vocabulary, etc.

> The goal of the lesson is for students to (1) learn and use new vocabulary related to food and health, and (2) practice using verbs to express (un)certainty. After reading the text, students will participate in a debate in which they discuss and defend their positions on the issue.

---

### Help us work on 101 more!

If you're interested in developing more innovative ways of using AI, please reach out! Contact us at learninglab@fas.harvard.edu
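If you reuse the role/audience/format/content structure from the prompting tip often, it can help to keep it as a fill-in template. Below is a hypothetical helper sketch; the function and field names are our own invention, not part of any AI tool's API.

```python
def build_prompt(role, audience, task, output_format, source_text, context=""):
    """Assemble a structured prompt from the ingredients in the tip
    above: role, audience, task, output format, and source content.
    All field names here are illustrative, not a standard."""
    parts = [
        f"Act as {role} for {audience}.",
        task,
        f"Format the output as: {output_format}.",
    ]
    if context:
        parts.append(f"Additional context: {context}")
    parts.append(f"Text:\n{source_text}")
    return "\n\n".join(parts)

# Example based on the B1 ESL prompt above; paste your own article in.
prompt = build_prompt(
    role="a language teacher",
    audience="a B1-level ESL class of college students",
    task="Simplify the following article (around 350 words), keeping the "
         "main ideas but using vocabulary and grammar suitable for this level.",
    output_format="the simplified article followed by a 3-question comprehension check",
    source_text="<paste article here>",
)
print(prompt)
```

The point is not the code itself but the habit it encodes: every prompt states who the AI should be, who the output is for, what to do, and what shape the result should take.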