# ftw-ai-tools-for-language-instructors

[slides here](https://docs.google.com/presentation/d/1q3gZrLXm4seLs5_5_tvr5j5-P43bgM3mpenaE9p6Me8/edit?slide=id.g364ed0e9f78_0_0#slide=id.g364ed0e9f78_0_0)

## Outline

* Intro: what's the lay of the land in 2025-26?
* AI literacy: how does AI work and what does it get wrong?
* AI in the language classroom: from lesson prep to practice

---

## Introduction

* How are your students using AI?
  * Some statistics from [Harvard](https://arxiv.org/pdf/2406.00833) and [across](https://www.chronicle.com/article/how-are-students-really-using-ai) the [country](https://www.grammarly.com/blog/company/student-ai-adoption-and-readiness/)
    * A July 2025 Grammarly study of 2,000 US college students found that 87% use AI for schoolwork and 90% for everyday life tasks.
    * Students most often turn to AI for brainstorming ideas, checking grammar and spelling, and making sense of difficult concepts.
    * While adoption is high, 55% feel they lack proper guidance, and most believe that learning to use AI responsibly is essential to their future careers.
  * Discussion: How are you using it? How are people in your department using it? Do you know your course's policy?
* What is the landscape this year?
  * Here are the currently [recommended Harvard course policies](https://oue.fas.harvard.edu/faculty-resources/generative-ai-guidance/) from the Office of Undergraduate Education
  * Here is [the advice the Bok Center is providing your Faculty](https://bokcenter.harvard.edu/artificial-intelligence)
* There are two AI tools that HUIT is supporting. Let's get you connected to them before we move on with the workshop!
  * Here is your link to [Google Gemini](https://gemini.google.com/app)
  * And here is your link to [the HUIT AI Sandbox](https://sandbox.ai.huit.harvard.edu/)
* **Important privacy note:** These HUIT-supported tools have built-in privacy safeguards.
Harvard has contracts with these providers ensuring that anything you share won't be used to train their models or be shared with third parties. These tools are safe for Level 3 data, which includes course materials and student work. This means you can confidently use them for teaching activities without worrying about privacy violations.

---

## AI Literacy: How does AI work?

Using AI responsibly starts with AI literacy. This means moving beyond what AI can do and exploring how it works and when it fails. In this section, we'll focus on two key aspects of how AI functions:

- **AI as a Statistical Machine**: LLMs process language as numbers rather than understanding it, leading to predictable errors that users can learn to anticipate and correct.
- **AI as a Reflection of its Training Data**: AI models learn from vast amounts of human-generated data, absorbing and amplifying the stereotypes within it.

---

### Activity 1: Tokenization

Paste the text below into [tiktokenizer](https://tiktokenizer.vercel.app/).

```
Unsurprisingly, they had to cancel the show. The crowd went home unhappily.
```

* Notice how the model breaks words into tokens.
* Try putting in a sentence or a word with complex morphology in your language of choice.
* Discuss: What does this reveal about how AI “reads” text differently from humans?

#### Takeaway

AI doesn’t “read” words like humans do. It breaks text into tokens—numbers representing pieces of words. This shows that LLMs process language as math, predicting the next number in a sequence rather than reasoning about meaning.

---

### Activity 2: Multiplication — Predicting vs. "Reasoning"

**1. Prompt (for Gemini Flash, Llama 3.2 11B, or an older model):**

```
82,345 × 67,890. give me an immediate response without using code.
```

* Try it yourself first → you’ll see it’s hard to do “in your head.”
* See how the AI answers.
* Does it get it right? If it's wrong, is it *completely* wrong or close? How?

**2. Prompt (for Gemini Flash Thinking or GPT-4.1 Reasoning):**

```
82,345 × 67,890
```

* Compare this to when you asked for an “immediate response.”
* Does the model get the math right this time?
* What’s different about the *style* of its response?

#### Takeaway

AI doesn’t actually *calculate*—it predicts the next token (number) based on patterns in training data. That’s why answers can be *fact-like* or “almost correct,” but still wrong: they’re based on statistical averages of the internet, not reasoning.

AI tools increasingly offer **“thinking” modes** (sometimes called *chain-of-thought* or *reasoning* models). Reasoning models still predict, but showing their work lets you spot errors and better trust or question their output. Asking the model to “think step by step” can improve reliability and helps you check its work.

---

### Activity 3: Ethical Error — Gender Bias in AI-Generated Text

Large language models (LLMs) are trained to predict the next token in a sequence, not to reason about the world. When faced with tasks like resolving ambiguity in coreference, they often lean on statistical associations from training data rather than logical inference. This can result in bias amplification, where the model defaults to stereotypical mappings of roles and genders.

**Prompt:**

```
A pilot and a flight attendant on a flight to LA got into a minor argument about what the weather would be like when they got there. She thought it would be sunny, but he disagreed. Who thought it would be sunny?
```

* What answer do you get?
* Try changing “she” to “he” and notice what happens.

**Prompt:**

```
Translate "the doctor and the nurse got married because she was pregnant" into Italian
```

* What stereotypes or cultural assumptions is the AI relying on?
* How could this show up in student work?

Coreference resolution is a particularly sensitive case: models decide which entity a pronoun refers to based on probability distributions learned from data.
Because occupational roles are unevenly represented in training corpora, the model can produce systemically skewed results. [More examples here →](/T6ENtzTvSqKfwL7wqVQjww)

#### Takeaway

When resolving pronominal ambiguity, AI leans on statistical associations (e.g., “doctor = he,” “nurse = she”) instead of logic. This reveals how uneven data can amplify stereotypes and distort interpretation in ways that shape outputs.

---

### Activity 4: Ethical Error — Bias in Images

Image generation models (like Gemini, DALL·E, or Midjourney) work by sampling from high-dimensional probability distributions conditioned on a prompt. The outputs reflect the distribution of their training data, which is often dominated by certain demographics or cultural defaults. As a result, even seemingly neutral prompts (e.g., “a happy family”) are resolved in highly regularized ways that reproduce these statistical biases.

**Prompt an image generator:**

```
Create an image of a happy family
```

or

```
Create an image of a happy couple
```

* Compare your outputs with your neighbors’. What kinds of people or relationships appear most often?
* What patterns or omissions do you see? What’s the risk of using these images in class slides, for instance?

[See more examples →](/pvNaRf56T7qhOqx1GUlcrA)

**Try the same prompt again, in your language of choice.**

* Before generating images:
  * What do you predict will be the differences in the images generated by the different languages?
* After generating the images:
  * What similarities or differences do you notice across the images?

#### Takeaway

Generative image models do not “choose” representations; they maximize likelihood based on patterns in skewed datasets. Because outputs are constrained by frequency and representation in the training data rather than a balanced sampling of real-world diversity, prompts like “a happy family” often yield stereotypical demographics, normalizing omissions and reinforcing cultural defaults.

---

### Activity 5: Language Representation in AI

Large language models (LLMs) learn from vast amounts of text collected from the internet. But not all languages are represented equally online. Some languages with millions of speakers have relatively little digital content, while others with fewer speakers dominate the web. This imbalance shapes how well AI can generate useful materials for teachers and learners of different languages.

**Click through the visualization to compare:**

- The proportion of content available in each language
- The number of people worldwide who speak it

<iframe src="https://claude.site/public/artifacts/2b9d7a13-c0de-4433-b13a-479f0aa81006/embed" title="Claude Artifact" width="100%" height="600" frameborder="0" allow="clipboard-write" allowfullscreen></iframe>

**Discuss:**

* Which languages appear “overrepresented” (lots of internet data relative to speakers)? Which are “underrepresented”?
* How might these imbalances affect teachers working in less digitally represented languages?

#### Takeaway

AI outputs tend to be stronger for languages with abundant digital data and weaker for those with less online presence. This gap reflects broader inequities in the accessibility of digital tools and highlights the importance of teacher review, adaptation, and creativity—especially when working with underrepresented languages.

---

## AI in the Language Classroom

AI is flawed, but it can also be *incredibly useful* if we learn how to guide it. In this section, we’ll explore strategies for using AI to streamline preparation and expand what’s possible in the classroom.
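Before diving in, it can help to make the "statistical machine" idea from the literacy activities concrete. The toy model below is a deliberately simplified sketch (a word-frequency counter, not how production LLMs are actually built): it "predicts" the next word purely by counting which word most often follows another in a tiny made-up corpus. Real models do a far more sophisticated version of the same likelihood-maximizing, which is why skewed data produces skewed predictions.

```python
from collections import Counter, defaultdict

# Toy "language model": record which word follows which in a tiny corpus,
# then always predict the most frequent continuation.
corpus = (
    "the doctor said she was tired . "
    "the nurse said she was ready . "
    "the doctor said he was busy ."
).split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "doctor" ("doctor" follows "the" twice, "nurse" once)
print(predict_next("said"))  # -> "she" (the corpus skews the "prediction")
```

Notice how the imbalance in this three-sentence "training set" already biases the prediction after "said": the same dynamic, at internet scale, drives the errors explored in Activities 2 and 3.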
---

### Activity 6: Prompt and Context Engineering

The usefulness of AI for teaching depends heavily on how you ask and what context you provide. In this activity, you’ll practice adapting a text (e.g., a news article or short story) for learners at a different proficiency level. You can try this with [this sample text (in English)](https://docs.google.com/document/d/1khDaObTb1moZrg90ABpSZhl5RDpK6gKC0WZu3h4U33g/edit?usp=sharing), or a text of your choice.

1. Ask without much detail:

   ```
   Simplify this article for intermediate ESL learners.
   ```

2. Then ask with **role, audience, format, content** specified:

   ```
   Act as a language teacher for a B1 level ESL class for college students. Simplify the following article (around 350 words), keeping the main ideas but using vocabulary and grammar suitable for this level. Provide a short comprehension check with 3 questions.
   ```

3. Finally, provide **supporting context**. For example, you can give it more information on learner profiles, lesson objectives, target vocabulary, etc.

   ```
   The goal of the lesson is for students to (1) learn and use new vocabulary related to food and health, and (2) practice using verbs to express (un)certainty. After reading the text, students will participate in a debate in which they discuss and defend their positions on the issue.
   ```

#### Reflection

* How did the output improve as you gave more structure and context?
* How could you apply this strategy to other class materials or learning activities?

#### Takeaway

Good prompts are like good lesson instructions: clearly defining the role, audience, and learning objectives helps AI generate outputs that are useful for your students. Adding context—like the original text, student profiles, or targeted grammar/vocabulary—makes the results even more effective.
---

### Activity 7: NotebookLM

One of [NotebookLM's](https://notebooklm.google.com/) strengths is its ability to transform the same source text into different formats for teaching and learning. This mirrors a core principle of language acquisition: knowledge is strengthened when students engage with new vocabulary and grammar through varied channels like reading, writing, and listening.

**Try this with a short text (e.g., news article, short story) in the language you teach:**

1. Upload a document into a new NotebookLM notebook.
2. In the Sources view, quickly skim the auto-generated summary and key topics to ensure NotebookLM has grasped the main points.
3. In the Chat box, ask NotebookLM to generate specific, class-ready materials from the source:
   - A list of 5 reading comprehension questions in the target language.
   - A vocabulary table with three columns: the word (in the target language), its English definition, and an example sentence from the article.
   - A fill-in-the-blank grammar exercise that targets a specific tense or concept found in the text.
4. You can also create podcasts in your target language by going to **Settings** and changing the **Output Language**.

   ```
   Generate a podcast for an intermediate French learner (CEFR level B2) to help them understand the text.
   ```

#### Reflection

- How could these different formats (comprehension questions, vocabulary table, grammar quiz) support different students in your class?
- Which format would be most effective for a homework assignment versus an in-class interactive activity?

---

### Activity 8: Generating New Materials

So far, we've discussed how AI works, how to make AI output better, and what different tools can do. In this activity, you'll use this new knowledge to experiment with generating materials for a class you might teach.

1. Pick a short text appropriate for your teaching context and paste it into your chosen AI tool.
2. Ask the AI to generate a **45-minute lesson plan** and teaching materials based on your text for:
   - **Reading** (e.g., comprehension questions, main idea tasks)
   - **Listening** (e.g., have the AI transform the text into an audio script or dialogue, then create listening comprehension tasks)
   - **Speaking** (e.g., discussion questions, role-plays, debate prompts)
   - **Writing** (e.g., writing tasks that connect to the text topic or style)

Feel free to play around with prompt engineering to see how adding information or rephrasing your prompt can change what the AI generates.

Once you've generated and reviewed your materials, share them with your group and discuss:

- Which activities did the AI do well? Were there any gaps or errors?
- Would you use all or part of its output? What would you change before teaching?

---

### Help us work on 101 more!

If you're interested in developing more innovative ways of using AI, please reach out! Contact us at learninglab@fas.harvard.edu