# multi-LLM-coding-activity
## Analytical Steps Using LLMs for Qualitative Coding and Analysis
**Learning Goal**: Practice using AI tools to assist with qualitative coding and see how multiple models can complement one another.
Qualitative research often involves coding interview transcripts. Traditionally, this requires researchers to read and mark up transcripts manually. In this activity, you'll use large language models (LLMs) to speed up the process, compare results across different models, and see why human judgment is still needed for final decisions.
Here is an example of using LLMs to process an interview transcript from a [dissertation](https://www.proquest.com/docview/3240875025?pq-origsite=gscholar&fromopenview=true&sourcetype=Dissertations%20&%20Theses):
### Step 1: Initial Coding by Two Independent LLMs
* Upload the same transcript into two different LLMs (e.g., ChatGPT-4.5 and Google Gemini 2.5 Pro).
* Give each model the same coding framework (categories such as teaching philosophy, professional values, and ethical concerns).
* See how each model generates codes on its own.
* Notice how wording and level of detail differ between the two models (a scripted version of this step is sketched below).
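If you prefer to script this step rather than paste the transcript into each chat interface, the sketch below sends the same transcript and coding framework to both providers through their Python SDKs. The model identifiers, file names, and prompt wording are assumptions for illustration; substitute whatever models and framework you are actually using.

```python
# A minimal sketch of Step 1, assuming API access to both providers.
# Model identifiers, file paths, and the coding framework are placeholders.
from openai import OpenAI
import google.generativeai as genai

CODING_PROMPT = """You are assisting with qualitative coding of an interview transcript.
Apply the following coding framework: teaching philosophy, professional values,
ethical concerns. For each coded excerpt, return the code, the quoted text,
and a one-sentence rationale.

Transcript:
{transcript}"""

transcript = open("interview_transcript.txt", encoding="utf-8").read()
prompt = CODING_PROMPT.format(transcript=transcript)

# Coder A: an OpenAI model (model name is a placeholder)
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
coder_a = openai_client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Coder B: a Gemini model (model name is a placeholder)
genai.configure(api_key="YOUR_GOOGLE_API_KEY")
coder_b = genai.GenerativeModel("gemini-2.5-pro").generate_content(prompt).text

# Save both outputs so they can be compared in Step 2
with open("codes_coder_a.txt", "w", encoding="utf-8") as f:
    f.write(coder_a)
with open("codes_coder_b.txt", "w", encoding="utf-8") as f:
    f.write(coder_b)
```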
### Step 2: Consistency Check Using a Third LLM
Take both sets of codes (from Step 1) and feed them into a third model (e.g., ChatGPT-4o).
Ask this model to:
* Identify agreements between the two coders.
* Highlight disagreements or differences.
* Summarize overall consistency.
Practice evaluating inter-coder reliability when using AI.
Think about where the models strongly agree, where they diverge, and why that might happen.
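The third LLM's comparison is qualitative; you can also quantify agreement once both sets of codes have been aligned to the same transcript excerpts. The sketch below is a minimal example of that calculation, assuming the alignment has already been done; the code labels are illustrative placeholders, not the dissertation's actual codebook.

```python
# A minimal sketch of a quantitative reliability check, assuming you have
# aligned both coders' output to the same set of transcript excerpts
# (manually or with the third LLM's help). Labels below are illustrative.
from sklearn.metrics import cohen_kappa_score

# One code per excerpt, in the same excerpt order for both coders
coder_a_labels = ["teaching_philosophy", "ethical_concerns", "ai_usage", "policy"]
coder_b_labels = ["teaching_philosophy", "ethical_concerns", "attitudes_toward_ai", "policy"]

# Simple percent agreement
agreements = sum(a == b for a, b in zip(coder_a_labels, coder_b_labels))
percent_agreement = agreements / len(coder_a_labels)

# Chance-corrected agreement
kappa = cohen_kappa_score(coder_a_labels, coder_b_labels)

print(f"Percent agreement: {percent_agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```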
### Step 3: Human Reconciliation and Final Coding Decisions
* Review the consistency report yourself.
* Decide whether to keep one model's interpretation, combine them, or create a new code that better fits the transcript.
* Recognize that AI support is useful, but human expertise is still necessary for accurate and meaningful coding.
* Ask yourself: how does your judgment add value beyond what the AI produced? (One way to document those decisions is sketched below.)
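Whatever you decide, it helps to keep an audit trail of each reconciliation. The sketch below shows one minimal way to do that with a CSV log; the column names and the example row are hypothetical, not part of the original study.

```python
# A minimal sketch of recording reconciliation decisions in a CSV audit trail.
# The excerpt ID, codes, and rationale below are illustrative examples.
import csv

decisions = [
    {
        "excerpt_id": "P1_L42",
        "coder_a_code": "ethical_concerns",
        "coder_b_code": "attitudes_toward_ai",
        "final_code": "ethical_concerns",
        "rationale": "Participant's worry centers on fairness, not general attitude.",
    },
]

with open("reconciliation_log.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(decisions[0].keys()))
    writer.writeheader()
    writer.writerows(decisions)
```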
### Example of the Coding Process
The research project explores how teachers perceive AI in education, focusing on its impact on professional identity, values, and pedagogical autonomy.
The table below demonstrates examples of the coding process:

### Example of Consistency Check
The coding results from the two documents — one generated by Gemini 2.5 Pro and the other by GPT-4.5 — are generally consistent in terms of high-level themes, but there are notable differences in depth, structure, and interpretation. Here's a comparative analysis:
________________________________________
**✅ Areas of Consistency**
1. Major Code Categories: Both coders identified similar high-level themes, such as:
   * Teaching Role & Context
   * Teaching Philosophy
   * Aspirations for Students
   * AI Usage Behavior
   * Attitudes Toward AI
   * Ethical Concerns
   * Policy and Governance
   * Training and Support
   * Student Misuse of AI
2. Core Interpretive Alignment:
   * Both highlighted the teacher’s cautious openness to AI.
   * Both acknowledged ethical dilemmas, especially the “hypocrisy” concern.
   * Both recognized institutional gaps in training and policy.
   * Both captured the teacher’s emphasis on civic engagement and critical thinking.

**🧠 Implications for Inter-Coder Reliability**
* Substantive Agreement: High. Both coders agree on the major themes and on how the participant thinks about AI.
* Interpretive & Structural Agreement: Moderate. Coding categories are compatible but not standardized, which affects comparison precision.
* Unitization Differences: GPT-4.5 sometimes codes longer segments as a single unit, while Gemini 2.5 breaks passages into smaller sub-themes, which affects how many codes a passage gets tagged with (see the sketch below).
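One way to make the two coders comparable despite unitization differences is to expand every coded span down to a common unit (for example, sentences) before counting agreements. The sketch below assumes each coder's output has already been parsed into (start_sentence, end_sentence, code) spans; the spans and code names are illustrative.

```python
# A minimal sketch of normalizing unitization before comparing coders, assuming
# each coder's output is parsed into (start_sentence, end_sentence, code) spans.
def explode_to_sentences(spans):
    """Expand coded spans so every sentence carries its own set of code labels."""
    sentence_codes = {}
    for start, end, code in spans:
        for sentence in range(start, end + 1):
            sentence_codes.setdefault(sentence, set()).add(code)
    return sentence_codes

# Coder A tags a long segment as one unit; Coder B splits it into sub-themes
coder_a = [(10, 14, "ethical_concerns")]
coder_b = [(10, 11, "ethical_concerns"), (12, 14, "policy_and_governance")]

a_units = explode_to_sentences(coder_a)
b_units = explode_to_sentences(coder_b)

# Compare codes sentence by sentence on the shared unit of analysis
for sentence in sorted(set(a_units) | set(b_units)):
    print(sentence, a_units.get(sentence, set()), b_units.get(sentence, set()))
```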