# postdoc-coding-projects
## Qualitative analysis
* [Coding and counting](https://colab.research.google.com/drive/1BOuSwJIt31G2ZJvwQ9l_XYG99KTP0juR?usp=sharing)
* This Colab notebook analyzes YouTube comments using video context information provided alongside a CSV file of comments. The notebook randomly selects two samples of comments: the first is processed with GPT to generate an initial set of themes, and the second is used to refine those themes. After finalizing the themes, GPT is used to assign a theme to each comment in the dataset. The distribution of themes across all comments is then visualized as a pie chart, providing an overview of prevalent topics or sentiments within the video's comment section.
* [Coding with multiple LLMs](https://colab.research.google.com/drive/1GivXCE0xnr_vbnzkFOhxB7rEcv3sCj7O?usp=sharing)
* This Colab notebook demonstrates how large language models (LLMs) can be used for qualitative coding of interview transcripts and how multiple models can complement one another throughout the process. In Task 1, two models, gpt-4o and gpt-4.1, each extract excerpts from the transcripts, and a third model, gpt-5, compares the two sets of excerpts to evaluate their similarities and differences. Task 2 builds on this by having the same two models apply existing coding schemes to the extracted excerpts, with gpt-5 again comparing the resulting codes and excerpts. In Task 3, the notebook explores auto-generating codes and applying thematic labels with LLMs: both gpt-4o and gpt-4.1 extract and code the excerpts, and gpt-5 compares the results. Together, these steps illustrate how different LLMs can be combined to enhance qualitative transcript analysis.
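
The two-sample workflow in the coding-and-counting notebook can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function names `draw_two_samples` and `theme_distribution` are hypothetical, the GPT calls are left out, and it assumes the two samples are disjoint, which the description does not state.

```python
import random
from collections import Counter


def draw_two_samples(comments, k, seed=0):
    """Return two disjoint random samples of k comments each (assumed disjoint)."""
    if 2 * k > len(comments):
        raise ValueError("need at least 2*k comments for two disjoint samples")
    rng = random.Random(seed)
    picked = rng.sample(comments, 2 * k)  # sampling without replacement
    # First half seeds the initial themes, second half is for refining them.
    return picked[:k], picked[k:]


def theme_distribution(labels):
    """Map each theme to its share of all labeled comments (pie-chart input)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {theme: n / total for theme, n in counts.items()}
```

The shares returned by `theme_distribution` sum to 1, so they can be passed to a plotting library as pie-chart slice sizes.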
---
## Concordancer
* [Bible concordancer](https://colab.research.google.com/drive/1n_26uMBc-4H9Ij_dYZCHaiUbWabt6-EF?usp=sharing)
* This Python notebook uses NLTK to search all books of the Bible for passages containing a specified keyword, matching on lemma form (e.g., the keyword "go" matches "goes," "went," etc.) and extracting a 50-word context window around each match. It selects two random samples of 30 excerpts each: the first is processed by GPT to generate candidate themes, and the second is used to refine those themes. The resulting themes are then assigned to all matching excerpts using GPT. Finally, the distribution of themes is visualized across the biblical books in a scatterplot, offering insights into the thematic landscape of the Bible around the chosen keyword.
* [Short story generator](https://github.com/bok-learning-lab/ai-lab-halloween-2025/tree/jy/concordancer-stories)
* This project generates short stories in the style of classic gothic literature using Node.js and Python. Users can select a source text—Dracula, Frankenstein, or Jekyll and Hyde—and enter a keyword. The system uses NLTK lemmatization in Python to find all forms of the keyword in the chosen novel and collects relevant sentences. These excerpts are then processed by GPT-4o-mini to produce a ~250-word atmospheric story. The interface shows a split-screen comparison of the original passages and the generated story.
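
Lemma-based keyword matching with a context window, as both concordancer projects describe, might look like the sketch below. The projects use NLTK for lemmatization; to stay runnable without any downloads, this sketch swaps in a small hand-written lemma table (`LEMMAS`, hypothetical). The `concordance` function and its reading of `window` as the approximate total number of context words centered on the match are also assumptions.

```python
import re

# Hypothetical hand-written lemma table standing in for a real lemmatizer.
LEMMAS = {"goes": "go", "went": "go", "gone": "go", "going": "go"}


def lemma(word):
    """Lowercase a token and map it to its lemma if the table knows it."""
    w = word.lower()
    return LEMMAS.get(w, w)


def concordance(text, keyword, window=50):
    """Return a context window for every token whose lemma matches keyword."""
    # Simple word tokenizer; punctuation is dropped from the returned windows.
    tokens = re.findall(r"[A-Za-z']+", text)
    target = lemma(keyword)
    half = window // 2
    hits = []
    for i, tok in enumerate(tokens):
        if lemma(tok) == target:
            start = max(0, i - half)
            end = min(len(tokens), i + half + 1)
            hits.append(" ".join(tokens[start:end]))
    return hits
```

Matching on lemmas rather than surface strings is what lets a search for "go" also return passages containing "went".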
---
## Oral exams
* [Text segmentation](https://colab.research.google.com/drive/1KY8zIbdV7k9zSOsI7el6cRVKgTlaRpth?usp=sharing)
* This Colab notebook processes oral exam transcripts obtained from Airtable to improve readability. It segments each transcript into sections based on transition phrases such as "now turning to," "in terms of [keyword]," and "so for [keyword]." After segmenting, the notebook iterates through both the original and segmented transcripts to verify that no text content was altered during processing.
* [Text extraction from note images](https://colab.research.google.com/drive/16rXK85HWM0IdMFn9Fs1bv_UoO1DyZ_H_?usp=sharing)
* This Colab notebook automates the processing of students’ handwritten notes stored in Airtable. It retrieves each image attachment, applies an OpenAI Vision model to extract and structure the handwriting into clean, hierarchical markdown, and improves recognition accuracy using concept context pulled from a companion Airtable table. The notebook writes the cleaned text back to Airtable and includes helpful features such as batch processing of many notes, avoiding repeated downloads, checking that all connections work, and simple tools to check table status or process a single record when needed.
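
The segment-then-verify idea from the text segmentation notebook can be sketched as below. This is an illustration under assumptions, not the notebook's implementation: the names `TRANSITIONS`, `segment`, and `content_unchanged` are hypothetical, only the three phrases quoted above are matched, and "no text content was altered" is interpreted as equality after whitespace normalization.

```python
import re

# The three transition phrases quoted in the description. The keyword that
# follows "in terms of" and "so for" is left unconstrained in this sketch.
TRANSITIONS = re.compile(
    r"\b(?:now turning to|in terms of|so for)\b", re.IGNORECASE
)


def segment(transcript):
    """Split a transcript so each transition phrase starts a new section."""
    starts = [m.start() for m in TRANSITIONS.finditer(transcript)]
    bounds = sorted(set([0] + starts + [len(transcript)]))
    return [transcript[a:b] for a, b in zip(bounds, bounds[1:])]


def content_unchanged(original, segments):
    """Check that the segments contain the same words in the same order."""
    def normalize(s):
        return " ".join(s.split())

    return normalize(original) == normalize(" ".join(segments))
```

Because `segment` only slices the original string, the check should always pass here; it earns its keep when segmentation is done by a step that could rewrite text.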
---
## Translation
* [Translation temperature](https://colab.research.google.com/drive/1UH3tSPtuMnMOPaEeDjIyoKLHB9S-mJZ6?usp=sharing)
* This Colab notebook shows how different GPT temperature settings affect translation output. It translates a poem into English six times—two translations at each of three temperature values. For each temperature, the notebook displays the differences between the two translations side by side. In the final step, GPT analyzes both translations from each temperature and summarizes their differences.
* [Poem translation and generation](https://colab.research.google.com/drive/15INMpqpKFMGIkr4SB8vWs1-WxBKD18ES?usp=sharing)
* This Colab notebook analyzes five classic poems by Li Bai, a renowned ancient Chinese poet, using a structured-output model to extract imagery, poetic techniques, emotional themes, and stylistic features. It then uses the analysis to generate an original Halloween-themed poem inspired by Li Bai’s style. Finally, the notebook converts the generated poem into audio using ElevenLabs, allowing users to listen to the poem in a selected voice.
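
Comparing two translations produced at the same temperature, as the translation temperature notebook does, could be approximated with a word-level diff. The sketch below uses Python's standard `difflib`; the helper name `word_diff` is hypothetical, and the notebook may compute and display differences in another way.

```python
import difflib


def word_diff(first, second):
    """Return (tag, words_in_first, words_in_second) for each differing span."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted spans
    ]
```

Counting the spans returned for each temperature gives a rough measure of how much two runs diverge.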
---
## Exam question generation
* [Quiz data extraction](https://colab.research.google.com/drive/1PI77HUw2xegsja-jlKcJxV8RFV9GTrUB?usp=sharing) and [new question generation](https://colab.research.google.com/drive/1yVQlbbYOuDbqHeELmZgHuUAE4i5TOSn5?usp=sharing) for Astro 17
* These Colab notebooks convert different types of quiz questions (AI-required, AI-optional, and AI-prohibited) into questions for a no-AI sit-down exam. The first notebook processes quiz files and sorts each question by whether it requires, allows, or prohibits AI assistance. The second notebook generates new questions on the same subject matter as a given assignment while ensuring they are suitable for sit-down exams without AI support.