# ai-lab-plan-20250919
Here’s a clear rundown of the links you collected this past week:
---
### 📑 Bulleted Summaries
* **[Writing effective tools for AI agents — using AI agents (Anthropic)](https://www.anthropic.com/engineering/writing-tools-for-agents)**
Explains how to design and evaluate tools for LLM agents via the Model Context Protocol (MCP). Covers prototyping, systematic evaluation, agent-collaboration in tool improvement, and principles like namespacing, returning meaningful context, token efficiency, and prompt-engineering tool descriptions.
* **[Introducing GPT-5 for developers (OpenAI)](https://openai.com/index/introducing-gpt-5-for-developers/)**
GPT-5 API release focused on coding and agentic tasks. Benchmarks show state-of-the-art results on SWE-bench Verified and Aider polyglot. Adds a new `verbosity` parameter, a `minimal` setting for `reasoning_effort`, and support for custom tools. Available in three sizes (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`); integrated into Microsoft platforms.
* **[Codex and the future of coding with AI — OpenAI Podcast Ep. 6 (YouTube)](https://www.youtube.com/watch?v=OXOypK7_90c)**
Greg Brockman and Thibault Sottiaux discuss Codex’s evolution from early GPT-3 experiments to GPT-5-Codex agents. Topics: harnesses for agents, latency tradeoffs, long-running coding agents, enterprise refactoring, and the future of agentic software engineers.
* **[Introducing upgrades to Codex (OpenAI)](https://openai.com/index/introducing-upgrades-to-codex/)**
Launch of **GPT-5-Codex**, optimized for software engineering. Adds long-task persistence (7+ hours), advanced code review, cloud + IDE integration, security guardrails, and MCP connections. Positioned as a reliable agentic coding teammate.
* **[Introducing gpt-realtime and Realtime API updates (OpenAI)](https://openai.com/index/introducing-gpt-realtime/)**
General availability of the Realtime API. New **gpt-realtime** speech-to-speech model with higher audio quality, instruction adherence, and function calling accuracy. Adds MCP server support, image input, SIP calling, and two new voices (Cedar, Marin).
* **[OpenAI just dropped a new model (this one is for us) – Theo, t3.gg (YouTube)](https://www.youtube.com/watch?v=j9wvCrON3XA)**
Commentary on GPT-5-Codex’s release as a model specifically for agentic coding. Framed for developer audiences, with some light critique about naming confusion.
* **[LLM as a Judge: Scaling AI Evaluation Strategies (IBM Technology, YouTube)](https://www.youtube.com/watch?v=trfUBIDeI1Y)**
Zahra Ashktorab explains “LLM-as-a-Judge” approaches for evaluation. Covers direct assessment vs pairwise comparison, tackling biases (verbosity, positional), and building scalable eval frameworks.
* **[AI-Scraping Free-for-All by OpenAI, Google, and Meta Is Over (New York Magazine)](https://nymag.com/intelligencer/article/ai-scraping-free-for-all-by-openai-google-meta-ending.html)**
Examines the crackdown on AI scraping: lawsuits, licensing deals, and new technical standards like **RSL (Really Simple Licensing)**. Cloudflare/Fastly aim to let sites block or charge for scraping. Suggests a shift toward restricted access, forcing AI firms to pay for data.
* **[How do I cite generative AI in MLA style? (MLA Style Center)](https://style.mla.org/citing-generative-ai-updated-revised/)**
Updated MLA guidance on citing AI tools. Key changes: include the model/version (e.g., GPT-4o), use shareable URLs when possible, and don’t treat the AI as an author. Examples for paraphrasing, quoting, images, and acknowledging secondary sources.
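
To make the tool-design principles from the Anthropic piece above more concrete (namespacing, returning meaningful context, token efficiency, and treating the description as a prompt), here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `contacts_search` tool, the sample data, the `MAX_RESULTS` cap, and the dictionary layout, which is only loosely modeled on a JSON Schema style tool definition and is not taken from the article or from any SDK.

```python
import json

MAX_RESULTS = 3  # hypothetical cap so a broad query cannot flood the agent's context

# Hypothetical in-memory data standing in for a real backend.
_CONTACTS = [
    {"id": "c1", "name": "Ada Lovelace", "email": "ada@example.com", "notes": "x" * 500},
    {"id": "c2", "name": "Alan Turing", "email": "alan@example.com", "notes": "x" * 500},
    {"id": "c3", "name": "Grace Hopper", "email": "grace@example.com", "notes": "x" * 500},
    {"id": "c4", "name": "Barbara Liskov", "email": "barbara@example.com", "notes": "x" * 500},
    {"id": "c5", "name": "Katherine Johnson", "email": "katherine@example.com", "notes": "x" * 500},
]

# Illustrative tool definition. The "contacts_" prefix namespaces the tool, and the
# description is written as instructions the agent can act on.
CONTACTS_SEARCH_TOOL = {
    "name": "contacts_search",
    "description": (
        "Search contacts by name substring. Returns at most 3 matches with "
        "name and email only. If 'truncated' is true, narrow the query."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def contacts_search(query: str) -> str:
    """Return a compact JSON string with only the fields an agent needs."""
    matches = [c for c in _CONTACTS if query.lower() in c["name"].lower()]
    shown = matches[:MAX_RESULTS]
    result = {
        # Drop internal ids and long notes; keep human-meaningful fields.
        "matches": [{"name": c["name"], "email": c["email"]} for c in shown],
        "total": len(matches),
        # Tell the agent when results were cut so it can refine the query.
        "truncated": len(matches) > len(shown),
    }
    return json.dumps(result)
```

Calling `contacts_search("turing")` returns one compact match, while a broad query such as `"a"` returns three matches plus `"truncated": true`, which gives the agent a cue to refine its search instead of receiving every record.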
---
### 📚 Recommendations for Summarization & Printing (AI Lab)
For **AI Lab printing/discussion**, I’d recommend focusing on these:
1. **Anthropic: Writing effective tools for AI agents** → Practical and theoretical guidance on agent tooling; strong overlap with your current MCP/agentic explorations.
2. **OpenAI: GPT-5 for developers** → Benchmarks + feature details; essential reference.
3. **OpenAI: Codex upgrades (GPT-5-Codex)** → Concrete improvements in coding collaboration; great for demos and classroom analogies.
4. **OpenAI: gpt-realtime and Realtime API** → Connects to your studio projects (voice, multimodal teaching).
5. **NYMag: AI-Scraping Free-for-All ending** → Policy/cultural angle, useful for faculty discussions about ethics and access.
6. **IBM: LLM as a Judge** → Complements your work on eval frameworks; raises discussion about bias in evals.
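
For the LLM-as-a-Judge discussion, one common way to counter the positional bias mentioned in the summaries is to run a pairwise comparison in both orders and accept a winner only when the two verdicts agree. The sketch below is a hedged illustration, not the method from the IBM video: the `Judge` signature, the `"first"`/`"second"` return convention, and the stand-in judges are all assumptions, and a real setup would replace the stand-ins with a model call.

```python
from typing import Callable

# Assumed interface: a judge receives (prompt, first_answer, second_answer)
# and returns "first" or "second" for whichever answer it prefers.
Judge = Callable[[str, str, str], str]


def pairwise_verdict(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Return "A", "B", or "tie" after judging both presentation orders."""
    forward = judge(prompt, a, b)   # A is shown first
    backward = judge(prompt, b, a)  # B is shown first
    a_wins_forward = forward == "first"
    a_wins_backward = backward == "second"
    if a_wins_forward and a_wins_backward:
        return "A"
    if not a_wins_forward and not a_wins_backward:
        return "B"
    # The verdict flipped when the order flipped, so it is not trustworthy.
    return "tie"


# Stand-in judges for demonstration only (no model is called).
def always_first(prompt: str, first: str, second: str) -> str:
    """A maximally position-biased judge: always prefers the first slot."""
    return "first"


def prefers_shorter(prompt: str, first: str, second: str) -> str:
    """An order-independent judge that prefers the shorter answer."""
    return "first" if len(first) <= len(second) else "second"
```

With `always_first`, every comparison collapses to `"tie"`, which surfaces the bias instead of silently rewarding whichever answer was listed first. With `prefers_shorter`, the same answer wins in both orders.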
The MLA citation update could be printed as a **faculty resource handout**, since it’s highly relevant for teaching.
---
Would you like me to prepare **1–2 page printable summaries** for each of the recommended ones (condensed + formatted), or a **longer compiled packet** that weaves them together into a single AI Lab briefing doc?
Here’s a structured bulleted list of the papers you collected, along with recommendations for which ones might be most valuable to summarize and print for your next AI Lab meeting.
---
## 📄 Papers Collected This Week
* **Large AI models are cultural and social technologies** (Science, 2025) – [DOI](https://doi.org/10.1126/science.adt9819)
* **REFRAG: Rethinking RAG based Decoding** (arXiv:2509.01092, 2025) – [PDF](https://arxiv.org/abs/2509.01092v1)
* **Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth** (arXiv:2509.03867, 2025) – [PDF](https://arxiv.org/abs/2509.03867)
* **Measuring Human Leadership Skills with Artificially Intelligent Agents** (arXiv:2508.02966, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recd68PsD54Uh7Sg8/fldfxA7imPa4fayrx/attSQ8NbyTWCIcyW2)
* **When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs** (arXiv:2508.02994, 2025) – [PDF](https://arxiv.org/abs/2508.02994)
* **Trustworthiness of Legal Considerations for the Use of LLMs in Education** (arXiv:2508.03771, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recCDMEIs0qtCEtkZ/fldfxA7imPa4fayrx/attFDIZm1AgY9wLso)
* **Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of AI** (Stanford Digital Economy Lab, 2025) – [PDF](https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/)
* **Mitigating Hallucinations in LLMs via Causal Reasoning** (arXiv:2508.12495, 2025) – [PDF](https://arxiv.org/abs/2508.12495)
* **Artificial Analysis State of AI Q1 2025 Highlights Report** – [PDF](https://artificialanalysis.ai/)
* **Systematic Review of Key RAG Systems: Progress, Gaps, Future Directions** (arXiv:2507.18910, 2025) – [PDF](https://arxiv.org/abs/2507.18910)
* **Working with AI: Measuring the Occupational Implications of GenAI** (arXiv:2507.07935, 2025) – [PDF](https://arxiv.org/abs/2507.07935)
* **What Makes a Good Natural Language Prompt?** (arXiv:2506.06950, 2025) – [PDF](https://arxiv.org/abs/2506.06950)
* **Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation** (arXiv:2506.09046, 2025) – [PDF](https://arxiv.org/abs/2506.09046)
* **RAG+: Enhancing RAG with Application-Aware Reasoning** (arXiv:2506.11555, 2025) – [PDF](https://arxiv.org/abs/2506.11555)
* **Superintelligence Strategy: Expert Version** (arXiv:2503.05628, 2025) – [PDF](https://arxiv.org/abs/2503.05628)
* **Experience embracing genAI in an engineering computations course** (Case study, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recwNlsIOBs1T8GbJ/fldfxA7imPa4fayrx/att0oIRoccbeoe4lF)
* **Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms** (arXiv:2503.05473, 2025) – [PDF](https://anonymous.4open.science/r/HiveLLM-5E55)
* **The ABC’s of Who Benefits from Working with AI: Ability, Beliefs, and Calibration** (NBER, 2024) – [PDF](http://www.nber.org/papers/w33021)
* **The Expertise Upheaval: How GenAI’s Impact on Learning Curves Will Reshape the Workplace** (Report, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recXXF6wKGwXO2aPT/fldfxA7imPa4fayrx/atto4pdDyeGoLNSWJ)
* **Generative AI and the Scientific Method: An Impending Collision?** (Stubbs, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recz9JvaMBFHsODLn/fldfxA7imPa4fayrx/attMDfIHdUAmy7PfA)
* **StudyChat Dataset: Exploring Student Dialogues with ChatGPT** (HuggingFace, 2025) – [PDF](https://huggingface.co/datasets/wmcnicho/StudyChat)
* **Blended RAG: Improving Accuracy with Semantic Search & Hybrid Retrievers** (arXiv:2404.07220, 2024) – [PDF](https://arxiv.org/abs/2404.07220v2)
* **Prompt Chaining vs Stepwise Prompt in Summarization** (arXiv:2406.00507, 2024) – [PDF](https://arxiv.org/abs/2406.00507)
* **QAEA-DR: Unified Text Augmentation for Dense Retrieval** (arXiv:2407.20207, 2025) – [PDF](https://arxiv.org/abs/2407.20207v2)
* **Optimizing RAG: Hyperparameter Impacts** (arXiv:2505.08445, 2025) – [PDF](https://arxiv.org/abs/2505.08445)
* **How much do language models memorize?** (arXiv:2505.24832, 2025) – [PDF](https://arxiv.org/abs/2505.24832)
* **AI Agents vs. Agentic AI: A Conceptual Taxonomy** (arXiv:2505.10468, 2025) – [PDF](https://arxiv.org/abs/2505.10468)
* **Knowledge Compression via Question Generation** (arXiv:2506.13778, 2025) – [PDF](https://arxiv.org/abs/2506.13778)
* **Enhancing Student Focus with Real-Time LLM Compiler Feedback** (2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recUysW21J4KGwp4F/fldfxA7imPa4fayrx/attU6If37lUGcFzio)
* **Towards reliable GenAI-driven scaffolding** (Computers & Education, 2026) – [DOI](https://doi.org/10.1016/j.compedu.2025.105448)
* **“My Boyfriend is AI”: Computational Analysis of AI Companionship** (arXiv:2509.11391, 2025) – [PDF](https://arxiv.org/abs/2509.11391)
* **How People Use ChatGPT** (NBER, 2025) – [PDF](http://www.nber.org/papers/w34255)
* **How OpenAI uses Codex** (Whitepaper, 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recwaJVGq5noMFXS9/fldfxA7imPa4fayrx/att4W3epwPwQyOKaO)
* **Against the Uncritical Adoption of ‘AI’ Technologies in Academia** (Guest et al., 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recF2VFRBDTVpmvyq/fldfxA7imPa4fayrx/attKZ43d68XkSSsTY)
* **What Does ‘Human-Centred AI’ Mean?** (Guest, 2025) – [PDF](https://arxiv.org/abs/2507.19960v2)
* **Delegation to AI Can Increase Dishonest Behaviour** (Nature, 2025) – [DOI](https://doi.org/10.1038/s41586-025-09505-x)
* **Design Principles for GenAI Literacy in Teaching** (Hönigsberg et al., 2025) – [PDF](https://airtable.com/app5WKb7ici3l4ZV2/tblbtIWkj4w8yiIuQ/recuR0vPCJ3MVE6Af/fldfxA7imPa4fayrx/attKEDefgcVu94tBL)
* **The Use of GenAI Tools in Academic Writing: Systematic Review** (Li & Wu, 2025) – [DOI](https://doi.org/10.1007/s43681-025-00827-0)
---
## 📌 Recommendations for Summarization & Printing
For **AI Lab discussion**, I’d recommend selecting papers that:
1. Advance **theoretical framing** of AI (big picture).
2. Provide **practical/technical insights** into RAG, hallucinations, agents.
3. Speak directly to **education and academic practice**.
### High-Priority to Summarize & Print
* **Large AI models are cultural and social technologies** (Farrell, Science) → Big theoretical reframing.
* **Systematic Review of RAG Systems** → Anchors our ongoing RAG workshop arc.
* **Mitigating Hallucinations via Causal Reasoning** → Directly relevant to reliability.
* **When AIs Judge AIs (Agent-as-a-Judge evaluation)** → Critical for eval pipelines.
* **Experience Embracing GenAI in an Engineering Computations Course** (Barba) → Powerful cautionary teaching case.
* **Trustworthiness of Legal Considerations for LLMs in Education** → Important for Harvard context.
* **Against the Uncritical Adoption of ‘AI’ in Academia** (Guest) → Critical counterpoint.
### Secondary (select for breakout groups or special interest)
* **Drivel-ology** → Fun/rigorous benchmark for deep nonsense interpretation.
* **REFRAG & RAG+** → Cutting-edge RAG efficiency advances.
* **How much do language models memorize?** → Useful for privacy/data governance debates.
* **Delegation to AI Can Increase Dishonest Behaviour** (Nature) → Sparks ethics discussion.
* **Design Principles for GenAI Literacy** → Actionable for curriculum.
* **The Use of GenAI in Academic Writing (Systematic Review)** → Connects with student writing pedagogy.
---
👉 Would you like me to **draft 1–2 page summaries** of the high-priority set (Farrell, RAG review, hallucination/causal reasoning, AIs-as-judges, Barba’s case, legal/education trust, and Guest’s critique)? That way you’d have a compact packet ready for printing.
Here’s a clean bulleted rundown of the chunk you just shared, followed by my recommendations for which pieces are worth deeper summarization and printing for AI Lab discussion:
---
### 📑 Bulleted Summaries
* **[Google’s NEW Agent Money Protocol (AP2) – YouTube, Sam Witteveen](https://www.youtube.com/watch?v=1bIVaODEbTo)**
Video introducing the **Agent Payments Protocol (AP2)**, Google’s protocol for enabling financial transactions carried out by AI agents.
* Builds on A2A (agent-to-agent) and MCP (model context protocol).
* Use cases: agents buying tickets (e.g. Taylor Swift), shopping & auto-purchasing, recurring purchases, discount-seeking.
* Supports **human-present** (user confirms purchase) and **human-not-present** (agent acts with cryptographic proof of intent) modes.
* Merchant/consumer tensions: loyalty programs vs user autonomy.
* Core principles: openness, privacy by design, defined liability, cryptographic proof of intent.
* Potential precursor to **agent app stores** and microtransactions between agents.
* Google has published docs + GitHub repo for developers.
* **[Locality in Image Diffusion Models Emerges from Data Statistics – arXiv:2509.09672](https://arxiv.org/abs/2509.09672)**
Paper by Artem Lukoianov, Chenyang Yuan, Justin Solomon, Vincent Sitzmann.
* Challenges the view that convolutional inductive bias causes locality in diffusion models.
* Shows instead that **locality emerges from data statistics** (pixel correlations in natural images).
* Provides theoretical + experimental evidence using an optimal linear denoiser.
* Proposes a new analytical denoiser that better matches UNet diffusion model scores.
* 30 pages, 18 figures, 6 tables; strong fit for AI research discussion.
* **[Teen Safety, Freedom, and Privacy – OpenAI (Sam Altman)](https://openai.com/index/teen-safety-freedom-and-privacy/)**
Policy essay balancing three principles:
* **Privacy**: AI conversations should be protected like doctor–patient or lawyer–client confidentiality.
* **Freedom**: Adults should use AI with broad latitude (“treat adults like adults”), within safety bounds.
* **Safety for Teens**: Stronger protections (age-prediction, stricter content limits, parental notification in cases of harm).
* Acknowledges tension between values; advocates transparency in decision-making.
* **[AI Companions Are Taking Over… Let’s Build One – YouTube, Fireship](https://www.youtube.com/watch?v=OfOPrmnHRxw)**
Playful yet serious video about the rise of AI companions and building one.
* Cultural framing: AI chatbots (like xAI’s “Ani”) replacing social/romantic connections.
* Tutorial: builds a **voice-enabled Fireship bot** using Vapi (voice agents), Turso Cloud (database), Astro (frontend), ElevenLabs (custom voice).
* Demonstrates end-to-end pipeline: from prompt design to phone-call deployment.
* Witty critique of “terminally online” culture alongside genuine technical demo.
* **[How OpenAI Uses Codex – PDF](https://cdn.openai.com/pdf/6a2631dc-783e-479b-b1a4-af0cfbd38630/how-openai-uses-codex.pdf)**
Internal OpenAI case study on Codex use across engineering teams.
* **Use Cases**: code understanding, refactoring, performance optimization, test coverage, dev velocity, staying in flow, ideation.
* **Anecdotes**: on-call debugging, mass refactors, auto-generated unit tests.
* **Best Practices**: structured prompts, AGENTS.md context files, Best-of-N outputs, Codex task queues.
* Shows Codex already deeply embedded in OpenAI workflows.
* **[Large Language Muddle – n+1 Magazine, Issue 51 (Fall 2025)](https://www.nplusonemag.com/issue-51/the-intellectual-situation/large-language-muddle/)**
Long-form cultural critique of LLMs’ impact on literary and academic life.
* Surveys “AI-and-I” essay genre in the New Yorker, Times, etc.
* Documents harms: homogenization, cognitive debt, declining student writing, AI “slop.”
* Calls for resistance: stigmatization, refusal to publish AI writing, pedagogical redesign, unionization.
* Frames AI writing as “single-use plastic of the mind.”
* Ends with a Luddite call to “smash stereotypes of intellect and vision” imposed by AI.
* **[Effects of Honor Code Reminders on Cheating in Unproctored Exams – ScienceDirect, 2023](https://www.sciencedirect.com/science/article/abs/pii/S0361476X2300067X)**
Double-blind randomized controlled trial (RCT) with Chinese university students.
* Tested **policy reminders**, **exemplar reminders** (real cheating cases), and **consequence reminders** vs no reminder.
* Found all reminder types significantly reduced cheating.
* Suggests that even students already familiar with the honor code benefit from pre-exam prompts reinforcing academic integrity.
* Highlights importance of subtle nudges in promoting honesty.
* **[Event Replay: How People Really Use ChatGPT – OpenAI Forum (Chatterji & Deming)](https://forum.openai.com/public/videos/event-replay-how-people-really-use-chatgpt-2025-08-27)**
Discussion of major study on ChatGPT usage.
* **Broad Adoption**: now ~10% of world population uses it weekly.
* **Practical Use**: 80% of interactions fall into “asking, doing, expressing” (esp. guidance, info-seeking, writing).
* **Demographic Gaps**: usage gaps by gender and by country income level are narrowing.
* **Economic Impact**: creates large consumer surplus even if not captured in GDP.
* **Future Work**: macroeconomic effects, education, real-time AI usage tracking.
* **[AI Startups Are Overdone (Finally) – YouTube, Theo / t3.gg](https://www.youtube.com/watch?v=L3vToC1jO64)**
Explains a **“vibe shift”** in Y Combinator and VC investments.
* Critiques “copilot for X” startups built without domain expertise.
* Highlights successful bets on niche, domain-deep companies (e.g. car wash SaaS, pro video editing tools).
* Suggests future AI winners will blend AI competence with real-world expertise.
* Also reflects on YC’s shrinking acceptance rates and investment dynamics.
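
The “optimal linear denoiser” in the locality paper summarized above can be written out in one line. As a sketch, assume the simplest setting (this exact setup is an assumption for illustration, not taken from the paper): clean images $x$ with mean $\mu$ and covariance $\Sigma$, observed as $y = x + \sigma\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ independent of $x$. The linear estimator that minimizes mean squared error is then

$$
\hat{x}(y) = \mu + W\,(y - \mu), \qquad W = \Sigma\,(\Sigma + \sigma^{2} I)^{-1}.
$$

Each row of $W$ is the filter used to produce one output pixel, and $W$ depends only on $\Sigma$ and $\sigma$. Any spatial structure in those filters therefore comes from the pixel correlations in the data and not from a network architecture, which is the intuition behind the summary’s claim that locality emerges from data statistics.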
---
### 📚 Recommendations for Summarization & Printing (AI Lab)
For **AI Lab printing/discussion**, I recommend these as most strategically relevant:
1. **Google Agent Payments Protocol (AP2)** → Key to future agentic economy + agent app stores.
2. **Image diffusion locality paper (arXiv)** → Technical advance; bridges AI theory + practice.
3. **OpenAI: Teen safety, freedom, privacy** → Policy implications; faculty/student concern.
4. **OpenAI internal Codex use** → Case study for embedding AI in workflows; very practical.
5. **Event Replay: How People Use ChatGPT (Deming & Chatterji)** → Large-scale empirical usage data, highly relevant for higher ed.
6. **n+1 “Large Language Muddle”** → Humanistic critique; frames existential risks for academia.
7. **Honor code cheating study** → Directly relevant to teaching/assessment integrity.
The Fireship companion video and Theo’s YC critique are engaging but maybe better for **optional viewing**, not core printouts.
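
For the Codex case study, the “Best-of-N outputs” practice can be illustrated with a short sketch. It assumes Best-of-N means generating several candidate solutions for the same task and keeping the one that scores highest. The scoring function here (counting passed checks for a hypothetical `add` function) and the hard-coded candidates are illustrative assumptions and do not describe how Codex itself ranks outputs.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def passes_tests(source: str) -> float:
    """Hypothetical scorer: run the candidate and count passing checks for add()."""
    namespace: dict = {}
    try:
        # Executing generated code is only acceptable here because the
        # candidates are hard-coded; real use would need a sandbox.
        exec(source, namespace)
        add = namespace["add"]
        cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
        return float(sum(1 for args, want in cases if add(*args) == want))
    except Exception:
        return 0.0  # candidates that fail to run score zero


# Three stand-in "model outputs" for the same task.
CANDIDATES = [
    "def add(a, b): return a - b",   # wrong operator
    "def add(a, b): return a + b",   # correct
    "def add(a, b) return a + b",    # syntax error
]

best = best_of_n(CANDIDATES, passes_tests)
```

Here `best` ends up as the second candidate, the only one that passes all three checks. The design point is that selection quality depends entirely on the scorer, so a lab demo could swap in a real test suite or a reviewer rubric.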
---
Would you like me to prepare **concise 1–2 page printable summaries of each recommended piece** (standalone handouts), or a **compiled multi-article packet** that interweaves them into a single AI Lab briefing?