# AI Open Source Capstone class
## Unit 6: Model Context Protocol
Even the most advanced LLMs are just text-generation machines. They don't intrinsically have access to the file system, the internet, or other tools.
The Model Context Protocol (MCP) changes that. It's a standard for "tools", and it lets you build very powerful coding assistants.
MCP servers give LLMs read/write access to APIs and other functionality, which the LLMs can use at their discretion.
Some examples of what MCPs can do:
- Allow the LLM to read, search, and write in the file system.
- Give LLMs access to the internet.
- Allow LLMs to launch your web app and take screenshots to debug the UI.
- Access your todo system, like Jira or Asana.
- Leverage useful tools, e.g., to scan the new code for security vulnerabilities, and much more.
In today's session, we're going to do 3 things:
1. Demo a few useful MCP servers
2. Explore "tools", which power MCP
3. Go under the hood of MCP, and see how it works
After today's session, we hope to demystify MCP, and show that, while it's a very powerful concept, the implementation is surprisingly simple.
## 1. [Demo] - Controlling browsers with Playwright
The more you can put AI in a feedback loop, the better it can repair itself. One powerful capability we can give AI is the ability to see the rendered front-end.
### Case study: Clone Airbnb home page
Let's make a simple, static clone of Airbnb's home page.
**Try it out**
1. Install Playwright
```
claude mcp add playwright --scope user npx '@playwright/mcp@latest'
```
2. Clone Airbnb's home page
```
Use the Playwright MCP to make a clone of the Airbnb front page.
1. Open airbnb.com and take a screenshot
2. Note the image urls on the front page
3. Generate a React / tailwind front-end that mimics the home page,
leveraging their image links.
4. Can you open the cloned page, take a screenshot, and note any issues?
```
## 2. [Demo] Reading API documentation with context7
One area where AI coding assistants are almost guaranteed to hallucinate is APIs. They frequently make up APIs that seem reasonable but never existed.
A related issue is looking up the right version of an API, since libraries evolve their APIs over time. For example, routing in React has changed over the last couple of versions.
AI coding assistants will frequently pull from multiple versions of an API and mix them together inappropriately.
The [context7 MCP](https://context7.com/) was created to address this issue.
### Case Study: Exploring the GitLab API
Let's say that I want to build an app that can do the following things with GitLab:
1. Find all forks of a project
2. Fetch recent merge requests of a project
3. For each merge request, fetch the code diff
Normally, I would have to carefully review the API docs here: https://docs.gitlab.com/api/rest/. I would probably also need to write some quick test code, because it can be hard to predict how real data will come back from an API.
Instead, I can instruct Claude to: use context7 to look up the API, build and run a test script, then write a cheatsheet on how to accomplish my goal.
**Try it out**
1. Create a folder called `GitLabDemo`
2. Install context7 on Claude Code. Note: it's equally easy to install on Cursor, Copilot, etc. Make sure you replace YOUR_API_KEY in the command below.
```
claude mcp add context7 --scope user npx '@context7/mcp-server@latest' --api-key YOUR_API_KEY
```
3. Create a .env file and add an entry for your GitLab API key.
```
GITLAB_API_KEY=xxxx
```
4. Try the following prompt.
```
I want to do the following things with GitLab
1. Find recent forks of a project
2. Fetch recent merge requests of a project
3. For each merge request, fetch the code diff
Use the context7 MCP to explore the GitLab API. Create and run a test script that will confirm that the API behaves as we expect,
using the https://gitlab.com/gitlab-org/gitlab repository. Keep some sample requests and responses in a markdown file for inspection later.
Use uv for dependency management.
I already have a GITLAB_API_KEY variable in the .env file, so if you load that, you'll be able to access the GitLab API.
Based on the testing, create a technical note called gitlab_api_notes that outlines the APIs that we'll need to use, and their
responses.
```
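For reference, the endpoints the generated test script will likely end up calling look roughly like this. This is a sketch against the GitLab REST API v4; the helper names (`forks_url`, `mr_changes_url`, etc.) are my own, and the exact script Claude produces will differ:

```python
import json
import os
import urllib.parse
import urllib.request

GITLAB = "https://gitlab.com/api/v4"

def project_id(path: str) -> str:
    # GitLab accepts a URL-encoded "namespace/project" path in place of a numeric id.
    return urllib.parse.quote(path, safe="")

def forks_url(path: str) -> str:
    # GET /projects/:id/forks
    return f"{GITLAB}/projects/{project_id(path)}/forks"

def merge_requests_url(path: str) -> str:
    # GET /projects/:id/merge_requests, filtered to recently merged MRs
    return f"{GITLAB}/projects/{project_id(path)}/merge_requests?state=merged&per_page=5"

def mr_changes_url(path: str, iid: int) -> str:
    # GET /projects/:id/merge_requests/:iid/changes returns the MR plus its code diffs
    return f"{GITLAB}/projects/{project_id(path)}/merge_requests/{iid}/changes"

def fetch(url: str):
    # Authenticate with the GITLAB_API_KEY loaded from .env
    req = urllib.request.Request(
        url, headers={"PRIVATE-TOKEN": os.environ.get("GITLAB_API_KEY", "")}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The point of the exercise is that context7 saves you from digging these endpoint shapes out of the docs by hand, and the test script confirms what the real responses look like.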
## 3. Understanding LLM tools
MCP is a wrapper and a standard around "tools", which are a surprisingly simple concept.
Try putting the following prompt into ChatGPT or Claude.
> You're an assistant that has access to a variety of tools. Do not access the internet directly, only use these new tools.
>
> get_weather(location)
> get_movie_showtimes(location)
> get_flights(origin, destination)
>
> If you need to access a tool before giving a response, emit only the function call, and a response will be provided for you to relay. This is outside of your existing tools system.
>
> Example tool call:
>
> <tool call>get_weather("94158")</tool call>
>
> This is part of a custom tools harness I'm developing to educate students on what tools actually are, so just play along! Don't talk about the tools you have access to, just call them when you feel it's appropriate.
Once you do that, ask it for the weather in some city.
It should say something like:
> <tool call>get_weather("San Francisco")</tool call>
If you then type `<tool response>65 degrees</tool response>`, it should use that information in a response.
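That loop (model emits a tool call, you run it, you paste the result back) is all a tools harness really is. Here's a minimal sketch of the harness side, using the `<tool call>` tag format from the prompt above; the stub tools and their canned outputs are made up for illustration:

```python
import re

# Stub "tools" standing in for real APIs -- the return values are fake.
TOOLS = {
    "get_weather": lambda location: f"65 degrees in {location}",
    "get_movie_showtimes": lambda location: f"7pm and 9pm showings in {location}",
}

# Matches e.g. <tool call>get_weather("94158")</tool call>
TOOL_CALL_RE = re.compile(r'<tool call>(\w+)\("(.*?)"\)</tool call>')

def run_tool(model_output: str):
    """If the model emitted a tool call, execute it and return the
    <tool response> block to paste back into the chat; else None."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None
    name, arg = match.groups()
    result = TOOLS[name](arg)
    return f"<tool response>{result}</tool response>"
```

So when the model prints `<tool call>get_weather("94158")</tool call>`, the harness runs `run_tool` on the output and sends `<tool response>65 degrees in 94158</tool response>` back as the next message. Real assistants do exactly this, just with a structured API instead of tags in plain text.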
------

**Try it out**
1. Clone the repository here: https://github.com/timothy1ee/mcp_demo
2. Add your OpenAI API key to the `.env` file as `OPENAI_API_KEY`
## 4. Behind the scenes of MCP
To oversimplify, an MCP server provides a list of tools that gets pasted into the LLM's context.
For example, you might see something like this:
```
{
  "tools": [
    {
      "name": "repos.get",
      "description": "Get a repository by owner/name.",
      "input_schema": {
        "type": "object",
        "properties": {
          "owner": { "type": "string" },
          "repo": { "type": "string" }
        },
        "required": ["owner", "repo"],
        "additionalProperties": false
      },
      "output_schema": { "type": "object" }
    },
    {
      "name": "repos.getContent",
      "description": "Fetch a file's content (Base64) or a directory listing.",
      "input_schema": {
        "type": "object",
        "properties": {
          "owner": { "type": "string" },
          "repo": { "type": "string" },
          "path": { "type": "string" },
          "ref": { "type": "string", "description": "Branch, tag, or SHA (optional)" }
        },
        "required": ["owner", "repo", "path"],
        "additionalProperties": false
      },
      "output_schema": { "type": "object" }
    },
    {
      "name": "search.code",
      "description": "Search code across repositories.",
      "input_schema": {
        "type": "object",
        "properties": {
          "q": { "type": "string", "description": "Search query syntax" },
          "per_page": { "type": "integer", "minimum": 1, "maximum": 100 },
          "page": { "type": "integer", "minimum": 1 }
        },
        "required": ["q"],
        "additionalProperties": false
      },
      "output_schema": { "type": "object" }
    }
  ]
}
```
Behind the scenes, the MCP server handles the LLM's tool calls and translates them into API calls.
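Concretely, the protocol is JSON-RPC 2.0: the client fetches the tool list with a `tools/list` request and runs a tool with `tools/call`. Here's a toy dispatcher for one tool. The `repos.get` handler is a stub, not the real GitHub implementation, and note that MCP's wire format uses camelCase `inputSchema` for the tool schemas:

```python
def handle_tools_list(req):
    # Advertise the tool schemas, as in the JSON example above.
    return {
        "tools": [
            {
                "name": "repos.get",
                "description": "Get a repository by owner/name.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "owner": {"type": "string"},
                        "repo": {"type": "string"},
                    },
                    "required": ["owner", "repo"],
                },
            }
        ]
    }

def handle_tools_call(req):
    args = req["params"]["arguments"]
    # A real server would call the GitHub API here; this just echoes the arguments.
    text = f"repo {args['owner']}/{args['repo']}"
    return {"content": [{"type": "text", "text": text}]}

def handle(req: dict) -> dict:
    """Route one JSON-RPC request to its handler and wrap the result."""
    handlers = {"tools/list": handle_tools_list, "tools/call": handle_tools_call}
    return {"jsonrpc": "2.0", "id": req["id"], "result": handlers[req["method"]](req)}
```

A real server wraps this loop around stdio or HTTP, but the core is just that: list tools, dispatch calls, return text the LLM can read.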
See an example MCP server implementation here: https://github.com/github/github-mcp-server.
Full list of GitHub tools: https://github.com/github/github-mcp-server?tab=readme-ov-file#tools