# Module 4 (MCP & RAG): Model Context Protocol (MCP)

Author: **Ryan Safa Tjendana**
[toc]
## What is MCP?
The Model Context Protocol (MCP) is an open standard that defines how LLM clients (like chat interfaces or AI assistants) can communicate with external tools, data sources, or services in a structured, consistent, and secure way.
### MCP flow analogy:
```txt=
LLM ↔ MCP Client ↔ MCP Server ↔ Tools/Data Sources
```
### MCP - LLM Analogy
Imagine an LLM as a smart brain that doesn’t know how to “use” tools directly.
MCP acts like a remote-control interface: it lets the brain call specific tools, get results, and reason about them, all in a predictable, structured format.
## Why Do We (and LLMs) Need MCP?
### 1. LLMs Don’t Have Built-in Access to the Real World
LLMs are trained on massive text data, but once training is done, they’re static:
- They don’t have live access to your database, APIs, or filesystem.
- They can’t fetch real-time data (like weather, stock prices, or files on your machine).
- Their “knowledge” is frozen at training time.
So when we ask the LLM:
**“What’s the latest sales number from our database?”**
The model can’t really know, unless it’s given a way to call an external tool or access contextual data dynamically.
**MCP** provides a standard interface through which the LLM can request a tool (like get_sales_data) via an MCP client, which forwards the request to an MCP server that actually knows how to do that.
**Result:** The model stays focused on reasoning, while MCP handles action and retrieval.
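For illustration, such a request would travel as a JSON-RPC `tools/call` message, the same format shown later in the protocol section. The `get_sales_data` tool and its arguments here are hypothetical:
```json=
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_sales_data",
    "arguments": { "period": "latest" }
  },
  "id": 1
}
```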
### 2. Tool Integrations Are Currently Fragmented
Before MCP, every company or platform built its own custom plugin or API format:
- OpenAI had function calling.
- Anthropic used tools.
- LangChain, LlamaIndex, etc. all had their own “agent-tool” schemas.
Each format had:
- Different JSON structures
- Different ways to handle arguments and results
- Different authentication and permission models
That means:
- Developers must rewrite the same tool multiple times.
- Tools can’t easily be reused across ecosystems.
- Models can’t share a unified understanding of tool behavior.
In short, before MCP, connecting LLMs to tools or data was ad hoc and fragmented:
- Each AI app had to invent its own plugin or API format.
- Models couldn’t easily reuse the same tools across systems.
- Tool providers needed to write multiple integrations for each AI platform.
MCP solves this fragmentation by defining a universal protocol (JSON-RPC) that every client and server can implement once, and it will “just work” everywhere.
### 3. LLMs Need Structured, Safe, and Observable Tool Use
#### Problems
When an LLM calls a tool, several risks and limitations arise:
- It may pass incorrect or unsafe parameters.
- It may receive unstructured or inconsistent responses.
- It may need human-in-the-loop validation or logging.
Without a standard, each implementation must reinvent:
- Schema validation
- Logging
- Error handling
- Access control
#### MCP Solution
MCP provides well-defined message types and error codes, so:
- Every tool call and response follows a predictable JSON-RPC schema.
- Clients can log and monitor exactly what the model is doing.
- Servers can safely limit what tools and contexts are exposed.
This creates a transparent, auditable interface between model and environment.
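For example, if the model passes bad arguments, the server can reject the call with a standard JSON-RPC 2.0 error object instead of failing silently. The error code and message below follow the generic JSON-RPC convention, not any specific MCP server:
```json=
{
  "jsonrpc": "2.0",
  "error": {
    "code": -32602,
    "message": "Invalid params: 'city' is required"
  },
  "id": 3
}
```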
### 4. LLMs Need Access to Context Beyond Their Prompt
Even when not “calling a tool,” LLMs often need to reference or reason about external information:
- Local project files
- Database schema
- Recent chat history
- Knowledge base documents
Previously, developers had to hack together retrieval pipelines (e.g., RAG, vector search, file upload APIs).
MCP defines Context Providers — a standardized way for servers to expose structured context to the client.
For example:
```json=
{
  "method": "context/list",
  "result": [
    { "uri": "file://docs/design.md", "label": "Design Document" }
  ]
}
```
The LLM can then “see” what files or data are available, and selectively load them.
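The client could then fetch one of those items. Here is a sketch using the `context/read` method listed later in the protocol section; the exact parameter and response fields are illustrative and may differ between implementations:
```json=
{
  "jsonrpc": "2.0",
  "method": "context/read",
  "params": { "uri": "file://docs/design.md" },
  "id": 4
}
```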
### 5. LLMs Need Security and Separation of Concerns
Directly giving a model access to your system (filesystem, APIs, databases) is dangerous:
- What if it deletes files or leaks secrets?
- What if a malicious prompt tricks it into exfiltrating data?
Without a clear boundary, it’s hard to trust an AI assistant.
MCP enforces separation of responsibilities:
- The model doesn’t execute code.
- The client decides which servers/tools to expose.
- The server defines what’s actually allowed.
This makes AI integrations safe by design, not just by convention.
## MCP Core Components

### Overview
The Model Context Protocol (MCP) is designed to create a standardized, reliable interface between large language models (LLMs) and external tools or data sources.
It is built around three main components:
- MCP Server – provides tools, data, or context
- MCP Client – manages communication between the model and servers
- MCP Protocol – defines the message format and communication rules
Together, these components form a complete system that allows an LLM to interact safely and consistently with the outside world.
### MCP Server
The MCP Server is the component that actually does things. It is responsible for implementing tools, exposing data sources, and handling execution when the client requests an operation.
Responsibilities:
- Defines and exposes tools (functions the model can call)
- Provides structured access to data or context
- Executes tool logic and returns structured results
- Follows the MCP protocol (JSON-RPC format)
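As a concrete sketch, here is a minimal MCP server written with FastMCP (the same library used in the hands-on section below) that exposes a single `get_time` tool matching the protocol examples later in this module; the server name "time-server" is illustrative:
```python=
from datetime import datetime, timezone

from mcp.server.fastmcp import FastMCP

# Create the server; the name is arbitrary and only used for identification.
mcp = FastMCP("time-server")


@mcp.tool()
async def get_time() -> str:
    """Returns the current time in UTC as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


if __name__ == "__main__":
    # Expose the server over stdio so an MCP client can launch and talk to it.
    mcp.run(transport="stdio")
```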
### MCP Client
The MCP Client acts as an intermediary between the language model and one or more MCP servers.
It manages connections, forwards tool calls, and ensures communication follows the MCP protocol.
Responsibilities:
- Connects to one or more MCP servers
- Lists available tools and context sources
- Forwards requests and responses between model and server
- Handles routing, logging, and capability management
- Enforces access permissions and sandboxing
Typical Client Workflow
1. Connect to one or more MCP servers.
2. Request the list of available tools (tools/list).
3. Receive a request from the model to call a tool.
4. Forward the call to the correct MCP server.
5. Return the structured result to the model.
Example Sequence
1. Model requests: “What time is it?”
2. Client calls the server’s get_time tool.
3. Server executes and returns { "time": "2025-10-27T12:00:00Z" }.
4. Client delivers result to model.
5. Model replies with: “The current time is 12:00 UTC.”
The client ensures that the model never interacts directly with system resources; it only sees structured tool definitions and results.
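To make this workflow concrete, below is a minimal client-side sketch using the official MCP Python SDK (the `mcp` package). It is not the ollama-mcp-client used later in this module and involves no LLM at all; it simply launches a stdio server, discovers its tools, and calls `get_time`. The server command, paths, and tool name are placeholders based on the examples in this document:
```python=
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder: launch an MCP server over stdio (adjust command/args to your server).
server_params = StdioServerParameters(command="uv", args=["run", "server/server.py"])


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Step 2 of the workflow: discover the tools the server exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Steps 3-5: forward a tool call and receive the structured result.
            result = await session.call_tool("get_time", arguments={})
            print(result.content)


if __name__ == "__main__":
    asyncio.run(main())
```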
### MCP - Protocol Standard
The protocol itself defines how clients and servers communicate.
MCP is based on JSON-RPC 2.0, a lightweight, language-agnostic messaging format.
Responsibilities:
- Defines message structures for requests, responses, and errors
- Specifies standard method names (tools/list, tools/call, context/list, context/read)
- Manages versioning and capability negotiation
- Ensures interoperability across all MCP implementations
#### Example Message Flow
**Client → Server (request available tools):**
```json=
{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": 1
}
```
**Server → Client (response):**
```json=
{
  "jsonrpc": "2.0",
  "result": {
    "tools": [
      { "name": "get_time", "description": "Returns the current time" },
      { "name": "get_weather", "description": "Gets weather by city" }
    ]
  },
  "id": 1
}
```
**Client → Server (invoke a tool):**
```json=
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_time",
    "arguments": {}
  },
  "id": 2
}
```
**Server → Client (result):**
```json=
{
  "jsonrpc": "2.0",
  "result": { "time": "2025-10-27T12:00:00Z" },
  "id": 2
}
```
## MCP Implementation Installation & Testing
In this section, we will build a functional MCP application using FastMCP with Ollama as the LLM runtime. Please refer to the FastMCP documentation [here](https://gofastmcp.com/getting-started/welcome) to learn more about FastMCP.
### 1. Get started
Before we get into the FastMCP testing, let's make sure the environment is ready. Not all LLMs can use MCP; only models that support tool calling can. In Ollama, you can easily spot whether a model supports MCP by looking at its tags, as in the example below.

Below the model's description on its Ollama page, you can see a list of tags; those tags indicate the features the model supports. One of the tags says "tools", which is the feature needed to run MCP with that model.
**Feel free to choose the model** you want to use for this test, but we recommend a **Qwen2.5 or Qwen3** model. After pulling your chosen model, let's go to the next step.
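For example, assuming Ollama is already installed, you can pull a model and inspect it from the command line; `ollama show` prints the model's details (the exact output format depends on your Ollama version):
```bash=
ollama pull qwen2.5:14b
ollama show qwen2.5:14b
```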
### 2. FastMCP with Ollama
For the FastMCP with Ollama test, we will use [this](https://github.com/8LWXpg/ollama-mcp-client) repository as our reference. Please read it carefully and **try to understand the code**!
Clone the repository onto the Linux machine that runs Ollama with these commands:
```bash=
git clone https://github.com/8LWXpg/ollama-mcp-client
cd ollama-mcp-client
```
We strongly advise you to use Visual Studio Code as your text editor and to install the Python extension. If you have VS Code installed, you can try executing this command inside the directory:
```bash=
code .
```
It will open the directory in VS Code. If the command doesn't work and you are using WSL, install the WSL extension in VS Code and open the directory manually.
After cloning the repository, execute the following commands inside the repository directory to create a virtual environment for this test, so its dependencies do not affect the operating system's Python installation:
```bash=
uv venv
uv pip install -e .
```
Once the installation succeeds, let's get to the MCP part.
Navigate to clients/ollama_client.py and open it in a text editor. On line 135 you should see something like this:
```python=
async def process_message(self, message: str, model: str = "qwen2.5:14b") -> AsyncIterator[ChatResponse]:
```
You should **change the default model string** to the model that you are using.
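For example, if you pulled `qwen3:8b` (an illustrative choice), the line would become:
```python=
async def process_message(self, message: str, model: str = "qwen3:8b") -> AsyncIterator[ChatResponse]:
```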
Navigate to examples/ollama_example.py and open it in a text editor. On line 15 you should see something like this:
```python=
async with await OllamaMCPClient.create(config, "192.168.0.33:11434") as client:
```
If you run Ollama on the same machine, change the IP to the localhost/loopback address, so the line becomes:
```python=
async with await OllamaMCPClient.create(config, "127.0.0.1:11434") as client:
```
After changing the line, the MCP code should work and we can run it. Take note that you should try to understand the code before running it, to avoid confusion afterwards.
To run the code, execute the command below:
```bash=
uv run examples/ollama_example.py examples/server.json
```
After you execute the command, you should see something like this:

You can test it by giving the model a prompt that calls a tool from the MCP server, for example:
***PROMPT = "Get a random number"***
Check whether the model gets the random number from a tool call rather than generating it itself. If MCP works, prompting with the text above should produce output like this:

There will be a debug section that prints which tool the model called to answer the user prompt. If your output matches the above, congratulations: you have successfully run a simple FastMCP application. Let's head to the next section for a more advanced implementation of MCP.
### 3. Adding Tools to the MCP Server
Before we get into this section, we **strongly advise you to understand** the ollama_client.py and ollama_example.py code. **Both files are fundamental** because they implement the MCP client, one of the MCP core components, although we will not focus on them here.
To add a tool, navigate to server/server.py and define a function similar to the ones already there. Don't forget to write a docstring for the function; that documentation is very important, because the LLM decides which tool to use based on it.
For example, let's make a simple new tool that answers a question the model cannot answer without extra context. Look at the code below:
```python=
@mcp.tool()
async def get_my_name() -> str:
    """Gets the name of the user

    Returns:
        str: name of the user
    """
    return "Ryan"
```
You can freely change the return value. After adding the tool to the file, restart the program and prompt it with "What is my name?"
If the tool above works, the output should look like this:



### 4. Adding an MCP Server
If the application needs more than just another tool on the existing server, it may need a whole new MCP server, which we can simply add to the server list.
First, we need to create the server file. Make a new file
**server/secondServer.py:**
```python=
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("test")


@mcp.tool()
async def my_favorite_color() -> str:
    """Gets the favorite color of the user

    Returns:
        str: favorite color of the user
    """
    return "Blue"


if __name__ == "__main__":
    mcp.run(transport="stdio")
```
This server has only one tool, which returns the user's favorite color. After adding the file, navigate to examples/server.json. On line 7 you can see how the existing server is registered; we now register our new server in the same way.
The file should now look like this:
```json=
{
  "stdio": {
    "test": {
      "command": "uv",
      "args": [
        "run",
        "server/server.py"
      ]
    },
    "second-server": {
      "command": "uv",
      "args": [
        "run",
        "server/secondServer.py"
      ]
    },
    "sqlite-test": {
      "command": "uvx",
      "args": [
        "mcp-server-sqlite",
        "--db-path",
        "test.sqlite"
      ]
    }
  }
}
```
Now let's run the MCP client again; the tool discovery output should include something like this:

As you can see on the right of the picture, there is a 'second-server/my_favorite_color' entry, which indicates that our MCP client recognizes the new server and is ready to call the tool it provides. Let's test it by prompting "What is my favorite color?"
If everything is configured correctly, the output should look like this:



That is the end of this FastMCP with Ollama test. Keep experimenting and playing with MCP to build your knowledge in this field, and we **strongly advise you to really understand** the code above, because it is fundamental for the problems ahead.