# Module 2 (MCP & RAG): Basic AI Implementation with Ollama (1)

![Group 1171275599](https://hackmd.io/_uploads/Sk1R5FG6gx.png)

Author: **Ryan Safa Tjendana**

## What is a Large Language Model (LLM)

### 1. Introduction

A **Large Language Model (LLM)** is an artificial intelligence system trained on massive text datasets to understand and generate human-like language. LLMs power modern chatbots, assistants, and reasoning systems such as **ChatGPT**, **Claude**, and **Llama**.

They can:
- Generate coherent text
- Answer questions
- Translate languages
- Summarize documents
- Perform reasoning tasks

---

### 2. How LLMs Work

#### Step-by-Step Concept

1. **Tokenization**
   - Text is broken into small pieces called *tokens* (words, subwords, or characters).
   - Example: `"Who Are You?"` → `["Who", "A", "re", "You", "?"]` (note how `"Are"` is split into subwords).
2. **Embedding**
   - Each token is converted into a numerical vector that captures its meaning and context.
3. **Transformer Architecture**
   - The **Transformer** is the core architecture behind LLMs.
   - It uses **self-attention** to understand relationships between all words in a sentence simultaneously.
4. **Training Process**
   - The model predicts the next token given the previous ones.
   - Example: "I love" → the model learns that "you" or "chocolate" are likely next tokens.
   - This is repeated **billions of times** across large datasets.
5. **Inference (Generation)**
   - When we give it a prompt, it predicts one token at a time, building sentences word by word.

---

### 3. Why "Large"?

The "Large" in LLM refers to:
- **Parameters:** the internal weights (often billions or trillions)
- **Data:** trained on terabytes of text
- **Computation:** requires high-end GPUs and clusters

Larger models tend to:
- Understand context better
- Produce more coherent and factual outputs
- Generalize across diverse tasks without task-specific training

---

### 4. Types of LLMs

| Type | Example Models | Description |
|------|----------------|-------------|
| **Decoder-only** | GPT, LLaMA, Mistral | Best for text generation & chat |
| **Encoder-only** | BERT, RoBERTa | Best for understanding & classification |
| **Encoder-Decoder** | T5, FLAN-T5 | Best for translation & summarization |

---

### 5. Key Concepts

#### Attention Mechanism
- Allows the model to “focus” on relevant words while reading text.
- In effect, it learns *how much each token matters* relative to every other token.

#### Context Window
- LLMs read only a limited number of tokens at once (e.g., 8K, 32K, or 1M tokens).
- Beyond this, they “forget” — leading to the need for **RAG** or **MCP** (covered later).

#### Prompting
- The way we communicate with an LLM.
- Example:
  - `System: You are a helpful assistant.`
  - `User: Explain how transformers work in simple terms.`

### Hallucination

#### What is Hallucination?

**Hallucination** happens when a Large Language Model produces information that sounds correct — but is **factually wrong, fabricated, or unsupported by its data**.

> Simply put: an LLM "makes things up" confidently.

These errors aren’t random bugs — they emerge from how LLMs are trained:
- LLMs **predict the next token** based on probability, *not truth*.
- They **don’t have direct access** to real-time facts or databases.
- If uncertain, they still try to complete the sentence **plausibly**.

---

#### Example

**Prompt:**
> "Who discovered the COVID-19 virus in 2017?"

**Model Response:**
> "COVID-19 was discovered by Dr. James Renfield in 2017, based on his book *Renfield on COVID-19*."
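If you want to try this loaded question against a local model yourself, here is a minimal sketch. It assumes the `ollama` Python package, a running Ollama server, and a locally pulled model such as `llama3` (the model name is only an example; installing and running Ollama is covered in the next section):

```python
# Minimal sketch: send a false-premise question to a local model.
# Assumes: `ollama serve` is running, a model has been pulled
# (e.g. `ollama pull llama3`), and `pip install ollama` has been done.
import ollama

QUESTION = "Who discovered the COVID-19 virus in 2017?"

# Plain prompt: without grounding, the model may confidently "answer" the false premise.
response = ollama.chat(
    model="llama3",  # any locally pulled model name works here
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": QUESTION},
    ],
)
print(response["message"]["content"])

# Lightly guarded prompt: often reduces (but does not eliminate) hallucination.
guarded = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system",
         "content": "If a question contains a false premise or you are unsure, say so instead of guessing."},
        {"role": "user", "content": QUESTION},
    ],
)
print(guarded["message"]["content"])
```

Depending on the model and its training data, the first call may fabricate an answer or may already push back on the premise; the point is that nothing in plain next-token generation forces the model to verify facts, and a guarded system prompt only nudges it in that direction.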
The fabricated answer in the example sounds confident and convincing, but it is entirely false: COVID-19 emerged in 2019, and both the discoverer and the book are invented. The model **hallucinates** because:

1. The question is based on a false premise.
2. The model is *trained to answer*, not to verify.
3. The model tries to answer even though no supporting context was provided.

---

### Why Hallucination Happens

| Cause | Explanation |
|--------|-------------|
| **Next-token prediction bias** | The model is trained to sound coherent, not correct. |
| **Training data noise** | Incomplete or inconsistent data can lead to false associations. |
| **Overconfidence** | Models often assign high probabilities to plausible but false completions. |
| **Lack of grounding** | The model has no real-time link to factual databases. |
| **Prompt ambiguity** | Vague or misleading prompts push the model to “fill in” missing info. |
| **Lack of context** | With no supporting context in the prompt, the model falls back on its own (possibly wrong) internal associations. |
| **Context window overflow** | Even when the right context is supplied, anything beyond the context window is truncated, so the model answers without it and may hallucinate. |

---

### Types of Hallucinations

| Type | Description | Example |
|------|--------------|----------|
| **Factual** | Incorrect or fabricated facts | “Einstein was born in 1955.” |
| **Logical** | Inconsistent reasoning | “All birds can swim; therefore, penguins can fly.” |
| **Contextual** | Ignoring earlier parts of the prompt | Forgetting user constraints or context. |
| **Grounding** | Making up citations or references | “According to Nature (2020)...” (nonexistent paper) |

---

### How to Detect Hallucinations

1. **Cross-Verification:** Compare LLM output with trusted sources.
2. **Consistency Check:** Ask the same question in multiple ways — if the answers change, the output is unstable.
3. **Ask for Evidence:** Prompt the model with “cite your source” or “show your reasoning.”
4. **Automated Fact-Checking Tools:** Use retrieval-based validation (RAG).

---

### How to Reduce Hallucinations

| Approach | Description |
|-----------|--------------|
| **RAG (Retrieval-Augmented Generation)** | Connects LLMs to verified external data — grounding answers. |
| **MCP (Model Context Protocol)** | Lets LLMs use tools (search, APIs, databases) to verify before answering. |
| **Prompt Design** | Encourage a cautious tone: “If unsure, say you don’t know.” |
| **Fine-tuning** | Train models on domain-specific and verified data. |
| **Post-processing filters** | Apply fact-checking or confidence scoring after generation. |

---

### Key Takeaway

> LLMs are **masters of language**, not **guardians of truth**.

Hallucination doesn’t mean the model is broken — it means it’s doing exactly what it was trained to do: generate likely text. Our job as developers and researchers is to **ground it**, **verify it**, and **guide it** toward accurate reasoning — through methods like **RAG** and **MCP**.

---

**This connects directly to Week 2 and 3:**
- **Week 2 (RAG):** Grounding LLM outputs in real documents.
- **Week 3 (MCP):** Allowing models to check facts or take actions via tools.

---

### 6. Example Flow

![image](https://hackmd.io/_uploads/HJ6cIFMTxl.png)

Input Prompt → Tokenization → Embedding → Transformer Layers (Attention + Feedforward) → Next Token Prediction → Text Generation

---
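To make this flow concrete, here is a deliberately toy Python sketch of the same loop. Every name and probability in it is invented for illustration: a tiny lookup table stands in for the tokenizer, embeddings, and Transformer layers of a real model, but the shape of the loop (predict one token, append it, repeat) is the same.

```python
# Toy illustration of the generation loop in the diagram above.
# A real LLM uses subword tokenization, learned embeddings, and attention
# over the whole context; here a hand-written bigram table stands in for all of that.

# The "model": for each token, a list of (next_token, probability) pairs.
BIGRAMS = {
    "<start>":   [("I", 1.0)],
    "I":         [("love", 0.7), ("like", 0.3)],
    "love":      [("you", 0.5), ("chocolate", 0.4), ("<end>", 0.1)],
    "like":      [("chocolate", 0.6), ("you", 0.4)],
    "you":       [("<end>", 1.0)],
    "chocolate": [("<end>", 1.0)],
}

def predict_next(token: str) -> str:
    """Next-token prediction: pick the most probable continuation."""
    candidates = BIGRAMS.get(token, [("<end>", 1.0)])
    return max(candidates, key=lambda pair: pair[1])[0]

def generate(max_tokens: int = 10) -> str:
    tokens = ["<start>"]                        # "tokenization" of an empty prompt
    while len(tokens) < max_tokens:
        next_token = predict_next(tokens[-1])   # inference: one token at a time
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])                 # text generation

print(generate())  # -> "I love you"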
### 7. Summary

| Concept | Key Idea |
|----------|-----------|
| **LLM** | A model that understands and generates text |
| **Core Architecture** | Transformer (self-attention + feedforward layers) |
| **Training Objective** | Predict the next token |
| **Applications** | Chatbots, summarization, translation, code generation |

---

### 8. Next Step

In the next section, you’ll learn how to **run your own LLM locally using Ollama**, explore model management, and try simple **prompt engineering**.

---
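As a small preview of that section, here is a minimal sketch that queries a locally running Ollama server over its HTTP API. It assumes the `requests` package, a server on the default `localhost:11434`, and a pulled model named `llama3` (the model name is just an example):

```python
# Preview sketch: query a local Ollama server over its HTTP API.
# Assumes: `ollama serve` is running on the default port (11434),
# a model has been pulled (e.g. `ollama pull llama3`), and `requests` is installed.
import requests

def ask(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("Explain how transformers work in one short paragraph."))
```

The `ollama` Python package shown earlier is a thin wrapper around this same HTTP API, so either style works once the server is running.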