# Module 2 (MCP & RAG): Basic AI Implementation with Ollama

Author: **Ryan Safa Tjendana**
## What is a Large Language Model (LLM)?
### 1. Introduction
A **Large Language Model (LLM)** is an artificial intelligence system trained on massive text datasets to understand and generate human-like language.
LLMs power modern chatbots, assistants, and reasoning systems like **ChatGPT**, **Claude**, or **Llama**.
They can:
- Generate coherent text
- Answer questions
- Translate languages
- Summarize documents
- Perform reasoning tasks
---
### 2. How LLMs Work
#### Step-by-Step Concept
1. **Tokenization**
- Text is broken into small pieces called *tokens* (words, subwords, or characters).
- Example: `"Who Are You?"` → `["Who", "A", "re", "You", "?"]`.
2. **Embedding**
- Each token is converted into a numerical vector that captures its meaning and context.
3. **Transformer Architecture**
- The **Transformer** is the core architecture behind LLMs.
- It uses **self-attention** to understand relationships between all words in a sentence simultaneously.
4. **Training Process**
- The model predicts the next token given the previous ones.
- Example: “I love” → model learns that “you” or “chocolate” could be likely next tokens.
- This is repeated **billions of times** across large datasets.
5. **Inference (Generation)**
- When we give the model a prompt, it predicts one token at a time, building the response word by word (see the minimal sketch after this list).
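The loop below is a minimal sketch of this tokenize → predict → append cycle. It assumes the Hugging Face `transformers` library and the small `gpt2` model purely for illustration; running models locally with Ollama is covered later in this module.

```python
# Minimal greedy generation loop: tokenize, predict the next token, append, repeat.
# Library and model choice ("gpt2") are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "I love"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids   # tokenization

with torch.no_grad():
    for _ in range(10):                              # generate 10 new tokens
        logits = model(input_ids).logits             # transformer forward pass
        next_id = torch.argmax(logits[0, -1])        # most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))                # prompt + continuation
```

Real systems usually *sample* from the predicted probability distribution (temperature, top-p) instead of always taking the single most likely token, which is why the same prompt can produce different answers.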
---
### 3. Why "Large"?
The “Large” in LLM refers to:
- **Parameters:** the internal weights (often billions or trillions)
- **Data:** trained on terabytes of text
- **Computation:** requires high-end GPUs and clusters
Larger models tend to:
- Understand context better
- Produce more coherent and factual outputs
- Generalize across diverse tasks without task-specific training
---
### 4. Types of LLMs
| Type | Example Models | Description |
|------|----------------|--------------|
| **Decoder-only** | GPT, LLaMA, Mistral | Best for text generation & chat |
| **Encoder-only** | BERT, RoBERTa | Best for understanding & classification |
| **Encoder-Decoder** | T5, FLAN-T5 | Best for translation & summarization |
---
### 5. Key Concepts
#### Attention Mechanism
- Allows the model to “focus” on relevant words while reading text.
- Formally, it learns *how much each token matters* relative to every other token (see the attention formula below).
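For reference, the scaled dot-product attention used in Transformers is usually written as follows, where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension:

$$
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

The softmax weights for each token are exactly the "focus" scores described above.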
#### Context Window
- LLMs read only a limited number of tokens at once (e.g., 8K, 32K, or 1M tokens).
- Beyond this, they “forget” — leading to the need for **RAG** or **MCP** (covered later).
#### Prompting
- The way we communicate with an LLM.
- Example (a runnable sketch follows below):
  > "System: You are a helpful assistant."
  > "User: Explain how transformers work in simple terms."
### Hallucination
#### What is Hallucination?
**Hallucination** happens when a Large Language Model produces information that sounds correct — but is **factually wrong, fabricated, or unsupported by its data**.
> Simply put: an LLM "makes things up" confidently.
These errors aren’t random bugs — they emerge from how LLMs are trained:
- LLMs **predict the next token** based on probability, *not truth*.
- They **don’t have direct access** to real-time facts or databases.
- If uncertain, they still try to complete the sentence **plausibly**.
---
#### Example
**Prompt:**
> "Who discovered the COVID-19 virus in 2017?"

**Model Response:**
> "COVID-19 was discovered by Dr. James Renfield in 2017, as described in his book *Renfield on COVID-19*."
It sounds confident and convincing, but it is entirely false: COVID-19 appeared in 2019.
The model **hallucinates** because:
1. The question is based on a false premise.
2. The model is *trained to answer*, not to verify.
3. The model tries to answer even though no supporting context was provided.
---
### Why Hallucination Happens
| Cause | Explanation |
|--------|-------------|
| **Next-token prediction bias** | The model is trained to sound coherent, not correct. |
| **Training data noise** | Incomplete or inconsistent data can lead to false associations. |
| **Overconfidence** | Models often assign high probabilities to plausible but false completions. |
| **Lack of grounding** | The model has no real-time link to factual databases. |
| **Prompt ambiguity** | Vague or misleading prompts push the model to “fill in” missing info. |
| **Lack of context** | Without supporting context, the model falls back on its own internal guess, which may be wrong. |
| **Context window overflow** | Even when the right context is supplied, anything that falls outside the context window is dropped, so the model answers from memory and may hallucinate. |
---
### Types of Hallucinations
| Type | Description | Example |
|------|--------------|----------|
| **Factual** | Incorrect or fabricated facts | “Einstein was born in 1955.” |
| **Logical** | Inconsistent reasoning | “All birds can swim; therefore, penguins can fly.” |
| **Contextual** | Ignoring earlier parts of the prompt | Forgetting user constraints or context. |
| **Grounding** | Making up citations or references | “According to Nature (2020)...” (nonexistent paper) |
---
### How to Detect Hallucinations
1. **Cross-Verification:** Compare LLM output with trusted sources.
2. **Consistency Check:** Ask the same question in multiple ways; if the answers change, the model is unstable (see the sketch after this list).
3. **Ask for Evidence:** Prompt the model with “cite your source” or “show reasoning.”
4. **Automated Fact-Checking Tools:** Use retrieval-based validation (RAG).
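For example, the consistency check in point 2 can be automated in a rough way: ask the same (trick) question phrased differently and compare the answers. This sketch again assumes the `ollama` Python client and a local `llama3` model.

```python
# Rough consistency check: ask the same trick question two ways and compare.
# Diverging answers are a hint of hallucination, not proof.
import ollama

def ask(question: str) -> str:
    reply = ollama.chat(model="llama3",
                        messages=[{"role": "user", "content": question}])
    return reply["message"]["content"]

answer_a = ask("Who discovered the COVID-19 virus in 2017?")
answer_b = ask("In 2017, which scientist first identified COVID-19?")

print("Answer A:", answer_a)
print("Answer B:", answer_b)
# In practice, compare the key facts (names, dates), not the raw strings.
```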
---
### How to Reduce Hallucinations
| Approach | Description |
|-----------|--------------|
| **RAG (Retrieval-Augmented Generation)** | Connects LLMs to verified external data — grounding answers. |
| **MCP (Model Context Protocol)** | Lets LLMs use tools (search, APIs, databases) to verify before answering. |
| **Prompt Design** | Encourage a cautious tone, e.g. "If unsure, say you don't know" (see the sketch below this table). |
| **Fine-tuning** | Train models on domain-specific and verified data. |
| **Post-processing filters** | Apply fact-checking or confidence scoring after generation. |
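As a small illustration of the "Prompt Design" row, a cautious system prompt can be placed in front of the user question. The same assumptions apply: the `ollama` Python client and an illustrative `llama3` model.

```python
# Sketch of a cautious system prompt that asks the model to admit uncertainty
# instead of guessing.
import ollama

CAUTIOUS_SYSTEM_PROMPT = (
    "You are a careful assistant. Only state facts you are confident about. "
    "If you are unsure, say 'I don't know' instead of guessing."
)

response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": CAUTIOUS_SYSTEM_PROMPT},
        {"role": "user", "content": "Who discovered the COVID-19 virus in 2017?"},
    ],
)
print(response["message"]["content"])  # ideally a correction or "I don't know"
```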
---
### Key Takeaway
> LLMs are **masters of language**, not **guardians of truth**.
Hallucination doesn’t mean the model is broken — it means it’s doing exactly what it was trained to do: generate likely text.
Our job as developers and researchers is to **ground it**, **verify it**, and **guide it** toward accurate reasoning — through methods like **RAG** and **MCP**.
---
**This connects directly to Weeks 2 and 3:**
- **Week 2 (RAG):** Grounding LLM outputs in real documents.
- **Week 3 (MCP):** Allowing models to check facts or take actions via tools.
---
### 6. Example Flow

Input Prompt → Tokenization → Embedding → Transformer Layers (Attention + Feedforward) → Next Token Prediction → Text Generation
---
### 7. Summary
| Concept | Key Idea |
|----------|-----------|
| **LLM** | A model that understands and generates text |
| **Core Architecture** | Transformer (Self-Attention + Feedforward layers) |
| **Training Objective** | Predict the next token |
| **Applications** | Chatbots, summarization, translation, code generation |
---
### 8. Next Step
In the next section, you’ll learn how to **run your own LLM locally using Ollama**, explore model management, and try simple **prompt engineering**.
---