# RAG vs No-RAG Comparison Report v2 (Q3 Router & Updated Runs)
**Date**: 2026-02-06
**Dataset**: Leishmaniasis Diagnosis (v143) - 55 test queries
**Query Types**: Q1_diagnosis, Q3_image_diagnosis, Q1_Q3_multimodal_diagnosis
**Hardware**: RTX 3090 (local models), RTX 4090 (host for Gemini cloud API runs)
**Features**: Q3 Router enabled, Soft Gate, Type-Aware disabled
---
## Executive Summary
| Model | RAG Accuracy | No-RAG Accuracy | RAG Advantage | Hardware |
|-------|-------------|-----------------|---------------|----------|
| **Gemini 2.5 Pro** | 90.00% | 89.55% | **+0.45%** | RTX 4090 |
| **MedGemma 4B** | 83.18% | 82.27%* | **+0.91%** | RTX 3090 |
| **Gemma3 4B** | 83.18% | 80.00% | **+3.18%** | RTX 3090 |
> [!NOTE]
> **Updated findings with Q3 Router**: RAG now shows a positive effect on diagnosis accuracy for all three models, though the gain is marginal for Gemini 2.5 Pro.
>
> *MedGemma No-RAG: 82.27% is carried over from the previous report (run medgemma4b_norag_20260125).
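
The RAG Advantage column is the simple difference between the two accuracy columns, expressed in percentage points. A short sketch that recomputes it from the table values is shown here; the dictionary layout and the `rag_advantage` helper are illustrative and not part of the evaluation code.

```python
# Diagnosis accuracy (%) copied from the Executive Summary table: (RAG, No-RAG).
RESULTS = {
    "Gemini 2.5 Pro": (90.00, 89.55),
    "MedGemma 4B": (83.18, 82.27),
    "Gemma3 4B": (83.18, 80.00),
}


def rag_advantage(rag_acc: float, norag_acc: float) -> float:
    """Absolute accuracy difference (RAG minus No-RAG) in percentage points."""
    return round(rag_acc - norag_acc, 2)


for model, (rag, norag) in RESULTS.items():
    print(f"{model}: {rag_advantage(rag, norag):+.2f} pp")
```
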
---
## Model Architecture Reference
| Dimension | **Gemini 2.5 Pro** | **MedGemma 4B** | **Gemma3 4B** |
|-----------|-------------------|-----------------|----------------|
| **Architecture** | Sparse MoE Transformer | Dense Decoder-only | Dense Decoder-only |
| **Total Parameters** | Undisclosed (~150B, unofficial est.) | 4B + 400M vision | 4B |
| **Active Parameters** | Undisclosed (~32B, unofficial est. assuming top-2 experts) | 4B | 4B |
| **Context Window** | 1M tokens | 128K tokens | 128K tokens |
| **Vision Encoder** | Native multimodal | MedSigLIP (400M) | SigLIP |
| **Pretraining Focus** | Broad, web-scale | Medical domain | General multimodal |
---
## Detailed Results
### Gemini 2.5 Pro (RTX 4090)
| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 90.00% | 89.55% | +0.45% |
| **Diagnosis Type Accuracy** | 86.18% | 87.27% | -1.09% |
| Multimodal Faithfulness | 19.05% | N/A | - |
| Multimodal Relevance | 97.62% | N/A | - |
| Context Relevance | 75.60% | N/A | - |
**Run Folders**:
- RAG: [gemini25prorag_soft_rerank_disabled_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25prorag_soft_rerank_disabled_test)
- No-RAG: [gemini25pro_norag_20260201](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25pro_norag_20260201)
**Configuration**: Type-Aware=disabled, Soft rerank, Q3 Router
---
### MedGemma 4B (RTX 3090)
| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 83.18% | 82.27%* | +0.91% |
| **Diagnosis Type Accuracy** | 77.00% | N/A | - |
| Multimodal Faithfulness | 0.00% | N/A | - |
| Multimodal Relevance | 59.52% | N/A | - |
| Context Relevance | 73.21% | N/A | - |
**Run Folders**:
- RAG: [q3_router_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/q3_router_test)
- No-RAG: [medgemma4b_norag_20260125](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/medgemma4b_norag_20260125)* (from previous report)
**Configuration**: Q3 Router enabled, Soft Gate active
> [!IMPORTANT]
> **MedGemma now shows a positive RAG effect (+0.91%)** with the Q3 Router.
> This contrasts with the negative RAG effect (-5.91%) measured previously without the router.
---
### Gemma3 4B (RTX 3090)
| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 83.18% | 80.00% | +3.18% |
| **Diagnosis Type Accuracy** | 75.45% | 72.18% | +3.27% |
| Multimodal Faithfulness | 16.67% | 0.00% | - |
| Multimodal Relevance | 69.05% | 0.00% | - |
| Context Relevance | 70.83% | 0.00% | - |
**Run Folders**:
- RAG: [gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemma3_4b_multimodal_fixed_v2)
- No-RAG: [no_rag_gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/no_rag_gemma3_4b_multimodal_fixed_v2)
**Configuration**: FP16, Vision enabled, Soft Gate, Q3 Router
---
## Key Findings
### 1. Q3 Router Eliminates Negative RAG Effect
| Model | Without Router | With Router | Improvement |
|-------|---------------|-------------|-------------|
| **MedGemma 4B** | -5.91% | +0.91% | +6.82% |
| **Gemma3 4B** | N/A | +3.18% | - |
| **Gemini 2.5 Pro** | N/A | +0.45% | - |
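The Q3 Router sends pure image queries (Q3_image_diagnosis) to No-RAG mode, which prevents retrieval noise in visual-only cases. Only MedGemma 4B has a without-router measurement, so the improvement is demonstrated for that model alone.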
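
A minimal sketch of the routing rule described above, assuming each query carries a type label matching the query types listed in this report. The function `route_query` and the two callables `answer_with_rag` and `answer_without_rag` are hypothetical stand-ins, not the pipeline's actual API.

```python
from typing import Callable, Tuple

# Query type label for pure image queries, as named in this report.
Q3_IMAGE_ONLY = "Q3_image_diagnosis"


def route_query(
    query_type: str,
    answer_with_rag: Callable[[], str],
    answer_without_rag: Callable[[], str],
    router_enabled: bool = True,
) -> Tuple[str, str]:
    """Return (mode, answer); pure image queries bypass retrieval when routed."""
    if router_enabled and query_type == Q3_IMAGE_ONLY:
        return "no_rag", answer_without_rag()
    # Q1 and Q1+Q3 multimodal queries keep the retrieval path.
    return "rag", answer_with_rag()
```

With the router enabled, `route_query("Q3_image_diagnosis", ...)` takes the No-RAG branch, while `Q1_diagnosis` and `Q1_Q3_multimodal_diagnosis` still go through retrieval. Passing `router_enabled=False` reproduces the pre-router behaviour in which every query type uses RAG.
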
### 2. Model Size vs RAG Benefit
| Model | Parameters | Active Params | RAG Benefit |
|-------|-----------|---------------|-------------|
| Gemini 2.5 Pro | ~150B MoE | ~32B | +0.45% |
| Gemma3 4B | 4B | 4B | +3.18% |
| MedGemma 4B | 4B | 4B | +0.91% |
> **Observation**: In these runs, the 4B models gain more from RAG (+0.91% to +3.18%) than the much larger Gemini 2.5 Pro (+0.45%).
> Gemini 2.5 Pro's strong parametric knowledge (89.55% No-RAG accuracy) leaves little headroom for RAG gains.
### 3. Retrieval Quality Metrics
| Run | nDCG@5 | MRR |
|-----|--------|-----|
| Gemma3 4B RAG | 0.2262 | 0.4292 |
| MedGemma 4B RAG | 0.2262 | 0.4292 |
Both local models show identical retrieval scores (nDCG@5 = 0.2262, MRR = 0.4292), which suggests that performance differences between them stem from how each model uses the retrieved context rather than from retrieval variance.
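
For reference, the two retrieval metrics can be computed per query as sketched below and then averaged over the test set. The sketch uses the standard definitions with linear gain. The report does not state which gain form the evaluation script uses or how relevance labels were assigned, so those choices are assumptions here.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k ranked items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering.

    Simplification: the ideal ordering is built from the labels passed in,
    so relevant documents that were never retrieved are not counted.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean(values: Sequence[float]) -> float:
    """Average per-query scores into a run-level metric (MRR, mean nDCG@5)."""
    return sum(values) / len(values) if values else 0.0
```

Each input list holds the relevance labels of the retrieved documents in ranked order, for example `[0, 1, 0, 0, 0]` when only the second of five hits is relevant.
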
---
## Run Catalog
| Run ID | Model | Type | Date | Hardware | Notes |
|--------|-------|------|------|----------|-------|
| [gemini25prorag_soft_rerank_disabled_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25prorag_soft_rerank_disabled_test) | Gemini 2.5 Pro | RAG | 2026-02-01 | RTX 4090 | Type-Aware disabled, Soft rerank |
| [gemini25pro_norag_20260201](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25pro_norag_20260201) | Gemini 2.5 Pro | No-RAG | 2026-02-01 | RTX 4090 | - |
| [q3_router_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/q3_router_test) | MedGemma 4B | RAG | 2026-02-06 | RTX 3090 | Q3 Router |
| [medgemma4b_norag_20260125](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/medgemma4b_norag_20260125) | MedGemma 4B | No-RAG | 2026-01-25 | RTX 3090 | From v1 report |
| [gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemma3_4b_multimodal_fixed_v2) | Gemma3 4B | RAG | 2026-02-06 | RTX 3090 | Fixed multimodal |
| [no_rag_gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/no_rag_gemma3_4b_multimodal_fixed_v2) | Gemma3 4B | No-RAG | 2026-02-06 | RTX 3090 | - |
---
## Configuration Details
### RAG Pipeline Settings
- **Retrieval**: Hybrid (BM25 + Dense)
- **Top-K**: 5
- **Rerank**: Soft rerank (not hard rerank)
- **Q3 Router**: Enabled (routes Q3 → No-RAG)
- **Type-Aware**: Disabled (for Gemini runs)
- **Gate Threshold**: 0.0250
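
The settings above can be illustrated with a small sketch of hybrid retrieval followed by a score gate. The report does not specify how the BM25 and dense rankings are fused or what the gate compares against the threshold. The sketch assumes reciprocal rank fusion (RRF) with a smoothing constant of 60 and a gate on the best fused score. `RagConfig`, `rrf_fuse`, and `retrieve` are hypothetical names, not the pipeline's own.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class RagConfig:
    # Values taken from the settings listed in this report.
    top_k: int = 5
    gate_threshold: float = 0.0250
    q3_router: bool = True
    type_aware: bool = False
    # Assumption: RRF smoothing constant; not stated in the report.
    rrf_k: int = 60


def rrf_fuse(rankings: Sequence[Sequence[str]], rrf_k: int) -> List[Tuple[str, float]]:
    """Fuse ranked doc-id lists with reciprocal rank fusion, best score first."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def retrieve(
    bm25_ranking: Sequence[str],
    dense_ranking: Sequence[str],
    cfg: RagConfig,
) -> Tuple[List[Tuple[str, float]], bool]:
    """Return (top-k fused hits, gate_open)."""
    fused = rrf_fuse([bm25_ranking, dense_ranking], cfg.rrf_k)[: cfg.top_k]
    # Assumption: the gate opens when the best fused score reaches the threshold.
    gate_open = bool(fused) and fused[0][1] >= cfg.gate_threshold
    return fused, gate_open
```

Under these assumptions, a document ranked first by both retrievers scores 2/61 ≈ 0.0328 and opens the gate, while a document found by only one retriever scores at most 1/61 ≈ 0.0164 and does not. What the pipeline does with the context when the gate is closed (for example, down-weighting it rather than dropping it) is not described in this report.
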
### Model Loading
- **Gemma3 4B**: FP16, Vision=True, 23.7 GB VRAM
- **MedGemma 4B**: FP16, Vision=False, google/medgemma-4b-it
- **Gemini 2.5 Pro**: Cloud API, google.genai SDK
---
## Appendix: Comparison with Report v1
| Model | v1 RAG Effect | v2 RAG Effect | Change |
|-------|--------------|---------------|--------|
| Gemini 2.5 Pro | +30.45%* | +0.45% | Different baseline |
| Gemma3 12B | +3.18% | N/A | - |
| MedGemma 4B | -5.91% | +0.91% | **+6.82%** |
| Gemma3 4B | N/A | +3.18% | New model |
> [!WARNING]
> **v1 vs v2 differences**:
> - v1 used Gemma3 12B; v2 uses Gemma3 4B
> - *The v1 Gemini comparison used a different No-RAG baseline (56.82% in v1 vs 89.55% in v2), so the two RAG effects are not directly comparable
> - v2 includes the Q3 Router, which routes image-only queries to No-RAG
---
## Source Files
- Gemma3 4B RAG: `rtx3090/gemma3/gemma3_4B_multimodal_performance.txt`
- Gemma3 4B No-RAG: `rtx3090/gemma3/gemma3_4B_no-rag_multimodal.txt`
- MedGemma 4B RAG: `rtx3090/medgemma4b_q3_routers.txt`
- Gemini RAG: `rtx4090/rag_disable_type_aware_soft_rerank.txt`
- Gemini No-RAG: `rtx4090/no-rag_gemini25pro_q3_routers.txt`
- Model Details: `model_details/perplexitypro.md`