# RAG vs No-RAG Comparison Report v2 (Q3 Router & Updated Runs)

- **Date**: 2026-02-06
- **Dataset**: Leishmaniasis Diagnosis (v143) - 55 test queries
- **Query Types**: Q1_diagnosis, Q3_image_diagnosis, Q1_Q3_multimodal_diagnosis
- **Hardware**: RTX 3090 (local models), RTX 4090 (Gemini runs)
- **Features**: Q3 Router enabled, Soft Gate, Type-Aware disabled

---

## Executive Summary

| Model | RAG Accuracy | No-RAG Accuracy | RAG Advantage | Hardware |
|-------|-------------|-----------------|---------------|----------|
| **Gemini 2.5 Pro** | 90.00% | 89.55% | **+0.45%** | RTX 4090 |
| **MedGemma 4B** | 83.18% | 82.27%\* | **+0.91%** | RTX 3090 |
| **Gemma3 4B** | 83.18% | 80.00% | **+3.18%** | RTX 3090 |

> [!NOTE]
> **Updated findings with Q3 Router**: RAG now shows a positive effect across all models, though gains are marginal for Gemini 2.5 Pro.
>
> \*MedGemma No-RAG: 82.27% from previous report (medgemma4b_norag_20260125)

---

## Model Architecture Reference

| Dimension | **Gemini 2.5 Pro** | **MedGemma 4B** | **Gemma3 4B** |
|-----------|-------------------|-----------------|----------------|
| **Architecture** | Sparse MoE Transformer | Dense Decoder-only | Dense Decoder-only |
| **Total Parameters** | ~150B (est.) | 4B + 400M vision | 4B |
| **Active Parameters** | ~32B (top-2 experts) | 4B | 4B |
| **Context Window** | 1M tokens | 128K tokens | 128K tokens |
| **Vision Encoder** | Native multimodal | MedSigLIP (400M) | SigLIP |
| **Pretraining Focus** | Broad, web-scale | Medical domain | General multimodal |

---

## Detailed Results

### Gemini 2.5 Pro (RTX 4090)

| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 90.00% | 89.55% | +0.45% |
| **Diagnosis Type Accuracy** | 86.18% | 87.27% | -1.09% |
| Multimodal Faithfulness | 19.05% | N/A | - |
| Multimodal Relevance | 97.62% | N/A | - |
| Context Relevance | 75.60% | N/A | - |

**Run Folders**:
- RAG: [gemini25prorag_soft_rerank_disabled_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25prorag_soft_rerank_disabled_test)
- No-RAG: [gemini25pro_norag_20260201](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25pro_norag_20260201)

**Configuration**: Type-Aware=disabled, Soft rerank, Q3 Router

---

### MedGemma 4B (RTX 3090)

| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 83.18% | 82.27%\* | +0.91% |
| **Diagnosis Type Accuracy** | 77.00% | N/A | - |
| Multimodal Faithfulness | 0.00% | N/A | - |
| Multimodal Relevance | 59.52% | N/A | - |
| Context Relevance | 73.21% | N/A | - |

**Run Folders**:
- RAG: [q3_router_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/q3_router_test)
- No-RAG: [medgemma4b_norag_20260125](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/medgemma4b_norag_20260125)\* (from previous report)

**Configuration**: Q3 Router enabled, Soft Gate active

> [!IMPORTANT]
> **MedGemma now shows a positive RAG effect (+0.91%)** with the Q3 Router.
> This contrasts with the previous negative RAG effect (-5.91%) without the router.

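
The Key Findings describe the Q3 Router as sending pure image queries (Q3_image_diagnosis) to No-RAG mode. The report does not show the router's implementation, so the following is only a minimal sketch of that routing rule, assuming each query carries one of the three query-type labels listed in the report header. The names `Query`, `use_retrieval`, and `answer`, and the `retrieve` and `generate` callables, are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Callable, List

# Query-type labels as listed in the report header.
Q1 = "Q1_diagnosis"
Q3 = "Q3_image_diagnosis"
Q1_Q3 = "Q1_Q3_multimodal_diagnosis"


@dataclass
class Query:
    """Hypothetical query container; field names are assumptions."""
    query_type: str
    text: str = ""
    image_path: str = ""


def use_retrieval(query: Query) -> bool:
    """Return False for image-only (Q3) queries, True for all other types."""
    return query.query_type != Q3


def answer(
    query: Query,
    retrieve: Callable[[Query], List[str]],
    generate: Callable[[Query, List[str]], object],
):
    """Skip retrieval for Q3 queries so the model sees no retrieved context."""
    context = retrieve(query) if use_retrieval(query) else []
    return generate(query, context)


if __name__ == "__main__":
    # Stub retriever and generator, for illustration only.
    stub_retrieve = lambda q: ["retrieved passage"]
    stub_generate = lambda q, ctx: f"{q.query_type}: {len(ctx)} context passage(s)"
    for qt in (Q1, Q3, Q1_Q3):
        print(answer(Query(query_type=qt), stub_retrieve, stub_generate))
```

Keeping the routing decision in one small predicate makes it easy to run the same pipeline with and without the router, which is the kind of comparison this report makes for MedGemma 4B.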
---

### Gemma3 4B (RTX 3090)

| Metric | RAG | No-RAG | Δ |
|--------|-----|--------|---|
| **Diagnosis Accuracy** | 83.18% | 80.00% | +3.18% |
| **Diagnosis Type Accuracy** | 75.45% | 72.18% | +3.27% |
| Multimodal Faithfulness | 16.67% | 0.00% | - |
| Multimodal Relevance | 69.05% | 0.00% | - |
| Context Relevance | 70.83% | 0.00% | - |

**Run Folders**:
- RAG: [gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemma3_4b_multimodal_fixed_v2)
- No-RAG: [no_rag_gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/no_rag_gemma3_4b_multimodal_fixed_v2)

**Configuration**: FP16, Vision enabled, Soft Gate, Q3 Router

---

## Key Findings

### 1. Q3 Router Eliminates Negative RAG Effect

| Model | Without Router | With Router | Improvement |
|-------|---------------|-------------|-------------|
| **MedGemma 4B** | -5.91% | +0.91% | +6.82% |
| **Gemma3 4B** | N/A | +3.18% | - |
| **Gemini 2.5 Pro** | N/A | +0.45% | - |

The Q3 Router routes pure image queries (Q3_image_diagnosis) to No-RAG mode, preventing retrieval noise in visual-only cases. Only MedGemma 4B has a without-router measurement, so the before/after comparison rests on that model alone.

### 2. Model Size vs RAG Benefit

| Model | Parameters | Active Params | RAG Benefit |
|-------|-----------|---------------|-------------|
| Gemini 2.5 Pro | ~150B MoE | ~32B | +0.45% |
| Gemma3 4B | 4B | 4B | +3.18% |
| MedGemma 4B | 4B | 4B | +0.91% |

> **Observation**: The smaller 4B models benefit more from RAG (+0.91% to +3.18%) than the large model (+0.45%).
> Gemini 2.5 Pro's strong parametric knowledge (89.55% No-RAG) limits RAG gains.

### 3. Retrieval Quality Metrics

| Run | nDCG@5 | MRR |
|-----|--------|-----|
| Gemma3 4B RAG | 0.2262 | 0.4292 |
| MedGemma 4B RAG | 0.2262 | 0.4292 |

Both local models have identical retrieval metrics, which indicates that their performance differences stem from how each model uses the retrieved context, not from retrieval variance.

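
The report gives nDCG@5 and MRR values but does not state how they were computed. As a reference point, here is a minimal sketch of the standard definitions, assuming linear gain, a log2 rank discount, and an ideal ranking derived from the same list of relevance labels that is passed in. The function names and the input format (one list of relevance labels per query, in ranked order) are assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted best-first."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_reciprocal_rank(per_query_relevances: Sequence[Sequence[float]]) -> float:
    """Average of 1/rank of the first relevant item; 0 for queries with none."""
    if not per_query_relevances:
        return 0.0
    total = 0.0
    for relevances in per_query_relevances:
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / rank
                break
    return total / len(per_query_relevances)


if __name__ == "__main__":
    # Toy relevance labels for two queries (not data from this report).
    toy = [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0]]
    print("mean nDCG@5:", sum(ndcg_at_k(r, 5) for r in toy) / len(toy))
    print("MRR:", mean_reciprocal_rank(toy))
```

Because both metrics depend only on the ranked relevance labels, two runs that share a retriever and query set will report the same values whichever generator model is used downstream.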
---

## Run Catalog

| Run ID | Model | Type | Date | Hardware | Notes |
|--------|-------|------|------|----------|-------|
| [gemini25prorag_soft_rerank_disabled_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25prorag_soft_rerank_disabled_test) | Gemini 2.5 Pro | RAG | 2026-02-01 | RTX 4090 | Type-Aware disabled, Soft rerank |
| [gemini25pro_norag_20260201](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemini25pro_norag_20260201) | Gemini 2.5 Pro | No-RAG | 2026-02-01 | RTX 4090 | - |
| [q3_router_test](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/q3_router_test) | MedGemma 4B | RAG | 2026-02-06 | RTX 3090 | Q3 Router |
| [medgemma4b_norag_20260125](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/medgemma4b_norag_20260125) | MedGemma 4B | No-RAG | 2026-01-25 | RTX 3090 | From v1 report |
| [gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/gemma3_4b_multimodal_fixed_v2) | Gemma3 4B | RAG | 2026-02-06 | RTX 3090 | Fixed multimodal |
| [no_rag_gemma3_4b_multimodal_fixed_v2](file:///media/ngoc/mydisk/master-program/program-modules/master-thesis/Leishmania_v3/rag/runs/no_rag_gemma3_4b_multimodal_fixed_v2) | Gemma3 4B | No-RAG | 2026-02-06 | RTX 3090 | - |

---

## Configuration Details

### RAG Pipeline Settings

- **Retrieval**: Hybrid (BM25 + Dense)
- **Top-K**: 5
- **Rerank**: Soft rerank (not hard rerank)
- **Q3 Router**: Enabled (routes Q3 → No-RAG)
- **Type-Aware**: Disabled (for Gemini runs)
- **Gate Threshold**: 0.0250

### Model Loading

- **Gemma3 4B**: FP16, Vision=True, 23.7GB VRAM
- **MedGemma 4B**: FP16, Vision=False, google/medgemma-4b-it
- **Gemini 2.5 Pro**: Cloud API, google.genai SDK

---

## Appendix: Comparison with Report v1

| Model | v1 RAG Effect | v2 RAG Effect | Change |
|-------|--------------|---------------|--------|
| Gemini 2.5 Pro | +30.45%\* | +0.45% | Different baseline |
| Gemma3 12B | +3.18% | N/A | - |
| MedGemma 4B | -5.91% | +0.91% | **+6.82%** |
| Gemma3 4B | N/A | +3.18% | New model |

> [!WARNING]
> **v1 vs v2 differences**:
> - v1 used Gemma3 12B, v2 uses Gemma3 4B
> - v1 Gemini comparison had different baseline (56.82% No-RAG vs 89.55%)
> - v2 includes Q3 Router which routes image-only queries to No-RAG

---

## Source Files

- Gemma3 4B RAG: `rtx3090/gemma3/gemma3_4B_multimodal_performance.txt`
- Gemma3 4B No-RAG: `rtx3090/gemma3/gemma3_4B_no-rag_multimodal.txt`
- MedGemma 4B RAG: `rtx3090/medgemma4b_q3_routers.txt`
- Gemini RAG: `rtx4090/rag_disable_type_aware_soft_rerank.txt`
- Gemini No-RAG: `rtx4090/no-rag_gemini25pro_q3_routers.txt`
- Model Details: `model_details/perplexitypro.md`