# *Model: MedAlpaca 7B*
Candidate for the medical base model.
## Model Evaluation Results
| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 21.02% |
## Accuracy by Subject
| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 8,175 | 24.37% |
| Anatomy | 11,525 | 23.60% |
| ENT | 4,053 | 23.56% |
| Physiology | 7,057 | 23.44% |
| Anaesthesia | 2,476 | 23.26% |
| Unknown | 2,417 | 22.34% |
| Orthopaedics | 2,469 | 21.91% |
| Radiology | 3,527 | 21.86% |
| Social & Preventive Medicine | 9,744 | 21.64% |
| Ophthalmology | 5,588 | 21.64% |
| Medicine | 14,029 | 21.16% |
| Gynaecology & Obstetrics | 7,860 | 21.06% |
| Surgery | 13,516 | 21.05% |
| Forensic Medicine | 4,784 | 21.03% |
| Pediatrics | 6,494 | 20.22% |
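
The summary and per-subject tables in this report can be derived from per-question results. The following is a minimal sketch, not the evaluation code that produced these numbers: it assumes each evaluated question is available as a `(subject, is_correct)` pair, and the function names `accuracy_by_subject`, `overall_accuracy`, and `to_markdown` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_subject(records):
    """Return (subject, n_samples, accuracy_percent) rows, best subject first.

    `records` is assumed to be an iterable of (subject, is_correct) pairs;
    this input format is an assumption, not taken from the report.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for subject, is_correct in records:
        totals[subject] += 1
        correct[subject] += int(is_correct)
    rows = [(s, totals[s], 100.0 * correct[s] / totals[s]) for s in totals]
    # Sort by accuracy, highest first, matching the table layout used here.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def overall_accuracy(records):
    """Return the percentage of correct answers across all records."""
    records = list(records)
    if not records:
        return 0.0
    return 100.0 * sum(int(ok) for _, ok in records) / len(records)


def to_markdown(rows):
    """Render rows as a Markdown table with the same columns as this report."""
    lines = ["| Subject | Number of Samples | Accuracy |", "|---|---|---|"]
    for subject, n, acc in rows:
        lines.append(f"| {subject} | {n:,} | {acc:.2f}% |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy records for illustration only; these are not benchmark data.
    demo = [("Dental", True), ("Dental", False), ("Anatomy", True)]
    print(f"Overall accuracy: {overall_accuracy(demo):.2f}%")
    print(to_markdown(accuracy_by_subject(demo)))
```

Because overall accuracy is computed over all questions, it equals the sample-weighted mean of the per-subject accuracies, so subjects with more questions influence it more.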
---
# *Model: biomistral-benchmark*
## Model Evaluation Results
| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 20.01% |
## Accuracy by Subject
| Subject | Number of Samples | Accuracy |
|---|---|---|
| Anaesthesia | 2,476 | 23.75% |
| Dental | 8,175 | 22.72% |
| Gynaecology & Obstetrics | 7,860 | 22.49% |
| Forensic Medicine | 4,784 | 22.20% |
| ENT | 4,053 | 21.96% |
| Anatomy | 11,525 | 21.61% |
| Ophthalmology | 5,588 | 21.56% |
| Surgery | 13,516 | 21.29% |
| Social & Preventive Medicine | 9,744 | 21.10% |
| Radiology | 3,527 | 21.04% |
| Medicine | 14,029 | 21.03% |
| Orthopaedics | 2,469 | 21.02% |
| Unknown | 2,417 | 20.85% |
| Pediatrics | 6,494 | 19.80% |
| Physiology | 7,057 | 19.22% |
---
# *Model: DeepSeek R1 0528 Qwen3 8B*
Candidate reasoning model (for agentic RAG and verification). It will have to be fine-tuned with medical context.
## Model Evaluation Results
| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 5.64% |
## Accuracy by Subject
| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 8,175 | 8.77% |
| Forensic Medicine | 4,784 | 8.07% |
| Ophthalmology | 5,588 | 7.36% |
| Surgery | 13,516 | 6.81% |
| Radiology | 3,527 | 6.78% |
| ENT | 4,053 | 6.56% |
| Orthopaedics | 2,469 | 6.44% |
| Gynaecology & Obstetrics | 7,860 | 6.23% |
| Pediatrics | 6,494 | 6.19% |
| Skin | 1,400 | 5.93% |
| Medicine | 14,029 | 5.81% |
| Anaesthesia | 2,476 | 5.65% |
| Unknown | 2,417 | 5.63% |
| Pathology | 11,841 | 5.35% |
| Social & Preventive Medicine | 9,744 | 5.31% |
---
# *Model: Meditron*
## Model Evaluation Results
| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 1.10% |
## Accuracy by Subject
| Subject | Number of Samples | Accuracy |
|---|---|---|
| Pathology | 11,841 | 2.43% |
| Medicine | 14,029 | 2.13% |
| Microbiology | 9,119 | 1.30% |
| Skin | 1,400 | 1.29% |
| Surgery | 13,516 | 1.16% |
| Pharmacology | 11,071 | 1.12% |
| Anaesthesia | 2,476 | 1.09% |
| Gynaecology & Obstetrics | 7,860 | 1.06% |
| Pediatrics | 6,494 | 0.95% |
| Unknown | 2,417 | 0.95% |
| Psychiatry | 3,450 | 0.93% |
| Anatomy | 11,525 | 0.89% |
| Orthopaedics | 2,469 | 0.89% |
| Ophthalmology | 5,588 | 0.81% |
| Radiology | 3,527 | 0.74% |
---
# *Model: BioMistral-7B-SLERP*
## Model Evaluation Results
| Metric | Value |
|---|---|
| **Total Questions** | 5,572 |
| **Overall Accuracy** | 16.82% |
## Accuracy by Subject
| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 251 | 21.91% |
| Forensic Medicine | 178 | 21.91% |
| Pediatrics | 251 | 19.92% |
| ENT | 161 | 19.25% |
| Social & Preventive Medicine | 401 | 18.95% |
| Medicine | 543 | 18.60% |
| Anatomy | 428 | 18.45% |
| Gynaecology & Obstetrics | 317 | 18.29% |
| Anaesthesia | 114 | 17.54% |
| Surgery | 497 | 16.70% |
| Unknown | 68 | 16.17% |
| Microbiology | 338 | 15.97% |
| Pathology | 458 | 15.50% |
| Radiology | 127 | 14.96% |
| Biochemistry | 266 | 14.28% |