# *Model: MedAlpaca 7B*

Potential medical base model

## Model Evaluation Results

| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 21.02% |

## Accuracy by Subject

| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 8,175 | 24.37% |
| Anatomy | 11,525 | 23.60% |
| ENT | 4,053 | 23.56% |
| Physiology | 7,057 | 23.44% |
| Anaesthesia | 2,476 | 23.26% |
| Unknown | 2,417 | 22.34% |
| Orthopaedics | 2,469 | 21.91% |
| Radiology | 3,527 | 21.86% |
| Social & Preventive Medicine | 9,744 | 21.64% |
| Ophthalmology | 5,588 | 21.64% |
| Medicine | 14,029 | 21.16% |
| Gynaecology & Obstetrics | 7,860 | 21.06% |
| Surgery | 13,516 | 21.05% |
| Forensic Medicine | 4,784 | 21.03% |
| Pediatrics | 6,494 | 20.22% |

---

# *Model: biomistral-benchmark*

## Model Evaluation Results

| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 20.01% |

## Accuracy by Subject

| Subject | Number of Samples | Accuracy |
|---|---|---|
| Anaesthesia | 2,476 | 23.75% |
| Dental | 8,175 | 22.72% |
| Gynaecology & Obstetrics | 7,860 | 22.49% |
| Forensic Medicine | 4,784 | 22.20% |
| ENT | 4,053 | 21.96% |
| Anatomy | 11,525 | 21.61% |
| Ophthalmology | 5,588 | 21.56% |
| Surgery | 13,516 | 21.29% |
| Social & Preventive Medicine | 9,744 | 21.10% |
| Radiology | 3,527 | 21.04% |
| Medicine | 14,029 | 21.03% |
| Orthopaedics | 2,469 | 21.02% |
| Unknown | 2,417 | 20.85% |
| Pediatrics | 6,494 | 19.80% |
| Physiology | 7,057 | 19.22% |

---

# *Model: DeepSeek R1 0528 Qwen3 8B*

Potential reasoning model (for agentic RAG and verification). Will have to be finetuned with medical context.

## Model Evaluation Results

| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 5.64% |

## Accuracy by Subject

| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 8,175 | 8.77% |
| Forensic Medicine | 4,784 | 8.07% |
| Ophthalmology | 5,588 | 7.36% |
| Surgery | 13,516 | 6.81% |
| Radiology | 3,527 | 6.78% |
| ENT | 4,053 | 6.56% |
| Orthopaedics | 2,469 | 6.44% |
| Gynaecology & Obstetrics | 7,860 | 6.23% |
| Pediatrics | 6,494 | 6.19% |
| Skin | 1,400 | 5.93% |
| Medicine | 14,029 | 5.81% |
| Anaesthesia | 2,476 | 5.65% |
| Unknown | 2,417 | 5.63% |
| Pathology | 11,841 | 5.35% |
| Social & Preventive Medicine | 9,744 | 5.31% |

# *Model: Meditron*

## Model Evaluation Results

| Metric | Value |
|---|---|
| **Total Questions** | 147,217 |
| **Overall Accuracy** | 1.10% |

## Accuracy by Subject

| Subject | Number of Samples | Accuracy |
|---|---|---|
| Pathology | 11,841 | 2.43% |
| Medicine | 14,029 | 2.13% |
| Microbiology | 9,119 | 1.30% |
| Skin | 1,400 | 1.29% |
| Surgery | 13,516 | 1.16% |
| Pharmacology | 11,071 | 1.12% |
| Anaesthesia | 2,476 | 1.09% |
| Gynaecology & Obstetrics | 7,860 | 1.06% |
| Pediatrics | 6,494 | 0.95% |
| Unknown | 2,417 | 0.95% |
| Psychiatry | 3,450 | 0.93% |
| Anatomy | 11,525 | 0.89% |
| Orthopaedics | 2,469 | 0.89% |
| Ophthalmology | 5,588 | 0.81% |
| Radiology | 3,527 | 0.74% |

# *Model: BioMistral-7B-SLERP*

## Model Evaluation Results

| Metric | Value |
|---|---|
| **Total Questions** | 5,572 |
| **Overall Accuracy** | 16.82% |

## Accuracy by Subject

| Subject | Number of Samples | Accuracy |
|---|---|---|
| Dental | 251 | 21.91% |
| Forensic Medicine | 178 | 21.91% |
| Pediatrics | 251 | 19.92% |
| ENT | 161 | 19.25% |
| Social & Preventive Medicine | 401 | 18.95% |
| Medicine | 543 | 18.60% |
| Anatomy | 428 | 18.45% |
| Gynaecology & Obstetrics | 317 | 18.29% |
| Anaesthesia | 114 | 17.54% |
| Surgery | 497 | 16.70% |
| Unknown | 68 | 16.17% |
| Microbiology | 338 | 15.97% |
| Pathology | 458 | 15.50% |
| Radiology | 127 | 14.96% |
| Biochemistry | 266 | 14.28% |
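
The per-subject rows can be combined into a single sample-weighted accuracy, which is a quick way to sanity-check a table. The sketch below is illustrative only: the helper name `weighted_accuracy` and the `(subject, samples, accuracy)` row layout are assumptions, not part of the evaluation code that produced these results. Note that each table lists only 15 subjects, so the result will generally differ from the reported overall accuracy (for BioMistral-7B-SLERP, the listed rows cover 4,398 of the 5,572 questions).

```python
def weighted_accuracy(rows):
    """Sample-weighted mean accuracy, in percent.

    rows: list of (subject, n_samples, accuracy_percent) tuples
    (a hypothetical layout chosen for this sketch).
    """
    total = sum(n for _, n, _ in rows)
    if total == 0:
        raise ValueError("rows contain no samples")
    # Each subject contributes in proportion to its sample count.
    return sum(n * acc for _, n, acc in rows) / total


# Three rows copied from the BioMistral-7B-SLERP table above.
rows = [
    ("Dental", 251, 21.91),
    ("Forensic Medicine", 178, 21.91),
    ("Pediatrics", 251, 19.92),
]
print(f"{weighted_accuracy(rows):.2f}%")
```

Weighting by sample count matters here because subject sizes vary widely (68 to 543 samples in the last table), so an unweighted mean of the accuracy column would overstate the influence of small subjects.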