## Llama 4
[Llama 4 doc](https://www.llama.com/docs/overview/)
[Llama 4 blog](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)
| model | size (weights) | active params | experts | total params | context length (tokens) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Behemoth | (not released) | 288B | 16 | ~2T | |
| Reasoning | (not released) | | | | |
| Maverick | 788GB | 17B | 128 | 400B | 1M |
| Scout | 210GB | 17B | 16 | 109B | 10M |
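As a rough sanity check on the size column (assuming it refers to BF16 weights at ~2 bytes per parameter, which is an assumption, not stated above), the totals line up with the listed sizes:

```python
# Rough BF16 checkpoint-size estimate: ~2 bytes per parameter.
# Assumption: the "size" column means BF16 weights; real repos add
# tokenizer/config files and sharding overhead on top.

def bf16_size_gb(total_params: float) -> float:
    """Approximate weight size in GB at 2 bytes/param (1 GB = 1e9 bytes)."""
    return total_params * 2 / 1e9

print(f"Maverick (400B total): ~{bf16_size_gb(400e9):.0f} GB")  # ~800 GB vs. 788GB listed
print(f"Scout    (109B total): ~{bf16_size_gb(109e9):.0f} GB")  # ~218 GB vs. 210GB listed
```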
Maverick and Scout are both **natively multimodal** and use a **Mixture-of-Experts (MoE) architecture**; a minimal MoE routing sketch follows the list below.
* Maverick:
  - parameters: 17B active, 128 experts, 400B total
  - performance: beats DeepSeek v3, Gemini 2.0 Flash, and GPT-4o across several benchmarks (the model race keeps intensifying)
    _Note: the comparison is against DeepSeek **v3**, not the stronger reasoning model **R1**_
  - efficiency: matches or beats DeepSeek v3 with less than half the active parameters
* Scout:
  - parameters: 17B active, 16 experts, 109B total
  - performance: outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral Small 3.1 on benchmarks
  - context length: 10M tokens
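To make the "17B active vs. 400B total" distinction concrete, here is a toy Mixture-of-Experts layer in PyTorch. It is a simplified sketch, not Llama 4's implementation: loosely following the blog's description of Maverick's MoE layers (128 routed experts plus a shared expert, each token sent to the shared expert and one routed expert), every token activates only a small slice of the total weights.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy MoE layer: one shared expert plus top-1 routing over routed experts.

    Loosely mirrors the described Maverick setup (128 routed experts + a shared
    expert, top-1 routing); dimensions and expert count here are tiny toys.
    """

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = self.router(x).softmax(dim=-1)      # (num_tokens, num_experts)
        weight, expert_idx = gate_probs.max(dim=-1)      # top-1 routing per token
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e                       # tokens assigned to expert e
            if mask.any():
                routed[mask] = weight[mask, None] * expert(x[mask])
        # every token also passes through the shared expert
        return self.shared_expert(x) + routed

layer = ToyMoELayer(d_model=64, d_ff=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 1 of 8 routed experts ran per token
```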
About their multimodality:
- achieved by early-fusion pre-training on text, image, and video data (sketched after this list)
- vision encoder: MetaCLIP-based
- multiple input images supported
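Below is a rough, illustrative sketch of what early fusion looks like (module names, sizes, and `IMG_TOKEN_ID` are invented for demonstration, not Llama 4's actual components): patch features from a MetaCLIP-style vision encoder are projected into the language model's embedding space and spliced into the token sequence, so text and vision tokens flow through the same transformer from the first layer.

```python
import torch
import torch.nn as nn

# Illustrative early-fusion sketch; all names and sizes here are assumptions.
D_VISION, D_MODEL, VOCAB, IMG_TOKEN_ID = 256, 512, 1000, 999

vision_encoder = nn.Linear(D_VISION, D_VISION)   # stand-in for a MetaCLIP-style ViT
projector = nn.Linear(D_VISION, D_MODEL)         # maps image features into the LM embedding space
text_embed = nn.Embedding(VOCAB, D_MODEL)

def fuse(input_ids: torch.Tensor, patch_features: torch.Tensor) -> torch.Tensor:
    """Replace <image> placeholder positions with projected vision tokens."""
    embeds = text_embed(input_ids)                              # (seq_len, D_MODEL)
    image_tokens = projector(vision_encoder(patch_features))    # (num_patches, D_MODEL)
    image_positions = (input_ids == IMG_TOKEN_ID).nonzero(as_tuple=True)[0]
    embeds = embeds.clone()
    embeds[image_positions] = image_tokens                      # one fused sequence for the LM
    return embeds

# toy usage: 6 tokens, 3 of which are image placeholders for 3 image patches
ids = torch.tensor([1, 2, IMG_TOKEN_ID, IMG_TOKEN_ID, IMG_TOKEN_ID, 3])
patches = torch.randn(3, D_VISION)
print(fuse(ids, patches).shape)  # torch.Size([6, 512])
```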
License: [Llama 4](https://www.llama.com/llama4/license/)
### Benchmarks
* Maverick benchmark

Maverick is cost-effective and performs well across a range of tasks.

Scout also scores well against models in its class.