## Llama 4

[Llama 4 doc](https://www.llama.com/docs/overview/) [Llama 4 blog](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)

| model | size | active params | expert number | total params | context length |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Behemoth | | | | | |
| Reasoning | | | | | |
| Maverick | 788GB | 17B | 128 | 400B | 1M |
| Scout | 210GB | 17B | 16 | 109B | 10M |

Maverick and Scout are both **natively multimodal** and use a **Mixture-of-Experts (MoE) architecture**.

* Maverick:
  - parameters: 17B active, 128 experts, 400B total
  - performance: beats DeepSeek V3, Gemini 2.0 Flash, and GPT-4o across several benchmarks (involution again). _Note the DeepSeek model here is **V3**, not the stronger R1._
  - efficiency: similar (or better) performance to DeepSeek V3 with about half the active parameters
* Scout:
  - parameters: 17B active, 16 experts, 109B total
  - performance: outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 on benchmarks
  - context length: 10M tokens

About their multimodality:

- achieved by pre-training on text, image, and video data
- vision encoder: MetaCLIP-based
- multiple input images are supported

License: [Llama 4](https://www.llama.com/llama4/license/)

### benchmarks:

* Maverick benchmark

![Maverick instruct](https://resize-image.vocus.cc/resize?compression=6&norotation=true&url=https%3A%2F%2Fimages.vocus.cc%2F5feaae05-9736-40fc-b922-4a1fdebce58e.png&width=740&sign=XGw6h7jse30618iFEKs2Xe3-SqAkv2tYi1dKz5DnwVo)

It's cost-effective and performant across several tasks.

* Scout benchmark

![Scout instruct](https://resize-image.vocus.cc/resize?compression=6&norotation=true&url=https%3A%2F%2Fimages.vocus.cc%2F53bf8e8b-689c-4bbb-9edb-396cda1e0eb7.png&width=740&sign=aAAKxHHI9275fLTfWpXTCttdVSipQO9XToOGPndjcf0)

Scout scores well too.
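Why can a 400B-parameter model run with only 17B active parameters? In an MoE layer, a router sends each token to a small subset of the experts, so only that subset's weights participate in the forward pass. Below is a minimal toy sketch of top-1 routing in NumPy; the dimensions and single-expert routing are illustrative assumptions, not Llama 4's actual configuration (which uses far more experts and a shared expert).

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 8, 16
n_experts = 4  # toy value; Scout uses 16 experts, Maverick 128

# Each expert is a small feed-forward block: W_in (d_model x d_ff), W_out (d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts))  # maps a token to expert scores

def moe_forward(x):
    """Route each token to its top-1 expert; only that expert's weights are used."""
    scores = x @ router                # (n_tokens, n_experts)
    chosen = scores.argmax(axis=1)     # top-1 routing decision per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        w_in, w_out = experts[e]
        out[i] = np.maximum(x[i] @ w_in, 0.0) @ w_out  # one expert's ReLU FFN
    return out, chosen

tokens = rng.standard_normal((5, d_model))
y, chosen = moe_forward(tokens)

# Active vs. total expert parameters: each token touches 1/n_experts of the expert weights.
per_expert = d_model * d_ff + d_ff * d_model
active_params, total_params = per_expert, per_expert * n_experts
print(y.shape, active_params, total_params)
```

This is why the table's "active params" column (17B) is much smaller than "total params" (400B or 109B): compute per token scales with the active set, while capacity scales with the total.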