## Llama 4

[Llama 4 doc](https://www.llama.com/docs/overview/) [Llama 4 blog](https://ai.meta.com/blog/llama-4-multimodal-intelligence/)

| model | size | active params | experts | total params | context length |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Behemoth | | | | | |
| Reasoning | | | | | |
| Maverick | 788GB | 17B | 128 | 400B | 1M |
| Scout | 210GB | 17B | 16 | 109B | 10M |

Maverick and Scout are both **natively multimodal** and use **Mixture-of-Experts (MoE) architectures**.

* Maverick:
    - parameters: 17B active, 128 experts, 400B total
    - performance: winner across several benchmarks over DeepSeek v3, Gemini 2.0 Flash, and GPT-4o (involution again). _Note the DeepSeek model here is **v3**, not the stronger R1._
    - efficiency: similar (or better) performance to DeepSeek v3 with about half the active parameters
* Scout:
    - parameters: 17B active, 16 experts, 109B total
    - performance: outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 on benchmarks
    - context length: 10M tokens

About their multimodality:

- achieved by pre-training on text, image, and video data
- vision encoder: MetaCLIP-based
- multiple input images supported

License: [Llama 4](https://www.llama.com/llama4/license/)

### Benchmarks

* Maverick benchmark

  It's cost-effective and performant across several tasks.

  Scout scores well too.
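The active-vs-total parameter gap above (e.g. 17B active out of 400B total for Maverick) comes from MoE routing: a gate picks a few experts per token, so only those experts' weights run. A minimal numpy sketch with toy sizes (all names and shapes here are hypothetical, not Llama 4's actual layer):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 1  # toy sizes; Maverick has 128 experts

# One weight matrix per expert, plus a gating matrix.
# Total parameters count every expert; active parameters count only top_k.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ gate_w                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert ids per token
    out = np.zeros_like(x)
    for t, expert_ids in enumerate(top):
        probs = np.exp(logits[t, expert_ids])
        probs /= probs.sum()                       # softmax over chosen experts
        for p, e in zip(probs, expert_ids):
            out[t] += p * (x[t] @ experts[e])
    return out, top

x = rng.standard_normal((3, d_model))
y, routed = moe_forward(x)

total_params = n_experts * d_model * d_model   # stored weights
active_params = top_k * d_model * d_model      # weights touched per token
print(total_params, active_params)
```

Scaling the same idea up: with 128 experts and a small routed fraction per token, the stored model is huge while the per-token compute stays near that of a much smaller dense model.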