### LLM and SLM
| Model | parameters | owner | context length | reasoning | licence | languages | limitations | distilled/fine-tuned models | MoE |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [GLM 4.5](https://huggingface.co/zai-org/GLM-4.5) | 358B | [Z.ai](https://huggingface.co/zai-org) | 128k | :white_check_mark: | MIT | multiple | unknown | community models | :white_check_mark: |
| [GLM 4.5 Air](https://huggingface.co/zai-org/GLM-4.5-Air) | 110B | [Z.ai](https://huggingface.co/zai-org) | 128k | :white_check_mark: | MIT | multiple | unknown | community models | :white_check_mark: |
| [Kimi-K2](https://huggingface.co/moonshotai/Kimi-K2-Instruct) | 1T | [Moonshot](https://huggingface.co/moonshotai) | 128k | :white_check_mark: | modified MIT | multiple | unknown | Instruct variant, plus many community derivatives | :white_check_mark: |
| [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) | 685B | [DeepSeek](https://huggingface.co/deepseek-ai) | 128k | :white_check_mark: | MIT | multiple | censorship of historical and political topics, tendency to answer in Simplified Chinese | [Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), [Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B), [r1-1776](https://huggingface.co/perplexity-ai/r1-1776) | :white_check_mark: |
| [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | 685B | [DeepSeek](https://huggingface.co/deepseek-ai) | 128k | :x: | DeepSeek | multiple | same as above | DeepSeek-R1 | :white_check_mark: |
| [aya-101](https://huggingface.co/CohereForAI/aya-101) | 12.9B | [Cohere](https://huggingface.co/CohereForAI) | 1k | :x: | Apache-2.0 | multiple (101) | - | - | :x: |
| aya-23/expanse | 8 \| 32 \| 35B | [Cohere](https://huggingface.co/CohereForAI) | 8k | :x: | CC-BY-NC-4.0 | multiple (23↑) | - | aya-23, aya-expanse family | :x: |
| [C4AI Command R](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024) | 8 \| 23 \| 55 \| 104B | [Cohere](https://huggingface.co/CohereForAI) | 128k | :x: | CC-BY-NC-4.0 | multiple (23↑) | - | C4AI Command R family | :x: |
| [C4AI Command A](https://huggingface.co/CohereLabs/c4ai-command-a-03-2025) | 111B | [CohereLabs](https://huggingface.co/CohereLabs) | 128k | :x: | CC-BY-NC-4.0 | multiple | - | [Command A vision](https://huggingface.co/CohereLabs/command-a-vision-07-2025) | :x: |
| Qwen2.5 | 0.5 \| 1.5 \| 3 \| 7 \| 14 \| 32 \| 72B | [Alibaba](https://huggingface.co/Qwen) | 128k | :x: | Qwen or Apache-2.0 | multiple (29) | PC | Qwen family | some |
| Qwen3 | 30 \| 32 \| 235B | [Alibaba](https://huggingface.co/Qwen) | 128k | :white_check_mark: | Qwen or Apache-2.0 | multiple | PC | Qwen family | some |
| [Qwen QwQ](https://huggingface.co/Qwen/QwQ-32B-Preview) | 32.8B | [Alibaba](https://huggingface.co/Qwen) | 32k | :white_check_mark: | Apache-2.0 | multiple | weak reasoning ability, censorship (_"harmony"_), wild hallucination | - | :x: |
| Mistral | 7.28B | [Mistral AI](https://huggingface.co/mistralai) | 8k | :x: | Apache-2.0^\*^ | multiple | PC | Pixtral family, [ministral 8B](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410), [Mathstral 7B](https://huggingface.co/mistralai/Mathstral-7B-v0.1), [HF Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), [Mixtral family (MoE)](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1), [MediaTek Breeze family](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1), etc. (more than 840) | Mixtral |
| [Mistral small](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) | 24B | [Mistral AI](https://huggingface.co/mistralai) | 128k | :x: | Apache-2.0 | multiple | mild censorship | [Devstral](https://huggingface.co/mistralai/Devstral-Small-2505), [Magistral small](https://huggingface.co/mistralai/Magistral-Small-2506) | :x: |
| [Magistral small](https://huggingface.co/mistralai/Magistral-Small-2506) | 24B | [Mistral AI](https://huggingface.co/mistralai) | 128k | :white_check_mark: | Apache-2.0 | multiple | - | already fine-tuned | :x: |
| [Devstral](https://huggingface.co/mistralai/Devstral-Small-2505) (SOTA) | 24B | [Mistral AI](https://huggingface.co/mistralai) | 128k | :x: | Apache-2.0 | multiple | - | already fine-tuned | :x: |
| [Codestral Mamba](https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1) | 7.29B | [Mistral AI](https://huggingface.co/mistralai) | 256k | :x: | Apache-2.0 | multiple | - | - | :x: |
| [Codestral 22B](https://huggingface.co/mistralai/Codestral-22B-v0.1) | 22.2B | [Mistral AI](https://huggingface.co/mistralai) | 32k | :x: | MNPL | multiple | - | - | :x: |
| [Voxtral](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) | 4.6B \| 24B | [Mistral AI](https://huggingface.co/mistralai) | 32k | :x: | Apache-2.0 | multiple | - | fine-tuned from Ministral and Mistral Small | :x: |
| [Mistral-Nemo](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | 12.2B | [Mistral AI](https://huggingface.co/mistralai) & [NVIDIA](https://huggingface.co/nvidia) | 128k | :x: | Apache-2.0 | multiple | hallucination | - | :x: |
| [Llama-3.1-Nemotron](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) | 49 \| 51 \| 70.6 \| 253B | [NVIDIA](https://huggingface.co/nvidia) | 128k | :white_check_mark: | CC-BY-4.0 | unknown | unknown | fine-tuned from Llama 3.1 | :x: |
| [OpenMath Nemotron](https://huggingface.co/nvidia/OpenMath-Nemotron-32B) | 1.5 \| 7 \| 14 \| 32B | [NVIDIA](https://huggingface.co/nvidia) | 131k | :question: | Llama 3.1 | multiple | PC | already fine-tuned | :x: |
| [Llama-3](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 1 \| 3 \| 8 \| 70 \| 405B | [Meta](https://huggingface.co/meta-llama) | 128k | :x: | Llama 3.3, [nv-open model](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) | multiple | PC | countless, including the Llama family | :x: |
| [Llama-4](https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164) | 109 \| 400B | [Meta](https://huggingface.co/meta-llama) | 10M \| 1M | :x: | Llama 4 | multiple | unknown | Llama family, including [Scout](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Original) and [Maverick](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E) | :white_check_mark: |
| [Llama-Breeze2](https://huggingface.co/MediaTek-Research/Llama-Breeze2-8B-Instruct) | 3B \| 8B | [MediaTek](https://huggingface.co/MediaTek-Research) | 128k | :x: | Llama 3.2 | multiple | unknown | Breeze2 family | :x: |
| [gpt-oss](https://huggingface.co/openai/gpt-oss-120b) | 20B \| 120B | [OpenAI](https://huggingface.co/openai) | 128k | :white_check_mark: | Apache-2.0 | multiple | unknown | not yet | :white_check_mark: |
| [LongCat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat) | 562B | [Meituan](https://huggingface.co/meituan-longcat) | 128k | :x: | MIT | multiple | unknown | not yet | :star: |
| [ERNIE 4.5](https://huggingface.co/baidu/ERNIE-4.5-VL-424B-A47B-Base-Paddle) | 424B \| 300B \| 28B \| 21B \| 0.3B | [Baidu](https://huggingface.co/baidu) | 128k | :white_check_mark: | Apache-2.0 (though the repo also claims all rights reserved) | multiple | unknown | [Ernie family](https://huggingface.co/collections/baidu/ernie-45-6861cd4c9be84540645f35c9) | :white_check_mark: |
| [Granite 3.3](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) | 2 \| 8B | [IBM](https://huggingface.co/ibm-granite) | 128k | :white_check_mark: | Apache-2.0 | multiple | - | [speech](https://huggingface.co/ibm-granite/granite-speech-3.3-2b), [RAG agent](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib), [etc.](https://huggingface.co/ibm-granite/models) | :white_check_mark: |
| [Granite 4 tiny preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | 7B | [IBM](https://huggingface.co/ibm-granite) | 128k | :white_check_mark: | Apache-2.0 | multiple | - | [official instruct](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | :white_check_mark: |
| Falcon | 7 \| 40B | [TII](https://huggingface.co/tiiuae) | 2k | :x: | Apache-2.0 | multiple | - | Falcon family^\*^ | :x: |
| Falcon3 | 1 \| 3 \| 7 \| 10B | [TII](https://huggingface.co/tiiuae) | 32k | :x: | Falcon 3 TII | multiple | - | Falcon3 family | :x: |
| [Falcon H1](https://huggingface.co/tiiuae/Falcon-H1-34B-Instruct) | 0.5 \| 1.5 \| 3 \| 7 \| 34B | [TII](https://huggingface.co/tiiuae) | 256k | :white_check_mark: | [falcon-llm-licence](https://falconllm.tii.ae/falcon-terms-and-conditions.html) | multiple | - | [Falcon H1 family](https://huggingface.co/collections/tiiuae/falcon-h1-6819f2795bc406da60fab8df) | :x: |
| [Flan-T5](https://huggingface.co/google/flan-t5-base) | 77M \| 248M \| 783M \| 2.85B \| 11.3B | [Google](https://huggingface.co/google) | 2k in \| 512 out | :x: | Apache-2.0 | multiple | unknown | Flan family | :x: |
| Gemma 2 | 2 \| 9 \| 27B | [Google](https://huggingface.co/google) | 8k | :x: | Gemma | multiple (best in English) | - | Gemma family | :x: |
| Gemma 3 | 1 \| 4 \| 12 \| 27B | [Google](https://huggingface.co/google) | 128k | :x: | Gemma | multiple | - | Gemma family | :x: |
| Phi-3 | 3.8 \| 7.4 \| 14B | [Microsoft](https://huggingface.co/microsoft) | 4k or 128k | :x: | MIT | multiple | - | Phi-3 family | :x: |
| Phi-3.5 | 3.82 \| 4.15 \| 41.9B | [Microsoft](https://huggingface.co/microsoft) | 128k | :x: | MIT | multiple | - | [Phi-3.5 mini](https://huggingface.co/microsoft/Phi-3.5-mini-instruct), [Phi-3.5 vision](https://huggingface.co/microsoft/Phi-3.5-vision-instruct), [Phi-3.5 MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) | Phi-3.5-MoE |
| [Phi-4](https://huggingface.co/microsoft/phi-4) | 3.84B \| 14.7B | [Microsoft](https://huggingface.co/microsoft) | 16k | :x: | MIT | multiple | PC | [Phi 4 family (e.g. reasoning)](https://huggingface.co/microsoft/Phi-4-reasoning-plus) | Phi-4 MoE |
| [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) | 135M \| 360M \| 1.7B | [Hugging Face TB](https://huggingface.co/HuggingFaceTB) | 8k | :x: | Apache-2.0 | 5 | - | SmolLM2 family | :x: |
| [SmolLM3](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) | 3B | [Hugging Face TB](https://huggingface.co/HuggingFaceTB) | 128k | :white_check_mark: | Apache-2.0 | multiple | - | [SmolLM3 instruct (reasoning)](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | :x: |
| [SmolVLM](https://huggingface.co/HuggingFaceTB/SmolVLM-Base) | 256M \| 500M \| 2.25B | [Hugging Face TB](https://huggingface.co/HuggingFaceTB) | 8k | :x: | Apache-2.0 | unknown | - | SmolVLM family | :x: |
| [minGPT](https://huggingface.co/Katiyar48/MinGPT) | 1.4 \| 6.8B | [Katiyar48](https://huggingface.co/Katiyar48/MinGPT) | 1536 \| 3072 | :x: | MIT | multiple | unknown | unknown | :x: |
| [MiniMind](https://huggingface.co/jingyaogong/MiniMind2) | 29M \| 109M | [jingyaogong](https://huggingface.co/jingyaogong) | unknown | :x: | Apache-2.0 | multiple | hallucination | [MiniMind family](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5) | some |
| StableLM | 1.6 \| 3 \| 12B | [Stability AI](https://huggingface.co/stabilityai) | 4k | :x: | Stability AI | multiple | unknown | StableLM family, [StableLM Zephyr](https://huggingface.co/stabilityai/stablelm-zephyr-3b) | :x: |
| stable-code | 3B | [Stability AI](https://huggingface.co/stabilityai) | 100k | :x: | Stability AI | multiple | unknown | [Stable Code Instruct](https://huggingface.co/stabilityai/stable-code-instruct-3b) | :x: |
| LFM2 | 350M \| 700M \| 1.2B | [Liquid AI](https://huggingface.co/LiquidAI) | 32k | :x: | LFM Open Licence v1.0 | multiple | unknown | - | :x: |
| ~~grok-1~~ | ~~314B~~ | ~~[xAI](https://huggingface.co/xai-org)~~ | ~~8192~~ | :x: | ~~Apache-2.0~~ | ~~multiple~~ | ~~unknown~~ | ~~-~~ | :x: |
* Kimi-K2 uses a modified MIT licence, requiring prominent attribution once user numbers or revenue exceed a threshold
* Mistral Large (and earlier Mistral Small releases) use the MRL (Mistral AI Research Licence); Mistral Small 3.x is Apache-2.0
* PC: political correctness
* MNPL: Mistral Non-Production Licence
* Falcon 180B uses the Falcon-180B TII licence
* :star: marks dynamic MoE, where the number of active experts varies per token; a generic top-k MoE routing sketch follows the references below
* Reference:
* [LLM leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/)
* [CN LLM leaderboard](https://huggingface.co/spaces/BAAI/open_cn_llm_leaderboard)
* [mixtral paper](https://arxiv.org/pdf/2401.04088)
* [SMoE](https://arxiv.org/pdf/2209.01667)
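
Many of the models above are sparse MoE (cf. the Mixtral and SMoE papers): a router sends each token to a small subset of expert FFNs, so only a fraction of the total parameters is active per token. Below is a minimal top-k routing sketch in PyTorch; all hyper-parameters (`d_model`, `d_ff`, `n_experts`, `top_k`) are illustrative and not taken from any model in the table.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k sparse MoE layer (a sketch, not any model's exact code)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (tokens, d_model)
        weights, chosen = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # renormalise over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):               # only top_k experts run per token
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

A dynamic MoE (the :star: above) additionally lets the router vary how many experts each token activates (including zero-computation experts), rather than using a fixed `top_k`.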
<!-- Reka flash (apache 2.0) -->
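
Most checkpoints in the table load through the standard Hugging Face `transformers` API. A minimal sketch, assuming `transformers`, `torch`, and `accelerate` (for `device_map="auto"`) are installed; SmolLM2-1.7B-Instruct is used here only because it fits on modest hardware, and any other repo id from the table should work the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # any repo id from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Cross-check the advertised context length against the checkpoint config.
print(model.config.max_position_embeddings)

# Instruct checkpoints ship a chat template; apply it instead of raw strings.
messages = [{"role": "user", "content": "Name three open-weight MoE models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Gated repos (e.g. Llama, Gemma) additionally require accepting the licence on the model page and authenticating with `huggingface-cli login` before download.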
---
### Licence
| licence | commercial use | limitations | attribution |
| :---: | :---: | --- | --- |
| MIT | :white_check_mark: | | copyright notice |
| Apache-2.0 | :white_check_mark: | | copyright, patent, trademark notices |
| CC-BY-NC-4.0 | :x: | NonCommercial | identify creator, copyright notice, licence notice |
| Llama 3 | 700 million MAU↓ | training other LLMs on Llama 3 outputs is forbidden^1^, litigation^2^ | in product name, notice^3^, and documentation^4^ |
| Qwen | :white_check_mark: | over 100 million MAU requires a licence | Qwen licence agreement, copyright notice |
| DeepSeek | :white_check_mark: | Use-based restrictions | copyright, patent, trademark notices |
| Falcon 3 TII | :white_check_mark: | Acceptable Use Policy | TII Falcon Licence, copyright notice, 'Built using artificial intelligence technology from the Technology Innovation Institute' |
| MNPL | :x: | testing, research, Personal, or evaluation purposes in Non-Production Environments | Mistral AI Non-Production Licence |
| MRL | :x: | Research Purposes | Mistral AI Research Licence |
| Gemma | :white_check_mark: | Prohibited Use Policy | Gemma Terms of Use |
| Stability AI | \$1M in annual revenue↓ | AUP | Stability AI Community Licence, Powered by Stability AI |
| LFM Open Licence v1.0 | \$10M in annual revenue↓ | | Copy of the Licence, notices for modification, copyright, patent, trademark, and attribution notices |
_MAU: monthly active users_
1. except for LLMs derived from Llama 3
2. if you sue Meta claiming that Llama 3 infringes your intellectual property, all rights granted to you under the Llama 3 licence are terminated
3. 'Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.'
4. Llama 3 requires `Built with Meta Llama 3`
* MNPL: Mistral Non-Production Licence
* MRL: Mistral AI Research Licence
#### Detailed descriptions
* **MIT**: The copyright notice and permission notice shall be included in all copies or substantial portions of the Software.
* **Apache 2.0**: Requires retaining copyright, patent, trademark, and attribution notices.
* **CC-BY-NC 4.0**: Commercial use is not permitted. Requires attribution to the creator and a copyright notice.
* **Llama 3**: If the monthly active users of a product or service exceed 700 million, a licence from Meta is required. Llama 3 and its outputs may not be used to improve any other LLM (Llama 3 derivatives excepted), and suing Meta for intellectual property infringement related to Llama 3 terminates the licence. Attribution requires 'Built with Meta Llama 3' (or 'Built with Llama' for later releases) to be displayed.
* **Qwen**: Commercial use is permitted, but if monthly active users exceed 100 million, a licence is required. Requires including the Qwen licence agreement and copyright notice.
* **DeepSeek**: Includes use-based restrictions. Requires retaining copyright, patent, and trademark notices.
* **Falcon 3 TII**: Subject to the Acceptable Use Policy. Requires the TII Falcon Licence, copyright notice, and a statement that the work is built using artificial intelligence technology from the Technology Innovation Institute.
* **MNPL**: Only for testing, research, personal, or evaluation purposes in Non-Production Environments. Requires the Mistral AI Non-Production Licence.
* **MRL**: Limited to Research Purposes. Requires the Mistral AI Research Licence.
* **Gemma**: Subject to the Prohibited Use Policy. Requires including the Gemma Terms of Use.
* **Stability AI**: If annual revenue exceeds $1M, a licence is required. Must comply with the AUP. Requires the Stability AI Community Licence and 'Powered by Stability AI'.
_AUP: [Stability AI Acceptable Use Policy](https://stability.ai/use-policy)_
* **LFM Open Licence v1.0**: If annual revenue exceeds $10M, usage is not licensed (except for a Qualified NPO's research). You must include a copy of the LFM Open Licence, add prominent notices to modified files, retain copyright, patent, trademark, and attribution notices from the Source form, and include the NOTICE file if applicable.
<!-- [comparision between GPU TPU regarding MLframeworks](https://arxiv.org/abs/2309.07181) -->