# Model List
## For 4GB RAM (~2GB for 1B models, ~4GB for 3B models)
1. [TinyLlama Chat 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6)
```json
{
  "source_url": "https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6/blob/main/ggml-model-q4_0.gguf",
  "id": "tinyllama-1.1b",
  "object": "model",
  "name": "TinyLlama Chat 1.1B",
  "version": "1",
  "description": "The TinyLlama project, featuring a 1.1B parameter Llama model, is pretrained on an expansive 3 trillion token dataset. Its design ensures easy integration with various Llama-based open-source projects. Despite its smaller size, it efficiently utilizes lower computational and memory resources, drawing on GPT-4's analytical prowess to enhance its conversational abilities and versatility.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "TinyLlama",
    "tags": ["General Use"],
    "size": "637000000"
  }
}
```
- **Prompt Template:**
```
<|system|>
You are a friendly chatbot who always responds in the style of a pirate.</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
...
```
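As a quick sanity check that an entry works end to end, here is a minimal sketch of downloading the file behind `source_url` and running it with the listed `settings` and `parameters`. It assumes the `huggingface_hub` and `llama-cpp-python` packages, which are not part of this list:
```python
# Hedged sketch: huggingface_hub and llama-cpp-python are assumed dependencies.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# source_url points into this repo; hf_hub_download fetches the same file.
model_path = hf_hub_download(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v0.6",
    filename="ggml-model-q4_0.gguf",
)

# "settings" map to load-time options: ctx_len -> n_ctx, ngl -> n_gpu_layers.
llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=100)

# The prompt template above with its placeholders filled in.
prompt = (
    "<|system|>\nYou are a friendly chatbot who always responds in the style of a pirate.</s>\n"
    "<|user|>\nHow many helicopters can a human eat in one sitting?</s>\n"
    "<|assistant|>\n"
)

# "parameters" map to generation options: temperature, max_tokens, stream.
for chunk in llm(prompt, temperature=0.7, max_tokens=2048, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```
The same pattern applies to every entry below; only the repo, filename, and prompt template change.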
2. [Deepseek Coder 1.3B](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base)
```json
{
  "source_url": "https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GGUF/blob/main/deepseek-coder-1.3b-base.Q4_K_M.gguf",
  "id": "deepseek-coder-1.3b",
  "object": "model",
  "name": "Deepseek Coder 1.3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "Deepseek, The Bloke",
    "tags": ["Code"],
    "size": "870000000"
  }
}
```
## 3B models are not really suitable for 4GB RAM (even at Q2)
---
3. [Rocket 3B](https://huggingface.co/pansophic/rocket-3B)
```json
{
  "source_url": "https://huggingface.co/TheBloke/rocket-3B-GGUF/resolve/main/rocket-3b.Q4_K_M.gguf?download=true",
  "id": "rocket-3b",
  "object": "model",
  "name": "Rocket 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "pansophic, The Bloke",
    "tags": ["General Use"],
    "size": "1710000000"
  }
}
```
- **Prompt Template: ChatML**
```
<|im_start|>system
System message here.<|im_end|>
<|im_start|>user
Your message here!<|im_end|>
<|im_start|>assistant
...
```
4. [Marx 3B](https://huggingface.co/acrastt/Marx-3B-V3)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Marx-3B-v3-GGUF/blob/main/marx-3b-v3.Q4_K_M.gguf",
  "id": "marx-3b",
  "object": "model",
  "name": "Marx 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "acrastt, The Bloke",
    "tags": ["General Use"],
    "size": "1620000000"
  }
}
```
- **Prompt Template:**
```
### HUMAN:
{prompt}
### RESPONSE:
...
```
5. [IS LM 3B](https://huggingface.co/acrastt/IS-LM-3B)
```json
{
  "source_url": "https://huggingface.co/UmbrellaCorp/IS-LM-3B_GGUF/blob/main/IS-LM-Q4_K_M.gguf",
  "id": "islm-3b",
  "object": "model",
  "name": "IS LM 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "acrastt, UmbrellaCorp",
    "tags": ["General Use", "Economics"],
    "size": "1710000000"
  }
}
```
- **Prompt Template:**
```
USER: {prompt}
ASSISTANT:
```
---
## For 8GB RAM (~7GB RAM)
1. [Mistral OpenHermes 7B](https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF)
```json
{
  "source_url": "https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/blob/main/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
  "id": "openhermes-2.5-mistral-7b",
  "object": "model",
  "name": "Openhermes 2.5 Mistral 7B",
  "version": "2.5",
  "description": "The Teknium-developed OpenHermes 2.5 Mistral 7B incorporates additional code datasets, more than a million GPT-4 generated data examples, and other high-quality open datasets. This enhancement led to significant improvement in benchmarks, highlighting its improved skill in handling code-centric tasks.",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Teknium, The Bloke",
    "tags": ["General", "Roleplay"],
    "size": "4370000000"
  }
}
```
- **Prompt template: ChatML**
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
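Several entries in this list share this ChatML format, so a tiny formatter is handy. A minimal sketch in plain Python (the `to_chatml` helper is hypothetical, not part of any listed model):
```python
def to_chatml(messages: list[dict]) -> str:
    """Flatten [{'role': ..., 'content': ...}] into the ChatML template above."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model writes the reply.
    return prompt + "<|im_start|>assistant\n"

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```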
2. [Neural Chat 7B](https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF)
```json
{
  "source_url": "https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF/blob/main/neural-chat-7b-v3-1.Q4_K_M.gguf",
  "id": "neural-chat-7b-v3-1",
  "object": "model",
  "name": "Neural Chat 7B",
  "version": "3.1",
  "description": "The Neural Chat 7B model, developed on the foundation of mistralai/Mistral-7B-v0.1, has been fine-tuned using the Open-Orca/SlimOrca dataset and aligned with the Direct Preference Optimization (DPO) algorithm. It has demonstrated substantial improvements in various AI tasks and performs well on the Open LLM Leaderboard.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "Intel, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "4370000000"
  }
}
```
- **Prompt Template:**
```
### System:
{system_message}
### User:
{prompt}
### Assistant:
```
3. [Open Chat 3.5 7B](https://huggingface.co/openchat/openchat_3.5)
```json
{
  "source_url": "https://huggingface.co/TheBloke/openchat_3.5-GGUF/blob/main/openchat_3.5.Q4_K_M.gguf",
  "id": "openchat_3.5",
  "object": "model",
  "name": "Open Chat 3.5 7B",
  "version": "1.0",
  "description": "OpenChat represents a breakthrough in the realm of open-source language models. By implementing the C-RLFT fine-tuning strategy, inspired by offline reinforcement learning, this 7B model achieves results on par with ChatGPT (March).",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "OpenChat, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```
- **Prompt Template: OpenChat**
The GPT4 template is also available through the tokenizer's built-in `tokenizer.chat_template`, which accepts a message list:
```python
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"}
]
```
```
GPT4 User: {prompt}<|end_of_turn|>
GPT4 Assistant:
```
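Rather than formatting by hand, the `messages` list above can be rendered through the tokenizer itself. A sketch assuming the `transformers` package and the original (non-GGUF) `openchat/openchat_3.5` repo:
```python
from transformers import AutoTokenizer

# Assumes transformers is installed and the openchat/openchat_3.5 repo is reachable.
tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5")

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]

# tokenize=False returns the formatted string; add_generation_prompt opens
# the assistant turn for the model to complete.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```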
4. [Zephyr Beta 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
```json
{
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "id": "zephyr-7b-beta",
  "object": "model",
  "name": "Zephyr Beta 7B",
  "version": "1.0",
  "description": "The Zephyr-7B-β model marks the second iteration in the Zephyr series, designed to function as an effective assistant. It has been fine-tuned from the mistralai/Mistral-7B-v0.1 base model, utilizing a combination of public and synthetic datasets with the application of Direct Preference Optimization.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "HuggingFaceH4, The Bloke",
    "tags": ["General Use"],
    "size": "4370000000"
  }
}
```
- **Prompt template: Zephyr**
```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```
5. [Open Orca 7B](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/blob/main/mistral-7b-openorca.Q4_K_M.gguf",
  "id": "openorca-7b",
  "object": "model",
  "name": "OpenOrca 7B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "OpenOrca, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```
- **Prompt template: ChatML**
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
6. [Starling 7B alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/blob/main/starling-lm-7b-alpha.Q4_K_M.gguf",
  "id": "starling-7b",
  "object": "model",
  "name": "Starling alpha 7B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "Berkeley-nest, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```
## For 16GB RAM
1. [Wizard Coder Python 13B](https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0)
```json
{
  "source_url": "https://huggingface.co/TheBloke/WizardCoder-Python-13B-V1.0-GGUF/blob/main/wizardcoder-python-13b-v1.0.Q5_K_M.gguf",
  "id": "wizard-coder-13b",
  "object": "model",
  "name": "Wizard Coder Python 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "WizardLM, The Bloke",
    "tags": ["Code"],
    "size": "9230000000"
  }
}
```
- **Prompt Template:**
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
```
2. [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)
```json
{
  "source_url": "https://huggingface.co/TheBloke/LLaMA2-13B-Tiefighter-GGUF/blob/main/llama2-13b-tiefighter.Q5_K_M.gguf",
  "id": "tiefighter-13b",
  "object": "model",
  "name": "Tiefighter 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "KoboldAI, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "9230000000"
  }
}
```
- **Prompt Template: Alpaca**
```
### Instruction:
{prompt}
### Response:
```
3. [MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b)
```json
{
  "source_url": "https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/blob/main/mythomax-l2-13b.Q5_K_M.gguf",
  "id": "mythomax-13b",
  "object": "model",
  "name": "MythoMax L2 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Gryphe, The Bloke",
    "tags": ["Role-playing"],
    "size": "9230000000"
  }
}
```
- **Prompt Template: Custom**
```
<System prompt/Character Card>
### Instruction:
Your instruction or question here.
For roleplay purposes, I suggest the following - Write <CHAR NAME>'s next reply in a chat between <YOUR NAME> and <CHAR NAME>. Write a single reply only.
### Response:
```
4. [Orca 2 13B](https://huggingface.co/microsoft/Orca-2-13b)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Orca-2-13B-GGUF/blob/main/orca-2-13b.Q5_K_M.gguf",
  "id": "orca-13b",
  "object": "model",
  "name": "Orca 2 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Microsoft, The Bloke",
    "tags": ["General Use"],
    "size": "9230000000"
  }
}
```
- **Prompt Template: ChatML**
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
5. [Noromaid 20B](https://huggingface.co/NeverSleep/Noromaid-20b-v0.1.1) (~15GB RAM)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Noromaid-20B-v0.1.1-GGUF/blob/main/noromaid-20b-v0.1.1.Q4_K_M.gguf",
  "id": "noromaid-20b",
  "object": "model",
  "name": "Noromaid 20B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "NeverSleep, The Bloke",
    "tags": ["Role-playing"],
    "size": "12040000000"
  }
}
```
- **Prompt Template: Custom**
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
```
## For 32GB RAM (~26GB RAM)
1. [Yi 34B](https://huggingface.co/01-ai/Yi-34B)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Yi-34B-Chat-GGUF/blob/main/yi-34b-chat.Q5_K_M.gguf",
  "id": "yi-34b",
  "object": "model",
  "name": "Yi 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "01-ai, The Bloke",
    "tags": ["General"],
    "size": "24320000000"
  }
}
```
- **Prompt Template: ChatML**
```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
2. [Capybara 200k 34B](https://huggingface.co/NousResearch/Nous-Capybara-34B)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF/blob/main/nous-capybara-34b.Q5_K_M.gguf",
  "id": "capybara-34b",
  "object": "model",
  "name": "Capybara 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "200000",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "200000",
    "stream": "true"
  },
  "metadata": {
    "author": "NousResearch, The Bloke",
    "tags": ["General", "Big Context Length"],
    "size": "24320000000"
  }
}
```
- **Prompt Template:**
```
USER: {prompt}
ASSISTANT:
```
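At a 200k window the KV cache, not the weights, dominates memory, so in practice you run with a much smaller `ctx_len`. A rough fp16 estimate (the layer/head numbers are my assumptions based on the Yi-34B architecture, not values from this list):
```python
# K and V per layer, each n_ctx x n_kv_heads x head_dim values at 2 bytes (fp16).
n_layers, n_kv_heads, head_dim = 60, 8, 128  # assumed Yi-34B-like dimensions

def kv_cache_gb(n_ctx: int) -> float:
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * 2 / 1e9

print(f"{kv_cache_gb(8192):.1f} GB at 8192 tokens")     # ~2.0 GB
print(f"{kv_cache_gb(200_000):.1f} GB at 200k tokens")  # ~49.2 GB
```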
3. [Phind Codellama 34B](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
```json
{
  "source_url": "https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF/blob/main/phind-codellama-34b-v2.Q5_K_M.gguf",
  "id": "phind-34b",
  "object": "model",
  "name": "Phind 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Phind, The Bloke",
    "tags": ["Code"],
    "size": "24320000000"
  }
}
```
- **Prompt Template:**
```
### System Prompt
{system_message}
### User Message
{prompt}
### Assistant
```
4. [Wizard Coder Python 34B](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)
```json
{
  "source_url": "https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_M.gguf",
  "id": "wizard-coder-34b",
  "object": "model",
  "name": "Wizard Coder Python 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "WizardLM, The Bloke",
    "tags": ["Code"],
    "size": "24320000000"
  }
}
```
- **Prompt Template:**
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
```
5. [Dolphin Yi 34B](https://huggingface.co/ehartford/dolphin-2_2-yi-34b)
```json
{
  "source_url": "https://huggingface.co/TheBloke/dolphin-2_2-yi-34b-GGUF/blob/main/dolphin-2_2-yi-34b.Q5_K_M.gguf",
  "id": "dolphin-yi-34b",
  "object": "model",
  "name": "Dolphin Yi 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "ehartford, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "24320000000"
  }
}
```
- **Prompt Template: ChatML**
```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
## For 64GB RAM and above (~42-52GB RAM)
1. [Xwin LM 70B](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1)
- https://huggingface.co/TheBloke/Xwin-LM-70B-V0.1-GGUF
- For General Use
2. [Storytelling 70B v1](https://huggingface.co/GOAT-AI/GOAT-70B-Storytelling)
- https://huggingface.co/TheBloke/GOAT-70B-Storytelling-GGUF
- General Use and Role-playing
3. [Goliath 120B](https://huggingface.co/alpindale/goliath-120b?text=Hey+my+name+is+Thomas%21+How+are+you%3F) (~74 - 86GB RAM)
- https://huggingface.co/TheBloke/goliath-120b-GGUF?text=Hey+my+name+is+Clara%21+How+are+you%3F
- For Role-playing and General Use
4. [Yarn 32k 70B](https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k)
- https://huggingface.co/TheBloke/Yarn-Llama-2-70B-32k-GGUF
- For General Use and Big Context Length
5. [lzlv 70B](https://huggingface.co/lizpreciatior/lzlv_70b_fp16_hf)
- https://huggingface.co/TheBloke/lzlv_70B-GGUF
- For Role-play and General Use
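The RAM figures in the section headings roughly track the GGUF file size (the `size` field in each metadata block) plus working overhead for buffers and context. A rule-of-thumb sketch; the 1.2x overhead factor is my assumption, not a measurement:
```python
def est_ram_gb(size_bytes: int, overhead: float = 1.2) -> float:
    """Weights (mmap'd GGUF file) plus an assumed ~20% working overhead."""
    return size_bytes * overhead / 1e9

# "size" values taken from the metadata blocks above.
for name, size in [
    ("OpenHermes 7B Q4_K_M", 4_370_000_000),
    ("Wizard Coder 13B Q5_K_M", 9_230_000_000),
    ("Yi 34B Q5_K_M", 24_320_000_000),
]:
    print(f"{name}: ~{est_ram_gb(size):.1f} GB")  # ~5.2, ~11.1, ~29.2 GB
```
These estimates line up with the 8GB, 16GB, and 32GB buckets above.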
# Quantization Explanation
- GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
- GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
- GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
- GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
- GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
For most of the models above I chose `Q4_K_M` or `Q5_K_M`.
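To sanity-check the 4.5 bpw figure for GGML_TYPE_Q4_K, count the bits in one 256-weight super-block (treating the per-super-block scale and min as fp16, which is my reading of the llama.cpp k-quant layout):
```python
weights = 8 * 32            # 8 blocks x 32 weights = 256 weights per super-block
q_bits = weights * 4        # 4-bit quants                      -> 1024 bits
scale_bits = 8 * (6 + 6)    # 6-bit scale + 6-bit min per block ->   96 bits
super_bits = 2 * 16         # fp16 super-block scale and min    ->   32 bits (assumed)
print((q_bits + scale_bits + super_bits) / weights)  # 4.5 bpw, as stated above
```
The `_M` suffix in `Q4_K_M`/`Q5_K_M` denotes the medium mix, which keeps a few tensors at a higher quantization type, so file sizes come out slightly above what the pure bpw figure suggests.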
# Acknowledgement
1. [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
2. [Chatbot arena Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
3. [Reddit best 7b](https://www.reddit.com/r/LocalLLaMA/comments/181x7ya/what_is_the_best_7b_right_now/?share_id=sLPmYaHt79F-PF6ZoUPiP)
4. [Mistral comparison](https://www.reddit.com/r/LocalLLaMA/comments/178nf6i/mistral_llm_comparisontest_instruct_openorca/)
5. [Model megathread](https://www.reddit.com/r/LocalLLaMA/comments/185770m/models_megathread_2_what_models_are_you_currently/?share_id=cv8l61_l6xprPSATn0c6I)
6. [Settings for Role-playing](https://www.reddit.com/r/LocalLLaMA/comments/185ce1l/my_settings_for_optimal_7b_roleplay_some_general/)