# Model List

## For 4GB RAM (~2GB for 1B models, ~4GB for 3B models)

1. [TinyLlama Chat 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6)

```json
{
  "source_url": "https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6/blob/main/ggml-model-q4_0.gguf",
  "id": "tinyllama-1.1b",
  "object": "model",
  "name": "TinyLlama Chat 1.1B",
  "version": "1",
  "description": "The TinyLlama project, featuring a 1.1B parameter Llama model, is pretrained on an expansive 3 trillion token dataset. Its design ensures easy integration with various Llama-based open-source projects. Despite its smaller size, it efficiently utilizes lower computational and memory resources, drawing on GPT-4's analytical prowess to enhance its conversational abilities and versatility.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "TinyLlama",
    "tags": ["General Use"],
    "size": "637000000"
  }
}
```

Every entry in this list shares this config schema; a sketch showing how the `settings` and `parameters` fields can be consumed follows at the end of this section.

- **Prompt Template:**

```
<|system|>
You are a friendly chatbot who always responds in the style of a pirate.</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
...
```

2. [Deepseek Coder 1.3B](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base)

```json
{
  "source_url": "https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GGUF/blob/main/deepseek-coder-1.3b-base.Q4_K_M.gguf",
  "id": "deepseek-coder-1.3b",
  "object": "model",
  "name": "Deepseek Coder 1.3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "deepseek, The Bloke",
    "tags": ["Code"],
    "size": "870000000"
  }
}
```

## Note: 3B models are not really suitable for 4GB RAM (even at Q2)

---

3. [Rocket 3B](https://huggingface.co/pansophic/rocket-3B)

```json
{
  "source_url": "https://huggingface.co/TheBloke/rocket-3B-GGUF/resolve/main/rocket-3b.Q4_K_M.gguf?download=true",
  "id": "rocket-3b",
  "object": "model",
  "name": "Rocket 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "pansophic, The Bloke",
    "tags": ["General Use"],
    "size": "1710000000"
  }
}
```

- **Prompt Template:**

```
<|im_start|>system
System message here.<|im_end|>
<|im_start|>user
Your message here!<|im_end|>
<|im_start|>assistant
...
```

4. [Marx 3B](https://huggingface.co/acrastt/Marx-3B-V3)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Marx-3B-v3-GGUF/blob/main/marx-3b-v3.Q4_K_M.gguf",
  "id": "marx-3b",
  "object": "model",
  "name": "Marx 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "acrastt, The Bloke",
    "tags": ["General Use"],
    "size": "1620000000"
  }
}
```

- **Prompt Template:**

```
### HUMAN:
{prompt}

### RESPONSE:
...
```

5. [IS LM 3B](https://huggingface.co/acrastt/IS-LM-3B)

```json
{
  "source_url": "https://huggingface.co/UmbrellaCorp/IS-LM-3B_GGUF/blob/main/IS-LM-Q4_K_M.gguf",
  "id": "islm-3b",
  "object": "model",
  "name": "IS LM 3B",
  "version": "1",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "UmbrellaCorp, The Bloke",
    "tags": ["General Use", "Economics"],
    "size": "1710000000"
  }
}
```

- **Prompt Template:**

```
USER: {prompt}
ASSISTANT:
```
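As mentioned under the first entry, the config blocks all share one schema, so the `settings` and `parameters` fields map directly onto a local runtime. Below is a minimal sketch of that mapping, assuming `llama-cpp-python` as the runtime; the local GGUF filename and the prompt text are placeholders, and the config is assumed to have been saved to disk as `model.json`.

```python
# Minimal sketch: run one of the GGUF models above with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and an already-downloaded GGUF file.
import json
from llama_cpp import Llama

with open("model.json") as f:  # one of the config blocks above, saved to disk
    cfg = json.load(f)

llm = Llama(
    model_path="tinyllama-1.1b.q4_0.gguf",     # hypothetical local filename
    n_ctx=int(cfg["settings"]["ctx_len"]),     # "2048" -> 2048
    n_gpu_layers=int(cfg["settings"]["ngl"]),  # offload layers to GPU if available
)

# Prompt follows the TinyLlama template shown above; </s> ends each turn.
out = llm(
    "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n",
    max_tokens=int(cfg["parameters"]["max_tokens"]),
    temperature=float(cfg["parameters"]["temperature"]),
    stop=["</s>"],
)
print(out["choices"][0]["text"])
```

The same pattern works for every entry in this list; only the prompt template string changes from model to model.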
---

## For 8GB RAM (~7GB RAM)

1. [Mistral OpenHermes 7B](https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF)

```json
{
  "source_url": "https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/blob/main/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
  "id": "openhermes-2.5-mistral-7b",
  "object": "model",
  "name": "Openhermes 2.5 Mistral 7B",
  "version": "2.5",
  "description": "The Teknium-developed OpenHermes 2.5 Mistral 7B incorporates additional code datasets, more than a million GPT-4-generated data examples, and other high-quality open datasets. This enhancement led to significant improvements in benchmarks, highlighting its improved skill in handling code-centric tasks.",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Teknium, The Bloke",
    "tags": ["General", "Roleplay"],
    "size": "4370000000"
  }
}
```

- **Prompt template: ChatML** (a small prompt-builder sketch for this format follows at the end of this section)

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

2. [Neural Chat 7B](https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF)

```json
{
  "source_url": "https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF/blob/main/neural-chat-7b-v3-1.Q4_K_M.gguf",
  "id": "neural-chat-7b-v3-1",
  "object": "model",
  "name": "Neural Chat 7B",
  "version": "3.1",
  "description": "The Neural Chat 7B model, built on the foundation of mistralai/Mistral-7B-v0.1, has been fine-tuned on the Open-Orca/SlimOrca dataset and aligned with the Direct Preference Optimization (DPO) algorithm. It has demonstrated substantial improvements on various AI tasks and performs well on the Open LLM Leaderboard.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "Intel, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "4370000000"
  }
}
```

- **Prompt Template**

```
### System:
{system}

### User:
{usr}

### Assistant:
```
3. [Open Chat 3.5 7B](https://huggingface.co/openchat/openchat_3.5)

```json
{
  "source_url": "https://huggingface.co/TheBloke/openchat_3.5-GGUF/blob/main/openchat_3.5.Q4_K_M.gguf",
  "id": "openchat_3.5",
  "object": "model",
  "name": "Open Chat 3.5 7B",
  "version": "1.0",
  "description": "OpenChat represents a breakthrough in the realm of open-source language models. By implementing the C-RLFT fine-tuning strategy, inspired by offline reinforcement learning, this 7B model achieves results on par with ChatGPT (March).",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "OpenChat, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```

- **Prompt Template: OpenChat**

The GPT4 template is also available as the integrated `tokenizer.chat_template`:

```
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"}
]
```

```
GPT4 User: {prompt}<|end_of_turn|>
GPT4 Assistant:
```

4. [Zephyr Beta 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

```json
{
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "id": "zephyr-7b-beta",
  "object": "model",
  "name": "Zephyr Beta 7B",
  "version": "1.0",
  "description": "The Zephyr-7B-β model marks the second iteration in the Zephyr series, designed to function as an effective assistant. It has been fine-tuned from the mistralai/Mistral-7B-v0.1 base model, utilizing a combination of public and synthetic datasets with the application of Direct Preference Optimization.",
  "format": "gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "stream": "true"
  },
  "metadata": {
    "author": "HuggingFaceH4, The Bloke",
    "tags": ["General Use"],
    "size": "4370000000"
  }
}
```

- **Prompt template: Zephyr**

```
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
```

5. [Open Orca 7B](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/blob/main/mistral-7b-openorca.Q4_K_M.gguf",
  "id": "openorca-7b",
  "object": "model",
  "name": "OpenOrca 7B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "OpenOrca, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```

- **Prompt template: ChatML**

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

6. [Starling 7B alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Starling-LM-7B-alpha-GGUF/blob/main/starling-lm-7b-alpha.Q4_K_M.gguf",
  "id": "starling-7b",
  "object": "model",
  "name": "Starling alpha 7B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "Berkeley-nest, The Bloke",
    "tags": ["General", "Code"],
    "size": "4370000000"
  }
}
```
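Several of the 7B entries above (OpenHermes, OpenOrca) use the ChatML template. Since the `<|im_start|>`/`<|im_end|>` markers are easy to get subtly wrong, here is a small helper that assembles a ChatML prompt exactly as printed in those templates; it is a sketch of the text format only, not part of any model's official API.

```python
# Sketch: assemble a ChatML prompt string as shown in the templates above.
# The <|im_start|>/<|im_end|> markers are the model's special tokens; when
# sampling, "<|im_end|>" should also be passed as a stop sequence.
def chatml_prompt(system_message: str, messages: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>"]
    for role, content in messages:           # role is "user" or "assistant"
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # left open for the model to complete
    return "\n".join(parts)

print(chatml_prompt("You are a concise assistant.", [("user", "What is GGUF?")]))
```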
## For 16GB RAM

A rough way to estimate these per-model RAM needs is sketched at the end of this section.

1. [Wizard Coder Python 13B](https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0)

```json
{
  "source_url": "https://huggingface.co/TheBloke/WizardCoder-Python-13B-V1.0-GGUF/blob/main/wizardcoder-python-13b-v1.0.Q5_K_M.gguf",
  "id": "wizard-coder-13b",
  "object": "model",
  "name": "Wizard Coder Python 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "WizardLM, The Bloke",
    "tags": ["Code"],
    "size": "9230000000"
  }
}
```

- **Prompt Template:**

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```

2. [Tiefighter 13B](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)

```json
{
  "source_url": "https://huggingface.co/TheBloke/LLaMA2-13B-Tiefighter-GGUF/blob/main/llama2-13b-tiefighter.Q5_K_M.gguf",
  "id": "tiefighter-13b",
  "object": "model",
  "name": "Tiefighter 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "KoboldAI, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "9230000000"
  }
}
```

- **Prompt template:**

```
```

3. [MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b)

```json
{
  "source_url": "https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/blob/main/mythomax-l2-13b.Q5_K_M.gguf",
  "id": "mythomax-13b",
  "object": "model",
  "name": "MythoMax L2 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Gryphe, The Bloke",
    "tags": ["Role-playing"],
    "size": "9230000000"
  }
}
```

- **Prompt Template: Custom**

```
<System prompt/Character Card>

### Instruction:
Your instruction or question here.
For roleplay purposes, I suggest the following - Write <CHAR NAME>'s next reply in a chat between <YOUR NAME> and <CHAR NAME>. Write a single reply only.

### Response:
```

4. [Orca 2 13B](https://huggingface.co/microsoft/Orca-2-13b)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Orca-2-13B-GGUF/blob/main/orca-2-13b.Q5_K_M.gguf",
  "id": "orca-13b",
  "object": "model",
  "name": "Orca 2 13B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Microsoft, The Bloke",
    "tags": ["General Use"],
    "size": "9230000000"
  }
}
```

- **Prompt Template: ChatML**

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

5. [Noromaid 20B](https://huggingface.co/NeverSleep/Noromaid-20b-v0.1.1) (~15GB RAM)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Noromaid-20B-v0.1.1-GGUF/blob/main/noromaid-20b-v0.1.1.Q4_K_M.gguf",
  "id": "noromaid-20b",
  "object": "model",
  "name": "Noromaid 20B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "NeverSleep, The Bloke",
    "tags": ["Role-playing"],
    "size": "12040000000"
  }
}
```

- **Prompt Template: Custom**

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
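As mentioned under the section heading, the RAM figures in these headings are rough. A quick back-of-the-envelope check is the GGUF file size (the `size` field above) plus the KV cache, which for fp16 is 2 (K and V) x n_layers x ctx_len x n_embd x 2 bytes. The sketch below applies this to a 13B Llama-2 model (40 layers and a 5120 embedding width, from the published architecture); treat the result as a floor, since runtimes add their own overhead.

```python
# Rough RAM estimate for a GGUF model: weights (~ file size) + fp16 KV cache.
# KV cache bytes = 2 (K and V) * n_layers * ctx_len * n_embd * 2 (fp16 bytes).
def est_ram_gb(file_size_bytes: int, n_layers: int, n_embd: int, ctx_len: int) -> float:
    kv_cache = 2 * n_layers * ctx_len * n_embd * 2
    return (file_size_bytes + kv_cache) / 1e9

# Llama-2 13B: 40 layers, 5120 embedding dim; file size from the 13B Q5_K_M
# entries above (~9.23 GB) at the 4096 context used in their settings.
print(f"{est_ram_gb(9_230_000_000, 40, 5120, 4096):.1f} GB")  # ~12.6 GB
```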
## For 32GB RAM (~26GB RAM)

1. [Yi 34B](https://huggingface.co/01-ai/Yi-34B)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Yi-34B-Chat-GGUF/blob/main/yi-34b-chat.Q5_K_M.gguf",
  "id": "yi-34b",
  "object": "model",
  "name": "Yi 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "8192",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "8192",
    "stream": "true"
  },
  "metadata": {
    "author": "01-ai, The Bloke",
    "tags": ["General"],
    "size": "24320000000"
  }
}
```

- **Prompt Template:**

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

2. [Capybara 200k 34B](https://huggingface.co/NousResearch/Nous-Capybara-34B)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF/blob/main/nous-capybara-34b.Q5_K_M.gguf",
  "id": "capybara-34b",
  "object": "model",
  "name": "Capybara 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "200000",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "200000",
    "stream": "true"
  },
  "metadata": {
    "author": "NousResearch, The Bloke",
    "tags": ["General", "Big Context Length"],
    "size": "24320000000"
  }
}
```

- **Prompt Template:**

```
USER: {prompt}
ASSISTANT:
```

3. [Phind CodeLlama 34B](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)

```json
{
  "source_url": "https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF/blob/main/phind-codellama-34b-v2.Q5_K_M.gguf",
  "id": "phind-34b",
  "object": "model",
  "name": "Phind 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "Phind, The Bloke",
    "tags": ["Code"],
    "size": "24320000000"
  }
}
```

- **Prompt Template:**

```
### System Prompt
{system_message}

### User Message
{prompt}

### Assistant
```

4. [Wizard Coder Python 34B](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)

```json
{
  "source_url": "https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_M.gguf",
  "id": "wizard-coder-34b",
  "object": "model",
  "name": "Wizard Coder Python 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "WizardLM, The Bloke",
    "tags": ["Code"],
    "size": "24320000000"
  }
}
```

- **Prompt Template:**

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```

5. [Dolphin Yi 34B](https://huggingface.co/ehartford/dolphin-2_2-yi-34b)

```json
{
  "source_url": "https://huggingface.co/TheBloke/dolphin-2_2-yi-34b-GGUF/blob/main/dolphin-2_2-yi-34b.Q5_K_M.gguf",
  "id": "dolphin-yi-34b",
  "object": "model",
  "name": "Dolphin Yi 34B",
  "version": "1.0",
  "description": "",
  "format": "gguf",
  "settings": {
    "ctx_len": "4096",
    "ngl": "100"
  },
  "parameters": {
    "temperature": "0.7",
    "max_tokens": "4096",
    "stream": "true"
  },
  "metadata": {
    "author": "ehartford, The Bloke",
    "tags": ["General Use", "Role-playing"],
    "size": "24320000000"
  }
}
```

- **Prompt Template:**

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
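One practical note on the `source_url` fields throughout this list: most point at Hugging Face `/blob/` pages, which are HTML, while the raw file lives at the same path with `/resolve/` instead (the Rocket 3B entry already uses that form). Alternatively, `huggingface_hub` can fetch a file by repo and filename; the sketch below uses the Dolphin entry above.

```python
# Sketch: fetch a GGUF from the Hugging Face Hub instead of hand-editing
# /blob/ URLs into /resolve/ ones. Requires `pip install huggingface_hub`.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/dolphin-2_2-yi-34b-GGUF",
    filename="dolphin-2_2-yi-34b.Q5_K_M.gguf",
)
print(path)  # local cache path of the downloaded file
```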
## For 64GB RAM (~42-52GB RAM) and above

1. [Xwin LM 70B](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1)
   - https://huggingface.co/TheBloke/Xwin-LM-70B-V0.1-GGUF
   - For General Use
2. [Storytelling 70B v1](https://huggingface.co/GOAT-AI/GOAT-70B-Storytelling)
   - https://huggingface.co/TheBloke/GOAT-70B-Storytelling-GGUF
   - For General Use and Role-playing
3. [Goliath 120B](https://huggingface.co/alpindale/goliath-120b) (~74-86GB RAM)
   - https://huggingface.co/TheBloke/goliath-120b-GGUF
   - For Role-playing and General Use
4. [Yarn 32k 70B](https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k)
   - https://huggingface.co/TheBloke/Yarn-Llama-2-70B-32k-GGUF
   - For General Use and Big Context Length
5. [lzlv 70B](https://huggingface.co/lizpreciatior/lzlv_70b_fp16_hf)
   - https://huggingface.co/TheBloke/lzlv_70B-GGUF
   - For Role-playing and General Use

# Quantization Explanation

- GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
- GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
- GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
- GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
- GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.

> For most models here I chose `Q4_K_M` or `Q5_K_M`. (A worked check of the bpw figures above appears at the end of this document.)

# Acknowledgements

1. [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
2. [Chatbot Arena Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
3. [Reddit: what is the best 7B right now?](https://www.reddit.com/r/LocalLLaMA/comments/181x7ya/what_is_the_best_7b_right_now/)
4. [Mistral comparison](https://www.reddit.com/r/LocalLLaMA/comments/178nf6i/mistral_llm_comparisontest_instruct_openorca/)
5. [Models megathread](https://www.reddit.com/r/LocalLLaMA/comments/185770m/models_megathread_2_what_models_are_you_currently/)
6. [Settings for Role-playing](https://www.reddit.com/r/LocalLLaMA/comments/185ce1l/my_settings_for_optimal_7b_roleplay_some_general/)
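As referenced in the quantization section, the bpw figures can be reproduced with a few lines of arithmetic. The sketch below assumes the llama.cpp k-quant block layout: 256 weights per super-block, an fp16 super-block scale, plus per-block scale bits (and, for the "type-1" formats, matching min bits and an fp16 super-block min). Q2_K is omitted because its quoted 2.5625 figure does not follow from this simplified count.

```python
# Reproduce the bits-per-weight figures from the quantization section above.
# Per super-block (256 weights): quant bits + per-block scale (and min) bits
# + fp16 super-block scale (and min, for "type-1" formats).
def bpw(qbits: int, n_blocks: int, scale_bits: int, has_min: bool) -> float:
    weights = 256
    total = weights * qbits                                  # quantized weights
    total += n_blocks * scale_bits * (2 if has_min else 1)   # block scales (+ mins)
    total += 16 * (2 if has_min else 1)                      # fp16 super-block scale (+ min)
    return total / weights

print(bpw(3, 16, 6, has_min=False))  # Q3_K -> 3.4375
print(bpw(4, 8, 6, has_min=True))    # Q4_K -> 4.5
print(bpw(5, 8, 6, has_min=True))    # Q5_K -> 5.5
print(bpw(6, 16, 8, has_min=False))  # Q6_K -> 6.5625
```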