
Meta Llama 2 7B Setup (Windows)

Last edited: 2024.
Hardware:
Software:

Step 1: Download llama.cpp

  1. Open a new terminal
  2. Change to the desktop
  • cd desktop
  3. Clone llama.cpp
  • git clone https://github.com/ggerganov/llama.cpp
  4. Change into llama.cpp
  • cd llama.cpp
  5. Build the project
  • LLAMA_METAL=1 make
  6. Install the extra Python packages
  • python3 -m pip install -r requirements.txt
  • python3 -m pip install torch numpy sentencepiece

If git is not installed yet:

  • winget install --id Git.Git -e --source winget
  • Restart the terminal
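To confirm that the Python dependencies from step 1 installed correctly, a quick check like the following can help (a minimal sketch; the module names match the pip packages installed above):

```python
import importlib.util

def missing_packages(names):
    """Return the module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages installed in step 1; an empty list means all are available.
print(missing_packages(["torch", "numpy", "sentencepiece"]))
```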

Step 2: Download the Llama model

  1. Open a new terminal

  2. Fill in your personal details on the Meta website

  3. Get the download URL from the email Meta sends you

  4. Change to the desktop

  • cd desktop
  5. Clone llama
  • git clone https://github.com/facebookresearch/llama
  6. Change into llama
  • cd llama
  7. Run download.sh
  • ./download.sh

When prompted, enter the URL (from the email) and then the version you want to download.


  8. Close the terminal

  9. Move the newly created files/folders from llama into desktop/llama.cpp/models

    These usually include tokenizer_checklist.chk, tokenizer.model, and llama-2-7b
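To double-check that the move landed in the right place, a small helper like this can verify the expected entries (a sketch; the file names come from the list above, and the models path is the one used in step 3):

```python
import os

# Entries that download.sh typically produces for the 7B model
REQUIRED = ["tokenizer_checklist.chk", "tokenizer.model", "llama-2-7b"]

def missing_model_files(models_dir):
    """Return the required entries not found under models_dir."""
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(models_dir, name))]

# Run from inside llama.cpp; an empty list means everything is in place.
print(missing_model_files("models"))
```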

Step 3: Quantize the model

  1. Switch back to the terminal opened in step 1

  2. Run convert.py to convert the model to f16.gguf, reducing its memory and compute requirements

  • python3 convert.py models/llama-2-7b

    General form: python3 convert.py folder_path_to_model

  3. Quantize to q4_0 or q8_0

    The trade-off: q4_0 is faster but loses more precision; q8_0 is slower, loses less, and uses more memory (on an M1 with 8 GB it aborts with status 5, out of memory)

  • ./quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q4_0.gguf q4_0

    General form: ./quantize model_path new_model_path q4_0
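The q4_0/q8_0 trade-off comes down to rounding granularity: at 4 bits per weight, a 7B model's weights occupy roughly 7e9 × 0.5 bytes ≈ 3.3 GiB, versus ≈ 6.5 GiB at 8 bits (plus per-block scales), which is why q8_0 can exhaust 8 GB of RAM. The precision cost can be sketched with a simplified per-block symmetric quantizer (conceptual only; llama.cpp's actual q4_0/q8_0 formats differ in block layout and scale storage):

```python
import numpy as np

def quantize_dequantize(x, bits, block=32):
    """Per-block symmetric quantization: scale each block of `block`
    values so the largest magnitude maps to the top quantized level,
    round to integers, then map back to floats."""
    levels = 2 ** (bits - 1) - 1             # 7 for 4-bit, 127 for 8-bit
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / levels
    scale[scale == 0] = 1.0                  # avoid division by zero
    q = np.clip(np.round(x / scale), -levels, levels)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for model weights
err4 = np.abs(w - quantize_dequantize(w, 4)).mean()
err8 = np.abs(w - quantize_dequantize(w, 8)).mean()
print(f"mean abs error  q4: {err4:.5f}   q8: {err8:.5f}")
```

With 8 bits the rounding step is much finer, so the reconstruction error is far smaller, matching the note above that q8_0 loses less precision.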

  4. Run the model ( -n 256 caps generation at 256 tokens, -i enables interactive mode, -r "User:" sets the reverse prompt, and -f loads the initial prompt file )
  • ./main -m models/llama-2-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

General form: ./main -m path_to_model -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

  5. Done!

Notes

  1. To run the model again later:
    1. cd desktop
    2. cd llama.cpp
    3. ./main -m models/llama-2-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
  2. To set up another model (e.g. llama-7b-chat or CodeLlama-7b), just replace every llama-7b with llama-7b-chat or CodeLlama-7b
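Note 2 amounts to a plain text substitution over the same commands; for example (hypothetical folder name llama-2-7b-chat — use whatever folder download.sh actually created):

```python
run_cmd = ('./main -m models/llama-2-7b/ggml-model-q4_0.gguf -n 256 '
           '--repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt')

# Swap the model folder name everywhere it appears in the command.
chat_cmd = run_cmd.replace("llama-2-7b", "llama-2-7b-chat")
print(chat_cmd)
```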

Going further