Last edited: 2024.
Hardware:
Software:
Clone and build llama.cpp (with Metal acceleration for Apple silicon), then install the Python dependencies:

```bash
cd desktop
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
LLAMA_METAL=1 make
python3 -m pip install -r requirements.txt
python3 -m pip install torch numpy sentencepiece
```
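If the build succeeded, a `main` binary should now exist in the repo root. A quick sanity check:

```bash
# Should print the usage/help text instead of "no such file or directory"
./main --help
```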
Install Git (the command below uses winget, so it is for Windows):

```bash
winget install --id Git.Git -e --source winget
```

- Restart the terminal: close it and open a new window so the `git` command is picked up.
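In the new terminal you can confirm Git is available (a quick check, not part of the original steps):

```bash
git --version
```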
Go to Meta's official website and fill out the download request form with your personal details. Meta will email you a download URL; keep it handy for the next step.
Switch to the desktop, clone Meta's llama repository, and run the download script:

```bash
cd desktop
git clone https://github.com/facebookresearch/llama
cd llama
./download.sh
```

When prompted, enter the URL (the long link from Meta's email) and then the version you want to download, in that order.
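The prompts look roughly like this (an illustrative sketch; the script's exact wording may differ by version):

```
Enter the URL from email: <paste the link from Meta's email>
Enter the list of models to download (e.g. 7B,7B-chat), or press Enter for all: 7B
```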
Close the terminal. Move the newly created files/folders under `llama` into `desktop/llama.cpp/models`. They usually include:

- tokenizer_checklist.chk
- tokenizer.model
- llama-2-7b
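In the terminal, the move might look like this (a sketch assuming both repositories sit on the desktop):

```bash
cd ~/Desktop
# Move the tokenizer files and the model folder into llama.cpp's models directory
mv llama/tokenizer_checklist.chk llama/tokenizer.model llama/llama-2-7b llama.cpp/models/
```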
Switch back to the terminal opened in step 1. Run convert.py to convert the model to an f16 GGUF file, which reduces memory usage and compute requirements:

```bash
python3 convert.py models/llama-2-7b
```
Generic form:

```bash
python3 convert.py folder_path_to_model
```
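If the conversion succeeds, the GGUF file referenced by the next step should now exist:

```bash
ls -lh models/llama-2-7b/ggml-model-f16.gguf
```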
Next, quantize the model to q4_0 or q8_0. The difference: q4_0 is faster but loses more quality; q8_0 is slower but loses less, and it also needs more memory (an M1 with 8 GB will show `status 5`, i.e. out of memory):
```bash
./quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q4_0.gguf q4_0
```
Generic form:

```bash
./quantize model_path new_model_path q4_0
```
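To quantize to q8_0 instead, swap the type and output name (same paths assumed as above):

```bash
./quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q8_0.gguf q8_0
```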
Start an interactive chat session with the quantized model:

```bash
./main -m models/llama-2-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```
Generic form:

```bash
./main -m path_to_model -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```
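For reference, what the flags mean (per llama.cpp's usage help at the time):

```bash
# -m                : path to the model file
# -n 256            : maximum number of tokens to generate
# --repeat_penalty  : penalty for repeated tokens (1.0 = no penalty)
# --color           : colorize output to separate your input from the model's
# -i                : interactive mode
# -r "User:"        : reverse prompt; control returns to you when "User:" appears
# -f                : file containing the initial prompt (a sample dialogue)
./main -m path_to_model -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```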
To run the model again later, these three commands are all you need:

```bash
cd desktop
cd llama.cpp
./main -m models/llama-2-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```
To use llama-2-7b-chat or CodeLlama-7b instead, simply replace every `llama-2-7b` in the commands above with `llama-2-7b-chat` or `CodeLlama-7b`.
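For example, the chat model's full pipeline would look like this (assuming `llama-2-7b-chat` was downloaded and moved into `models/` the same way):

```bash
python3 convert.py models/llama-2-7b-chat
./quantize ./models/llama-2-7b-chat/ggml-model-f16.gguf ./models/llama-2-7b-chat/ggml-model-q4_0.gguf q4_0
./main -m models/llama-2-7b-chat/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
```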