Large Language Models (LLMs) Course: Syllabus (Part 5)

[Advanced] Fine-Tuning Language Models with Multiple GPUs on the NCHC Supercomputer Taiwania 2


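Contact: email 2303117@narlabs.org.tw (Ms. Wang)

Course information

Time: 2024 3/12 (Monday), 9:00-10:30
Format: online video session (see the email for details)
Prerequisites: none
Environment: Windows, macOS, or Linux

Course content:

This session is mainly a demonstration. Time is reserved after the session for participants who want to try the steps hands-on. The outline follows.

:A: Slides:

:B: Tutorial documents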

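A. Introduction to Taiwania 2 (TAIWANIA 2) [self-study]

Account registration
- Registration site
- Documentation
Compute resources
- Documentation
Storage resources
- Documentation
Login and data transfer nodes
- Documentation
How to log in to the host
- Documentation
How to transfer files
- Documentation
GPU Queue
- Documentation
Submitting jobs with Slurm
- Documentation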


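B. Operating Taiwania 2 [live demo]

Key topic: running Docker images with Singularity
- Tutorial document
- Example video
- [LLama-factory] https://github.com/c00cjz00/llama-factory-docker
Key topic: submitting HPC Slurm jobs
- Tutorial document
- Example video
Key topic: launching the Jupyter Notebook interface
- Tutorial document
- Example video
Key topic: creating a Singularity image kernel in Jupyter Notebook
- Tutorial document
- Example video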


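C. Fine-Tuning Language Models with Multiple GPUs on the NCHC Supercomputer Taiwania 2 [live demo]

GITHUB: https://github.com/c00cjz00/deepspeed_code.git
Introduction to DeepSpeed
- ZeRO has three optimization stages, ZeRO-1, ZeRO-2, and ZeRO-3; each stage builds on the previous one and shards more of the training state across GPUs.
- Tutorial document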
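
To make the difference between the stages concrete, here is a rough sketch of per-GPU memory for model states. It assumes the usual mixed-precision Adam accounting of 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations, buffers, and fragmentation. The function name `zero_model_state_gib` is made up for this illustration and is not part of DeepSpeed.

```python
def zero_model_state_gib(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GiB (mixed-precision Adam)."""
    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 weights + Adam momentum + Adam variance
    if stage >= 1:
        optim /= num_gpus       # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus       # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= num_gpus      # ZeRO-3: also shard parameters
    return (params + grads + optim) / 2**30


# Assumed example: a 7-billion-parameter model on 2 GPUs
for stage in range(4):
    gib = zero_model_state_gib(7e9, num_gpus=2, stage=stage)
    print(f"ZeRO-{stage}: about {gib:.1f} GiB per GPU")
```

These numbers are only a lower bound on what training needs; DeepSpeed's own estimators, used in the next subsection, remain the reference.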

Hardware resources required by the model

Detailed code walkthrough
An outline code example follows

# Estimate the memory needed for ZeRO-2 model states on 1 node with 1 GPU
from transformers import AutoModel
from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live
model = AutoModel.from_pretrained("/work/u00cjz00/slurm_jobs/github/models/Llama-2-7b-chat-hf")
estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)

Fine-tuning on a single GPU
- python
  - Detailed code walkthrough
  - An outline code example follows
```
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py
```

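Fine-tuning on multiple GPUs

deepspeed (select GPUs with --include localhost:0,1)

Detailed code walkthrough
An outline code example follows
ZeRO stages: zero0 (DDP), zero1 (optimizer state sharding), zero2 (gradient sharding), zero3 (parameter sharding)

# Settings
export GPUS_PER_NODE=2        # number of GPUs
# Run the job
deepspeed \
--num_gpus ${GPUS_PER_NODE} \
src/train_bash.py \
--deepspeed ds_config.json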
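
The command above expects a `ds_config.json` file, whose contents are not shown here. As a minimal sketch, the following Python generates one for a chosen ZeRO stage. It assumes that `train_bash.py` goes through the Hugging Face Trainer integration, which fills in the `"auto"` values at launch; the helper name `make_ds_config` and the exact set of keys are illustrative choices, not the course's actual configuration.

```python
import json


def make_ds_config(stage: int = 2) -> dict:
    """Build a minimal DeepSpeed config dict for the given ZeRO stage (0-3)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # "auto" values are resolved by the Hugging Face Trainer (assumption)
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "bf16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": stage,  # 0 = no sharding (DDP-like), 1/2/3 = ZeRO stages
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


# Print the ZeRO-2 variant; redirect the output to ds_config.json to use it
print(json.dumps(make_ds_config(stage=2), indent=2))
```

Changing `stage` is enough to switch between the zero0 to zero3 variants listed above, although ZeRO-3 usually benefits from further tuning options that this sketch leaves out.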

Fine-tuning across nodes (multiple machines)

torchrun + deepspeed

Detailed code walkthrough
An outline code example follows
ZeRO stages: zero0 (DDP), zero1 (optimizer state sharding), zero2 (gradient sharding), zero3 (parameter sharding)

# Settings
export GPUS_PER_NODE=2        # number of GPUs per node
export MASTER_ADDR=gpn3002    # hostname of the master node
export MASTER_PORT=9001       # port on the master node

# Run the job
## ${SLURM_NNODES} and ${SLURM_PROCID} are set automatically by Slurm
torchrun \
--nproc_per_node ${GPUS_PER_NODE} \
--master_addr ${MASTER_ADDR} \
--master_port ${MASTER_PORT} \
--nnodes ${SLURM_NNODES} \
--node_rank ${SLURM_PROCID} \
src/train_bash.py \
--deepspeed ds_config.json
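
Every node must run this command with the same master address and port but its own node rank. The sketch below assembles the argument list from environment variables so it can be inspected before launching. The helper `build_torchrun_cmd` and the example values are hypothetical. It also assumes one Slurm task per node, since only then does `SLURM_PROCID` match the node rank.

```python
import shlex


def build_torchrun_cmd(env, gpus_per_node=2,
                       script="src/train_bash.py", ds_config="ds_config.json"):
    """Return the torchrun argument list for one node, read from `env`."""
    required = ("MASTER_ADDR", "MASTER_PORT", "SLURM_NNODES", "SLURM_PROCID")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return [
        "torchrun",
        "--nproc_per_node", str(gpus_per_node),
        "--master_addr", str(env["MASTER_ADDR"]),
        "--master_port", str(env["MASTER_PORT"]),
        "--nnodes", str(env["SLURM_NNODES"]),
        "--node_rank", str(env["SLURM_PROCID"]),  # valid with one task per node
        script,
        "--deepspeed", ds_config,
    ]


# Example values only; inside a real job, pass os.environ instead
example_env = {"MASTER_ADDR": "gpn3002", "MASTER_PORT": "9001",
               "SLURM_NNODES": "2", "SLURM_PROCID": "0"}
print(shlex.join(build_torchrun_cmd(example_env)))
```

Failing early on a missing variable is deliberate: a node that starts with an empty `--node_rank` tends to hang at rendezvous rather than report a clear error.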


D. Data Processing [live demo]

Downloading fine-tuning data

Detailed code walkthrough
An outline code example follows

from datasets import load_dataset
dataset = load_dataset("michaelwzhu/ChatMed_Consult_Dataset", split="train", streaming=True, encoding="utf-8")
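
With `streaming=True` the records are fetched lazily, so a quick look at the first few rows does not require downloading the whole dataset. The following is a minimal sketch of that pattern. To keep it runnable offline it uses a stand-in generator, and the field names `query` and `response` are assumptions about the record layout.

```python
from itertools import islice


def peek(stream, n=3):
    """Return the first n records of an iterable without reading the rest."""
    return list(islice(stream, n))


def fake_stream():
    # Stand-in for the streaming dataset; field names are hypothetical
    for i in range(1000):
        yield {"query": f"question {i}", "response": f"answer {i}"}


for record in peek(fake_stream(), n=2):
    print(record)
```

With the real data, `peek(dataset, 3)` should behave the same way, and the streaming dataset object also provides a `take(n)` method for this purpose.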

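Converting between Simplified and Traditional Chinese

An outline code example follows

import opencc  # Simplified-to-Traditional conversion: s2t or s2twp
op_cc = opencc.OpenCC('s2twp')
print(op_cc.convert("现在你是一名专业的中医医生，请用你的专业知识提供详尽而清晰的关于中医问题的回答。"))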
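
Dataset records are dictionaries, so in practice the conversion is applied to each text field rather than to a single string. A minimal sketch of that step follows. The helper `convert_record` is hypothetical, and a tiny two-character translation table stands in for OpenCC so the example runs without the library installed.

```python
def convert_record(record, convert):
    """Apply `convert` to every string value of a dict; keep other values."""
    return {key: convert(value) if isinstance(value, str) else value
            for key, value in record.items()}


# Stand-in converter covering only two characters (illustration only)
demo_table = str.maketrans({"专": "專", "业": "業"})


def demo_convert(text):
    return text.translate(demo_table)


record = {"query": "专业", "id": 7}
print(convert_record(record, demo_convert))
```

Passing `op_cc.convert` instead of `demo_convert` would apply the real `s2twp` conversion to each field.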

