Other important reference resources (collection in progress)

State of Open Source AI Book - 2023 Edition

Finetuning Large Language Models

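Training process

Course summary

Provides a demo of the complete training workflow.

- Training an LLM is similar to training any other neural network: feed in training data, compute the loss, and update the weights
- PyTorch and HuggingFace let you train a model at a lower level, and dedicated code examples are provided
- Lamini's llama library offers a simplified way to train a model that takes only three lines of code
- A trained model can be saved and loaded locally, and fine-tuned to improve its performance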

Training: same as other neural networks


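Core of the training process
- Add training data: first, you provide a certain amount of training data, which is used to train the model so that it performs a specific task better
- Calculate loss: the model's first predictions will usually be off, so you measure the difference between the model's predictions and the actual answers; this difference is called the loss
- Backprop through model: once the loss has been computed, backpropagation computes the gradient of the loss with respect to each weight and bias, which shows how much each one contributed to the loss
- Update weights: the model's weights are adjusted according to those gradients to improve its predictions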
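
The four steps above can be illustrated on the smallest possible case. The following is a minimal sketch, not code from the course: a one-weight model `pred = w * x` with a squared-error loss, where the gradient is written out by hand instead of being computed by automatic differentiation. The function name and all numbers are illustrative assumptions.

```python
# One gradient-descent step for a single-weight model: pred = w * x.
# Illustrative values only; none of these numbers come from the course.

def training_step(w, x, y, learning_rate):
    """Return (loss, gradient, updated_weight) for one training example."""
    pred = w * x                      # forward pass
    loss = (pred - y) ** 2            # squared-error loss
    grad = 2 * (pred - y) * x         # d(loss)/d(w), derived by hand
    new_w = w - learning_rate * grad  # weight update
    return loss, grad, new_w

loss_before, grad, w = training_step(w=0.0, x=2.0, y=4.0, learning_rate=0.1)
loss_after, _, _ = training_step(w=w, x=2.0, y=4.0, learning_rate=0.1)
print(loss_before, grad, w, loss_after)
```

After a single update the loss on the same example is much smaller than before, which is the whole point of the loop: predict, measure the error, follow the gradient.
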
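Hyperparameters
- Learning rate
  - This parameter controls how fast the model learns. A learning rate that is too high can make training unstable and cause the model to overshoot good solutions, while one that is too low makes training unnecessarily slow
- Learning rate scheduler
  - A way of adjusting the learning rate dynamically, usually lowering it during training according to some condition
- Optimizer hyperparameters
  - These parameters control the behavior of the optimizer, the tool that updates the model's weights. Different optimizers and their hyperparameters affect both training speed and final quality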
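
To make the scheduler idea concrete, here is a small sketch of one common schedule shape: a linear warmup followed by a linear decay to zero. It is an assumption chosen for illustration, not necessarily the schedule used in the course, and the function name and step counts are hypothetical.

```python
def linear_schedule_with_warmup(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at every step of a hypothetical 10-step run
schedule = [
    linear_schedule_with_warmup(s, base_lr=1e-5, warmup_steps=2, total_steps=10)
    for s in range(11)
]
for step, lr in enumerate(schedule):
    print(step, lr)
```

The rate climbs to its peak at the end of warmup and then shrinks steadily, so early updates are cautious and late updates make only fine adjustments.
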

Run through general chunks of training process in code

The example used here is a PyTorch training loop


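Overview of the training loop
- Set up the training loop (epochs): one epoch means the model has passed over the entire training dataset once. You may iterate over the whole dataset several times as needed
- Batching: the data is split into small batches, each holding a fixed number of examples. The batches are fed to the model one at a time during training
- Model output: pass each batch through the model to get its output
- Compute the loss: compute the loss from the model's output and the true answers
- Backpropagation: this key step computes the gradient of the loss with respect to each parameter
- Optimizer step: use the gradients just computed to update the model's parameters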
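
The loop described above can be sketched without any dependencies. This is not the course's PyTorch code: it is a plain-Python stand-in with one trainable weight and a hand-written gradient, and the dataset, learning rate, and epoch count are illustrative assumptions. Comments mark where the usual PyTorch calls would sit.

```python
import random

# Toy dataset for y = 3x; purely illustrative.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0               # the single trainable parameter
learning_rate = 0.01
num_epochs = 20
batch_size = 4
rng = random.Random(0)

for epoch in range(num_epochs):                 # one epoch = one pass over the data
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]  # batching
        # Forward pass and mean squared-error loss over the batch
        preds = [w * x for x, _ in batch]
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, batch)) / len(batch)
        # "Backward pass": gradient of the loss with respect to w
        # (in PyTorch: optimizer.zero_grad(); loss.backward())
        grad = sum(2 * (p - y) * x for p, (x, y) in zip(preds, batch)) / len(batch)
        # Optimizer step (in PyTorch: optimizer.step())
        w -= learning_rate * grad

print(round(w, 3))
```

The learned weight ends up very close to the true slope of 3, and the structure (epochs, batches, forward, loss, backward, step) is the same one a real model follows.
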

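With Lamini's library this can be reduced to 3 lines
Will this repeat the Keras problem of being hard to debug and hiding the details? In any case it speeds up adoption, but customized processing will probably still need lower-level tools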

from llama import BasicModelRunner

model = BasicModelRunner("EleutherAI/pythia-410m") 
model.load_data_from_jsonlines("lamini_docs.jsonl", input_key="question", output_key="answer")
model.train(is_public=True)

lab 05_Training_lab_student

Going straight to the parts that use the model and run inference

Set up the model, training config, and tokenizer

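from transformers import AutoTokenizer

# `use_hf`, `dataset_path`, and `tokenize_and_split_data` are assumed to be
# defined earlier in the lab; they are not shown in these notes
model_name = "EleutherAI/pythia-70m"
training_config = {
    "model": {
        "pretrained_name": model_name,
        "max_length" : 2048
    },
    "datasets": {
        "use_hf": use_hf,
        "path": dataset_path
    },
    "verbose": True
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse the end-of-sequence token as the padding token
tokenizer.pad_token = tokenizer.eos_token
train_dataset, test_dataset = tokenize_and_split_data(training_config, tokenizer)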

Load the base model


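from transformers import AutoModelForCausalLM
# `device` is assumed to be chosen earlier in the lab; it is not shown in these notes
base_model = AutoModelForCausalLM.from_pretrained(model_name)
base_model.to(device)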

Inspecting the model's internals

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-5): 6 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=512, out_features=50304, bias=False)
)

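Setup training (hyperparameter settings)

All the details are wrapped in the TrainingArguments class, and the example code comes with comments. The more important parameters are explained below
- max_steps=max_steps
  - Maximum number of training steps: each step corresponds to training on one batch of data. If this value is not -1, it overrides num_train_epochs, meaning training stops once this many steps have been reached
- gradient_accumulation_steps=4
  - Number of gradient accumulation steps: gradients are accumulated over this many batches and the model is then updated once. This gives a larger effective batch size without increasing memory usage
- metric_for_best_model="eval_loss"
  - Metric used to select the best model: here it is the evaluation loss
  - greater_is_better=False
    - Direction of the metric: for a loss, lower is better, so this is set to False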
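
Gradient accumulation is easier to trust once you see that it produces the same gradient as one large batch. The sketch below is a toy check under stated assumptions: a single device, a one-weight model with a hand-written gradient, and made-up data. Only the two parameter names mirror the settings discussed above.

```python
# Gradient accumulation: sum scaled gradients over several micro-batches,
# then apply a single weight update.

def grad_mse(w, batch):
    """Gradient of mean squared error for pred = w * x over `batch`."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.5
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # made-up data

per_device_train_batch_size = 1
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Accumulate over 4 micro-batches of size 1, scaling each gradient by 1/4
accumulated = 0.0
for i in range(gradient_accumulation_steps):
    micro_batch = examples[i:i + per_device_train_batch_size]
    accumulated += grad_mse(w, micro_batch) / gradient_accumulation_steps

# Gradient of one large batch containing all 4 examples
full_batch = grad_mse(w, examples)
print(effective_batch_size, accumulated, full_batch)
```

The two gradients agree, yet only one example had to be held in memory at a time, which is why accumulation raises the effective batch size without raising peak memory. With more than one device, the effective batch size is additionally multiplied by the device count.
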

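Most of the clever tricks (?) of modern model training are packaged inside. In the past many of these features had to be written by hand, so I believe even people without training experience can get started quickly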

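from transformers import TrainingArguments

# `max_steps` and `output_dir` are assumed to be set earlier in the lab; they are not shown in these notes
training_args = TrainingArguments(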

  # Learning rate
  learning_rate=1.0e-5,

  # Number of training epochs
  num_train_epochs=1,

  # Max steps to train for (each step is a batch of data)
  # Overrides num_train_epochs, if not -1
  max_steps=max_steps,

  # Batch size for training
  per_device_train_batch_size=1,

  # Directory to save model checkpoints
  output_dir=output_dir,

  # Other arguments
  overwrite_output_dir=False, # Overwrite the content of the output directory
  disable_tqdm=False, # Disable progress bars
  eval_steps=120, # Number of update steps between two evaluations
  save_steps=120, # After # steps model is saved
  warmup_steps=1, # Number of warmup steps for learning rate scheduler
  per_device_eval_batch_size=1, # Batch size for evaluation
  evaluation_strategy="steps",
  logging_strategy="steps",
  logging_steps=1,
  optim="adafactor",
  gradient_accumulation_steps = 4,
  gradient_checkpointing=False,

  # Parameters for early stopping
  load_best_model_at_end=True,
  save_total_limit=1,
  metric_for_best_model="eval_loss",
  greater_is_better=False
)
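
The last group of arguments asks the trainer to reload the best checkpoint when training ends, judged by `eval_loss` with lower being better. As a rough sketch of that selection logic only (this is not the library's implementation; the function name and the loss values are made up for illustration):

```python
def pick_best_checkpoint(eval_results, greater_is_better):
    """Return the step whose metric is best.

    eval_results maps a checkpoint step to its metric value.
    """
    chooser = max if greater_is_better else min
    return chooser(eval_results, key=eval_results.get)

# Hypothetical eval_loss values recorded every 120 steps
eval_loss_by_step = {120: 2.31, 240: 2.05, 360: 2.12}

best_step = pick_best_checkpoint(eval_loss_by_step, greater_is_better=False)
print(best_step)
```

For a loss, the checkpoint with the smallest value wins; for a metric such as accuracy you would flip `greater_is_better` to True.
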

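Checking the model's memory footprint and compute cost
- Floating-point operations (FLOPs) are a common measure of a model's computational cost
- The code below computes the model's total FLOPs for the given input length and number of gradient accumulation steps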

import torch

# FLOPs for one input of max_length tokens, scaled by the accumulation steps
model_flops = (
    base_model.floating_point_ops(
        {
            "input_ids": torch.zeros(
                (1, training_config["model"]["max_length"])
            )
        }
    )
    * training_args.gradient_accumulation_steps
)

# print(base_model)
print("Memory footprint", base_model.get_memory_footprint() / 1e9, "GB")
# Memory footprint 0.30687256 GB
print("Flops", model_flops / 1e9, "GFLOPs")
# Flops 2195.667812352 GFLOPs
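
The printed FLOPs figure can be reproduced by hand. The sketch below assumes the estimate is 6 × tokens × parameters with the input embedding table (`embed_in`) left out of the parameter count; that formula is an assumption here, although it reproduces the printed value exactly. The layer shapes are read off the printed pythia-70m architecture, and the helper name `linear` is hypothetical.

```python
# Parameter shapes read off the printed pythia-70m architecture.
hidden, vocab, n_layers, ffn = 512, 50304, 6, 2048

def linear(n_in, n_out, bias=True):
    """Parameter count of a Linear layer."""
    return n_in * n_out + (n_out if bias else 0)

layer_norm = 2 * hidden  # weight + bias
per_layer = (
    2 * layer_norm                  # input_layernorm, post_attention_layernorm
    + linear(hidden, 3 * hidden)    # query_key_value
    + linear(hidden, hidden)        # attention dense
    + linear(hidden, ffn)           # dense_h_to_4h
    + linear(ffn, hidden)           # dense_4h_to_h
)
non_embedding_params = (
    n_layers * per_layer
    + layer_norm                         # final_layer_norm
    + linear(hidden, vocab, bias=False)  # embed_out
)

tokens = 1 * 2048                # batch of 1, max_length 2048
gradient_accumulation_steps = 4
flops = 6 * tokens * non_embedding_params * gradient_accumulation_steps
print(non_embedding_params, flops / 1e9, "GFLOPs")
```

Under these assumptions the result is 2195.667812352 GFLOPs, the same number the lab prints, so the figure is a parameter-count estimate rather than a measurement of the hardware.
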

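Pass everything into the Trainer and start training
- This includes the model, the data (training and validation sets), the hyperparameter settings, and so on
- Notably, the model's compute cost (model_flops) is also passed in. The stock Hugging Face Trainer constructor has no model_flops or total_steps parameter, so the Trainer used here is presumably a custom class supplied with the lab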

trainer = Trainer(
    model=base_model,
    model_flops=model_flops,
    total_steps=max_steps,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

training_output = trainer.train()

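The rest of the lab also includes:

Run much larger trained model and explore moderation
- Run inference with a larger model and explore how moderation can be used to improve the output
Explore moderation using small model
- Run inference with a small model and explore how moderation can be used to improve the output
Finetune a model in 3 lines of code using Lamini
- Shows how to fine-tune a model with very little code using Lamini's llama library

The lab also provides examples that compare the performance of different models, showing how the small, briefly trained model (trained model) performs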

Deeplearning.ai GenAI/LLM course series notes

Large Language Models with Semantic Search

Finetuning Large Language Models

