The core of the training process
Hyperparameters
The example here follows a PyTorch training flow.
Training flow overview
With Lamini's lib, the whole thing can be simplified down to 3 lines.
Will this repeat the Keras problem of being hard to debug and keeping users from understanding the details? In any case it speeds up adoption, but customized processing will probably still need lower-level tools.
from llama import BasicModelRunner
model = BasicModelRunner("EleutherAI/pythia-410m")
model.load_data_from_jsonlines("lamini_docs.jsonl", input_key="question", output_key="answer")
model.train(is_public=True)
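The runner object can then be queried directly; in the course notebooks a BasicModelRunner is simply called with a prompt. A minimal sketch (the question below is illustrative, not from the lab):
answer = model("How can I add data to Lamini?")
print(answer)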
Let's look directly at the parts that use the model and run inference.
Set up the model, training config, and tokenizer
from transformers import AutoTokenizer

model_name = "EleutherAI/pythia-70m"

training_config = {
    "model": {
        "pretrained_name": model_name,
        "max_length": 2048
    },
    "datasets": {
        "use_hf": use_hf,        # whether to load the dataset from the Hugging Face hub (set earlier in the notebook)
        "path": dataset_path     # dataset name or local path, also defined earlier in the notebook
    },
    "verbose": True
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-NeoX models ship without a pad token
train_dataset, test_dataset = tokenize_and_split_data(training_config, tokenizer)
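tokenize_and_split_data comes from the lab's utilities; conceptually it builds question/answer texts, tokenizes them up to max_length, attaches labels for causal language modeling, and splits the result into train and test sets. A rough sketch of that idea with the datasets library (my own approximation, not the lab's actual implementation):
from datasets import load_dataset

def tokenize_and_split_data_sketch(config, tokenizer):
    # Load the Q&A pairs, concatenate question + answer, tokenize, and split 90/10.
    data = load_dataset("json", data_files=config["datasets"]["path"], split="train")

    def tokenize_fn(example):
        text = example["question"] + example["answer"]
        tokens = tokenizer(text, truncation=True, max_length=config["model"]["max_length"])
        tokens["labels"] = tokens["input_ids"].copy()  # causal LM: labels are the inputs
        return tokens

    tokenized = data.map(tokenize_fn, remove_columns=data.column_names)
    split = tokenized.train_test_split(test_size=0.1, shuffle=True, seed=42)
    return split["train"], split["test"]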
Load the base model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(model_name)
base_model.to(device)  # device is chosen earlier in the notebook (GPU if available, otherwise CPU)
GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-5): 6 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=512, out_features=50304, bias=False)
)
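The printout above is what print(base_model) shows. Even before fine-tuning we can query this base model; the lab wraps this in its own inference helper, and a minimal sketch of the idea looks like this (the generate_answer name and the sample question are mine, not the lab's):
def generate_answer(model, question, max_new_tokens=100):
    # Tokenize the question, generate a continuation, and return only the newly generated text.
    inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

question = "Can Lamini generate technical documentation or user manuals for software projects?"
print(generate_answer(base_model, question))  # expect a poor, off-topic answer from the untuned 70M model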
Hyperparameter settings for training: Setup training
Everything is packed into TrainingArguments; the example code comes with inline comments, and the more important parameter settings are excerpted below.
Roughly speaking, all the clever tricks of modern model training are already encapsulated in it. These used to be hand-implemented one by one, so even people with no training experience should be able to get up to speed quickly.
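The max_steps and output_dir variables referenced below are defined earlier in the lab notebook; illustrative stand-ins could look like this:
max_steps = 3                          # illustrative: only a few optimizer steps for the demo
output_dir = "lamini_docs_finetuned"   # illustrative checkpoint directory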
from transformers import TrainingArguments

training_args = TrainingArguments(
    # Learning rate
    learning_rate=1.0e-5,

    # Number of training epochs
    num_train_epochs=1,

    # Max steps to train for (each step is a batch of data)
    # Overrides num_train_epochs, if not -1
    max_steps=max_steps,

    # Batch size for training
    per_device_train_batch_size=1,

    # Directory to save model checkpoints
    output_dir=output_dir,

    # Other arguments
    overwrite_output_dir=False,    # Overwrite the content of the output directory
    disable_tqdm=False,            # Disable progress bars
    eval_steps=120,                # Number of update steps between two evaluations
    save_steps=120,                # After # steps model is saved
    warmup_steps=1,                # Number of warmup steps for learning rate scheduler
    per_device_eval_batch_size=1,  # Batch size for evaluation
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    optim="adafactor",
    gradient_accumulation_steps=4,
    gradient_checkpointing=False,

    # Parameters for early stopping
    load_best_model_at_end=True,
    save_total_limit=1,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
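Note the interaction of the two batch settings above: gradients are accumulated over several micro-batches before each optimizer update, so the effective batch size is their product.
# Effective batch size per optimizer update, from the arguments set above
effective_batch = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
print(effective_batch)  # 1 * 4 = 4 sequences contribute to each weight update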
import torch

# Estimated FLOPs for one optimizer step: forward/backward on a single
# max_length sequence, multiplied by the number of gradient-accumulation micro-batches.
model_flops = (
    base_model.floating_point_ops(
        {
            "input_ids": torch.zeros(
                (1, training_config["model"]["max_length"])
            )
        }
    )
    * training_args.gradient_accumulation_steps
)

# print(base_model)
print("Memory footprint", base_model.get_memory_footprint() / 1e9, "GB")
# Memory footprint 0.30687256 GB
print("Flops", model_flops / 1e9, "GFLOPs")
# Flops 2195.667812352 GFLOPs
Training is then launched with the Trainer.
# Note: this Trainer comes from the lab's utilities, a thin wrapper around
# transformers.Trainer; the extra model_flops / total_steps arguments feed its logging.
trainer = Trainer(
    model=base_model,
    model_flops=model_flops,
    total_steps=max_steps,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
training_output = trainer.train()
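After training finishes, the fine-tuned weights can be written out; a minimal sketch using the trainer's save_model (the subdirectory name is illustrative):
save_dir = f"{output_dir}/final"     # illustrative location for the finished model
trainer.save_model(save_dir)         # writes the model weights and config
tokenizer.save_pretrained(save_dir)  # keep the tokenizer next to them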
The later lab examples also include: a few comparisons of different models' outputs, to see how the small model performs after this simple training (the trained model).
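As a rough idea of what that comparison looks like, here is a sketch that reloads the fine-tuned weights and prints their answer next to the base model's, reusing the generate_answer helper, question, and save_dir from the sketches above (all of which are illustrative names, not the lab's code):
finetuned_model = AutoModelForCausalLM.from_pretrained(save_dir).to(device)

print("Base model:     ", generate_answer(base_model, question))
print("Finetuned model:", generate_answer(finetuned_model, question))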