# DPO scripts
We will use the curated repo from Hugging Face for model training (SFT and DPO):
[Alignment Handbook](https://github.com/huggingface/alignment-handbook)
# How to run it
It should be run on Linux or WSL.
## Installation
1. Create a conda environment (must be Python 3.10)
```
conda create -n handbook python=3.10 && conda activate handbook
```
2. Install PyTorch
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
3. Clone and install the Alignment Handbook
```
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```
4. Install Flash Attention 2 for faster training
```
python -m pip install flash-attn --no-build-isolation
```
:::info
Note: if the machine has less than 96 GB of RAM and many CPU cores, limit the number of parallel build jobs:
```
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```
:::
5. Log in to your Hugging Face account
```
huggingface-cli login --token <YOUR_TOKEN>
```
6. Install Git LFS
```
sudo apt-get install git-lfs
```
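After these steps, a quick sanity check (an optional sketch, not part of the handbook) can confirm that PyTorch sees the GPU and Flash Attention 2 imports cleanly:
```python
# Optional sanity check for the training environment.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

try:
    import flash_attn  # installed in step 4
    print(f"flash-attn {flash_attn.__version__} is available")
except ImportError:
    print("flash-attn is not installed; training will fall back to standard attention")
```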
## Dataset preparation
Take a look at [these instructions](https://github.com/huggingface/alignment-handbook/blob/main/scripts/README.md#fine-tuning-on-your-datasets) for fine-tuning on your own datasets.
I created an example dataset here:
[Jan's subset](https://huggingface.co/datasets/jan-hq/ultrafeedback_binarized_subset)
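For reference, a subset like this can be built with the `datasets` library along these lines (a rough sketch; the split names match the YAML config below, but the subset sizes and exact filtering are assumptions, not the recipe actually used for Jan's subset):
```python
from datasets import DatasetDict, load_dataset

# Start from the full UltraFeedback binarized dataset.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# Keep a small slice of the SFT splits for quick experiments (sizes are arbitrary).
subset = DatasetDict({
    "train_sft": ds["train_sft"].select(range(1000)),
    "test_sft": ds["test_sft"].select(range(100)),
})

# Push to the Hub so it can be referenced in `dataset_mixer` in the YAML config.
subset.push_to_hub("<YOUR_USERNAME>/ultrafeedback_binarized_subset")
```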
## Training
The most important piece is the YAML config file:
`recipes/{model_name}/{task}/config_lora.yaml`
We have 2 tasks: **SFT** and **DPO**
e.g.
`recipes/trinity-v1/dpo/config_lora.yaml`
See the handbook's example [config_lora.yaml](https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_lora.yaml).
The config we found to work best:
```yaml
# Model arguments
model_name_or_path: jan-hq/trinity-v1
torch_dtype: auto
# LoRA arguments
use_peft: true
lora_r: 16 # best: 256
lora_alpha: 32 # best: 512
lora_dropout: 0.1
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
# Data training arguments
dataset_mixer:
  jan-hq/ultrafeedback_binarized_subset: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
# DPOTrainer arguments
bf16: true
beta: 0.1
do_eval: true
evaluation_strategy: epoch
eval_steps: 100
gradient_accumulation_steps: 32
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: trinity-v1-dpo
learning_rate: 5.0e-7
log_level: info
logging_steps: 10
lr_scheduler_type: linear
max_length: 1024 # increasing to 2048 or 4096 is OK
max_prompt_length: 512
num_train_epochs: 1
optim: rmsprop
output_dir: data/trinity-v1-dpo
# It is handy to append `hub_model_revision` to keep track of your local experiments
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
save_strategy: "no"
save_total_limit: null
seed: 42
warmup_ratio: 0.1
report_to:
- wandb
```
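Note that the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps × num_gpus`; with the values above on a single GPU:
```python
# Effective batch size implied by the config above (single GPU assumed).
per_device_train_batch_size = 2
gradient_accumulation_steps = 32
num_gpus = 1  # adjust to your setup

print(per_device_train_batch_size * gradient_accumulation_steps * num_gpus)  # 64
```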
**IMPORTANT NOTE**
Since the script handles the chat format automatically, we must check the `chat_template` inside a model's `tokenizer_config.json`.
E.g. the ChatML template for [trinity](https://huggingface.co/jan-hq/trinity-v1/blob/main/tokenizer_config.json):
```json
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
```
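A quick way to confirm that the template renders as expected (a sanity-check sketch, not part of the training scripts):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jan-hq/trinity-v1")

# Render a toy conversation with the model's own chat_template.
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# The output should use the ChatML markers <|im_start|> / <|im_end|> shown above.
```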
:::info
If multiple GPUs are available, use DeepSpeed ZeRO-3 for the fastest training:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes={num_gpus} scripts/run_{task}.py recipes/{model_name}/{task}/config_lora.yaml
```
:::
## Evaluation
- LM Evaluation Harness
```
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
```
GSM8K (5-shot):
```
lm_eval --model hf \
--model_args pretrained=<REPO_ID>,dtype="bfloat16" \
--tasks gsm8k \
--device cuda:0 \
--num_fewshot 5 \
--batch_size 8
```
We can also try [`MT-Bench`](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) and [`AlpacaEval`](https://github.com/tatsu-lab/alpaca_eval).
2 tasks for Rex:
- [ ] add LoRA merging into the model
- [ ] add tokenizer.model
Merge LoRA:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

adapter_path = "path/to/adapter"
base_model_path = "alignment-handbook/data/{base_model}"
merged_model = "alignment-handbook/data/{merged_path}"

# Load the base model in bf16.
print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

# Attach the trained LoRA adapter.
print("Loading adapter...")
model = PeftModel.from_pretrained(model, adapter_path)

tokenizer = AutoTokenizer.from_pretrained(
    base_model_path,
    trust_remote_code=True,
)

# Merge the adapter weights into the base model and drop the PEFT wrappers.
model = model.merge_and_unload()

# Save the merged model and tokenizer.
print("Saving merged model...")
print(model)
model.save_pretrained(merged_model)
tokenizer.save_pretrained(merged_model)
```
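After merging, the model can be smoke-tested with a short generation (an optional sketch; the ChatML prompt format follows the `chat_template` shown earlier, and the path uses the same placeholder as above):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_model = "alignment-handbook/data/{merged_path}"  # same placeholder as above

tokenizer = AutoTokenizer.from_pretrained(merged_model)
model = AutoModelForCausalLM.from_pretrained(merged_model, torch_dtype=torch.bfloat16)

# Build a ChatML-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "What is DPO?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False) + "<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```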