# DPO scripts
We will use the curated repo from Hugging Face for model training (SFT and DPO):
[Alignment Handbook](https://github.com/huggingface/alignment-handbook)
# How to run it
It should be run on Linux or WSL.
## Installation
1. Create a conda environment (must be Python 3.10)
```
conda create -n handbook python=3.10 && conda activate handbook
```
2. Install PyTorch
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
3. Clone and install the Alignment Handbook
```
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```
4. Install Flash Attention 2 for faster training
```
python -m pip install flash-attn --no-build-isolation
```
:::info
Note: if the machine has less than 96 GB of RAM and many CPU cores, limit the number of parallel build jobs:
```
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```
:::
5. Log in to your Hugging Face account
```
huggingface-cli login --token <YOUR_TOKEN>
```
6. Install Git LFS
```
sudo apt-get install git-lfs
```
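After these steps, a quick sanity check (an optional sketch, not part of the handbook) can confirm that PyTorch sees the GPU and Flash Attention 2 imports cleanly:
```python
# Optional sanity check for the training environment.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

try:
    import flash_attn  # installed in step 4
    print(f"flash-attn {flash_attn.__version__} is available")
except ImportError:
    print("flash-attn is not installed; training will fall back to standard attention")
```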
## Dataset preparation
Take a look at [these instructions](https://github.com/huggingface/alignment-handbook/blob/main/scripts/README.md#fine-tuning-on-your-datasets) for fine-tuning on your own datasets.
I created an example dataset here:
[Jan's subset](https://huggingface.co/datasets/jan-hq/ultrafeedback_binarized_subset)
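For reference, a subset like this can be built with the `datasets` library along these lines (a rough sketch; the split names match the YAML config below, but the subset sizes and exact filtering are assumptions, not the recipe actually used for Jan's subset):
```python
from datasets import DatasetDict, load_dataset

# Start from the full UltraFeedback binarized dataset.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# Keep a small slice of the SFT splits for quick experiments (sizes are arbitrary).
subset = DatasetDict({
    "train_sft": ds["train_sft"].select(range(1000)),
    "test_sft": ds["test_sft"].select(range(100)),
})

# Push to the Hub so it can be referenced in `dataset_mixer` in the YAML config.
subset.push_to_hub("<YOUR_USERNAME>/ultrafeedback_binarized_subset")
```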
## Training
The most important piece is the YAML config file:
`recipes/{model_name}/{task}/config_lora.yaml`
We have 2 tasks: **SFT** and **DPO**
e.g.
`recipes/trinity-v1/dpo/config_lora.yaml`
See the handbook's example [config_lora.yaml](https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-beta/dpo/config_lora.yaml).
The config we found to work best:
```yaml
# Model arguments
model_name_or_path: jan-hq/trinity-v1
torch_dtype: auto
# LoRA arguments
use_peft: true
lora_r: 16 # best: 256
lora_alpha: 32 # best: 512
lora_dropout: 0.1
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
# Data training arguments
dataset_mixer:
  jan-hq/ultrafeedback_binarized_subset: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
# DPOTrainer arguments
bf16: true
beta: 0.1
do_eval: true
evaluation_strategy: epoch
eval_steps: 100
gradient_accumulation_steps: 32
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: trinity-v1-dpo
learning_rate: 5.0e-7
log_level: info
logging_steps: 10
lr_scheduler_type: linear
max_length: 1024 # increasing to 2048 or 4096 is OK
max_prompt_length: 512
num_train_epochs: 1
optim: rmsprop
output_dir: data/trinity-v1-dpo
# It is handy to append `hub_model_revision` to keep track of your local experiments
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
save_strategy: "no"
save_total_limit: null
seed: 42
warmup_ratio: 0.1
report_to:
- wandb
```
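Note that the effective batch size is `per_device_train_batch_size × gradient_accumulation_steps × num_gpus`; with the values above on a single GPU:
```python
# Effective batch size implied by the config above (single GPU assumed).
per_device_train_batch_size = 2
gradient_accumulation_steps = 32
num_gpus = 1  # adjust to your setup

print(per_device_train_batch_size * gradient_accumulation_steps * num_gpus)  # 64
```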
**IMPORTANT NOTE**
Since the script handles the chat format automatically, we must check the `chat_template` inside a model's `tokenizer_config.json`.
E.g. the ChatML template for [trinity](https://huggingface.co/jan-hq/trinity-v1/blob/main/tokenizer_config.json):
```json
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
```
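A quick way to confirm that the template renders as expected (a sanity-check sketch, not part of the training scripts):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jan-hq/trinity-v1")

# Render a toy conversation with the model's own chat_template.
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# The output should use the ChatML markers <|im_start|> / <|im_end|> shown above.
```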
:::info
If multiple GPUs are available, use DeepSpeed ZeRO-3 for the fastest training:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes={num_gpus} scripts/run_{task}.py recipes/{model_name}/{task}/config_lora.yaml
```
:::
## Evaluation
- LM Evaluation Harness
```
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
```
GSM8K (5-shot):
```
lm_eval --model hf \
--model_args pretrained=<REPO_ID>,dtype="bfloat16" \
--tasks gsm8k \
--device cuda:0 \
--num_fewshot 5 \
--batch_size 8
```
We can also try [`MT-Bench`](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) and [`AlpacaEval`](https://github.com/tatsu-lab/alpaca_eval).
2 tasks for Rex:
- [ ] add LoRA merging into the model
- [ ] add tokenizer.model
Merge LoRA:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

adapter_path = "path/to/adapter"
base_model_path = "alignment-handbook/data/{base_model}"
merged_model = "alignment-handbook/data/{merged_path}"

# Load the base model in bf16.
print("Loading base model...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

# Attach the trained LoRA adapter.
print("Loading adapter...")
model = PeftModel.from_pretrained(model, adapter_path)

tokenizer = AutoTokenizer.from_pretrained(
    base_model_path,
    trust_remote_code=True,
)

# Merge the adapter weights into the base model and drop the PEFT wrappers.
model = model.merge_and_unload()

# Save the merged model and tokenizer.
print("Saving merged model...")
print(model)
model.save_pretrained(merged_model)
tokenizer.save_pretrained(merged_model)
```
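After merging, the model can be smoke-tested with a short generation (an optional sketch; the ChatML prompt format follows the `chat_template` shown earlier, and the path uses the same placeholder as above):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_model = "alignment-handbook/data/{merged_path}"  # same placeholder as above

tokenizer = AutoTokenizer.from_pretrained(merged_model)
model = AutoModelForCausalLM.from_pretrained(merged_model, torch_dtype=torch.bfloat16)

# Build a ChatML-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "What is DPO?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False) + "<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```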