For the general docs, check out [this document](https://hackmd.io/@okdimok/SkMl28kwn).
For your specific case, I will craft the command and comment on each step. First, I assume the zip you have is similar to the GitHub repo https://github.com/rmihaylov/falcontune.
That repo requires PyTorch, so we can use the latest NGC PyTorch container. See [this part of the doc](https://hackmd.io/_gNV67kuRoGE9XCNXh0dmw#Pyxis--ENROOT-%E2%80%94-import-the-container-to-speed-up-the-start) on how to import it.
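As a minimal sketch (assuming enroot is installed and your NGC credentials are configured), the import looks like this; the exact tag may differ:
```bash!
# Import the NGC PyTorch image once; this produces a squashfs file
# you can reuse for every job (the filename is derived from the tag)
enroot import docker://nvcr.io#nvidia/pytorch:23.07-py3
# -> creates nvidia+pytorch+23.07-py3.sqsh in the current directory
```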
My recommendation is to prepare a dedicated image for falcontune. Check out [how to add new packages to an image](https://hackmd.io/_gNV67kuRoGE9XCNXh0dmw#ENROOT-%E2%80%94-how-to-add-new-packages-to-image). You can use the same procedure to install the Python requirements directly into the image and export it.
In your case you will need to replace
```bash!
apt install rolldice
```
with
```bash!
pip install -r requirements.txt
```
Note that you may need the `--mount` option of `enroot start` ([docs](https://github.com/NVIDIA/enroot/blob/master/doc/cmd/start.md)) to make the repo's `requirements.txt` from the host visible inside the container. A sketch of the whole flow follows.
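Put together, the image preparation could look like the sketch below. The container name `falcontune`, the host path `~/falcontune` (where the zip is unpacked), and the output name `falcontune.sqsh` are assumptions; adjust them to your setup:
```bash!
# Create a container root filesystem from the imported image
enroot create --name falcontune nvidia+pytorch+23.07-py3.sqsh

# Start it writable, mounting the unpacked repo from the host,
# and install the Python requirements into the image
enroot start --rw \
--mount ~/falcontune:/workspace/falcontune \
falcontune \
pip install -r /workspace/falcontune/requirements.txt

# Export the modified filesystem back to a squashfs image
enroot export --output falcontune.sqsh falcontune
```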
Once the packages are installed, you just need to launch the SLURM job within the container.
```bash!
srun -N 1 `# one node` \
--gres=gpu:8 `# 8 GPUs per node` \
--container-image ~/nvidia+pytorch+23.07-py3.sqsh `# <- change to the image you get after installing the additional packages` \
--container-mount-home \
bash -c "falcontune finetune \
--model=falcon-40b-instruct \
--weights=/home/yma/falcon-40b-instruct \
--dataset=/home/yma/qa_pairs.json \
--data_type=alpaca \
--lora_out_dir=/home/yma/models_weight/falcon-40b-instruct-alpaca \
--mbatch_size=1 \
--batch_size=2 \
--epochs=300 \
--lr=3e-4 \
--cutoff_len=512 \
--lora_r=16 \
--lora_alpha=16 \
--lora_dropout=0.05 \
--warmup_steps=5 \
--save_steps=50 \
--save_total_limit=3 \
--logging_steps=5 \
--target_modules='[\"query_key_value\"]'
"
```
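If you would rather submit this as a batch job than run it interactively, the same pyxis flags carry over to the `srun` line inside an `sbatch` script. A minimal sketch, assuming the exported image name `falcontune.sqsh` from above:
```bash!
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:8
#SBATCH --job-name=falcontune
#SBATCH --output=falcontune_%j.log

# Pyxis flags go on the srun line, exactly as in the interactive example.
# nvidia-smi here is just a smoke test: replace it with the full
# falcontune finetune command from above.
srun --container-image ~/falcontune.sqsh \
--container-mount-home \
bash -c "nvidia-smi"
```
You can then submit it with `sbatch` and check the log file for the output.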