# UW HYAK Notes

## SALLOC

- Request a GPU

```
salloc --partition=gpu-l40 --account=stf --mem=10G --gres=gpu:1 --cpus-per-task=1 --time=2:00:00
```

- Check whether a GPU was requested

```
scontrol show job 24333466 | grep gpu
```

- Check the current status

```
squeue -u pingw220 -o "%.18i %12P %10a %10m %20j %9T %10M %R"
```

## Conda Reinstall

```
rm -rf '/gscratch/scrubbed/andysu/miniconda3'
bash Miniconda3-latest-Linux-x86_64.sh -p /gscratch/scrubbed/andysu/miniconda3
```

```
python -m pip install --force-reinstall --upgrade setuptools pip
```

## Find idle nodes

```
sinfo -t idle
salloc --partition=ckpt-all --gres=gpu:1 --nodelist=g3091 --time=8:00:00
```

## GPU check code

```
module load cuda/11.8.0
python -c "import torch; print(torch.cuda.is_available())"
```

- python code

```python
import torch
print(torch.__version__)
print(torch.version.cuda)  # make sure the PyTorch build supports CUDA
print(torch.backends.cudnn.enabled)
```

## Check whether a GPU device is allocated

```
scontrol show job 24202314 | grep TRES
```

## If you used salloc, return to the original compute node after logging out

```
srun --jobid=<jobid> --pty bash
```

## flash_attn

Tried 2.6.1 (taking the CUDA version into account) -> installed successfully

## Conda Related Commands

- Create conda environment

```
conda create --name my_env python=3.9
conda activate my_env
```

## checkpoint

```
#!/bin/bash
#SBATCH --job-name=dtw_eval
#SBATCH --account=stf
#SBATCH --partition=ckpt-g2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:l40s:1
#SBATCH --mem=1498G
#SBATCH --time=24:00:00
#SBATCH --output=/gscratch/ark/pingw220/music-transcription-eval/evaluation/logs/dtw_eval_%j.out
#SBATCH --error=/gscratch/ark/pingw220/music-transcription-eval/evaluation/logs/dtw_eval_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=pingw220@uw.edu
```
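
## Scripted TRES check (sketch)

The `scontrol show job <jobid> | grep TRES` check above can also be done in a script, for example at the top of a batch job. This is a minimal sketch, not a tested HYAK recipe: `allocated_gpus` is a hypothetical helper, and the sample `TRES=` line is an assumed illustration of what `scontrol` prints, not output copied from a real job. It assumes the GPU count appears as `gres/gpu=N`, or as typed entries such as `gres/gpu:l40s=N`.

```python
import re


def allocated_gpus(scontrol_output: str) -> int:
    """Return the GPU count found in a TRES line of `scontrol show job` output.

    Hypothetical helper: returns 0 when no TRES line mentions a GPU.
    """
    for line in scontrol_output.splitlines():
        if "TRES=" not in line:
            continue
        # Prefer the untyped total (gres/gpu=N) so that a typed entry
        # listed next to it is not counted a second time.
        total = re.search(r"gres/gpu=(\d+)", line)
        if total:
            return int(total.group(1))
        # Otherwise sum typed entries such as gres/gpu:l40s=1.
        typed = re.findall(r"gres/gpu:[^=,\s]+=(\d+)", line)
        if typed:
            return sum(int(n) for n in typed)
    return 0


# Assumed sample line, for illustration only.
sample = "   TRES=cpu=1,mem=10G,node=1,billing=1,gres/gpu=1"
print(allocated_gpus(sample))  # prints 1
```

On a node where `scontrol` is available, the text passed in could come from `subprocess.run(["scontrol", "show", "job", jobid], capture_output=True, text=True).stdout`, where `jobid` is the job ID as a string.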