# DoC Environment Setup

## SSH
You should create an SSH key pair to allow password-free login from your device to the lab machines (and jump shell). I recommend following [this guide](https://www.doc.ic.ac.uk/~nuric/teaching/remote-working-for-imperial-computing-students.html).

On your local computer you'll want to set up the ```~/.ssh/config``` as follows (with e.g. ```<USERNAME>=afs219```):

```bash
# This is the Jump box which allows us to access doc machines
# from outside of Uni (without the VPN)
Host uniJumpShell
    User <USERNAME>
    # Sometimes a shell may go down - can choose from 5
    HostName shell5.doc.ic.ac.uk
    # ssh key location - You may change this
    IdentityFile ~/.ssh/id_rsa

Host "edge*" "gpu*" "ray*"
    User <USERNAME>
    Port 22
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh -q -W %h:%p uniJumpShell
```

This allows you to ssh directly into any lab machine, without a password and without the VPN, e.g. ```ssh gpu07``` or ```ssh ray22```. Similarly, you should be able to connect directly with VSCode.

### VSCode
On some devices you may encounter timeouts when SSHing to lab machines from VSCode (via remote development). This can be fixed by changing the settings:

1. Open the command palette (e.g. ```CTRL+SHIFT+P```)
2. Select ```Preferences: Open Settings (JSON)``` - NOT the default settings (these can't be changed)
3. Append ```"remote.SSH.useLocalServer": false,``` - remember to add a ```,``` to the previous line.

Additionally, you should add the line
```bash
export VSCODE_DISABLE_PROC_READING=true
```
to your ```~/.bashrc``` to prevent the VSCode Server from clogging up lab machines when you log off.

---------

## Lab Machines
### Quota
Your quota is very small by default, so you will want to store non-essential files on the bitbucket (```/vol/bitbucket/```). See [the department info](https://www.imperial.ac.uk/computing/csg/guides/file-storage/quota/) site for more details.
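To decide what to move to the bitbucket, it can help to see which directories take up the most space. The helpers below are a minimal sketch using only the standard ```du```, ```sort``` and ```tail``` tools; the function names ```dir_usage``` and ```largest_subdirs``` are made up for this example, and they measure disk usage only (they do not query the quota system itself).

```bash
# Hypothetical helpers for illustration; not department tooling.

# Print the total size of a directory (defaults to your home directory).
dir_usage() {
    du -sh "${1:-$HOME}" | cut -f1
}

# Print the largest immediate subdirectories, sorted smallest to largest.
# The final line is the total for the directory itself.
# $1 = directory (default: home), $2 = number of lines to show (default: 10)
largest_subdirs() {
    du -h -d 1 "${1:-$HOME}" 2>/dev/null | sort -h | tail -n "${2:-10}"
}
```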
If you haven't used the bitbucket before, you'll need to create a directory there under your username, e.g.
```bash
mkdir /vol/bitbucket/afs219
```

### Python Environment
Create a Python virtual environment on the bitbucket (here we call it "dl_cw_pyenv"):
```bash
python3 -m venv /vol/bitbucket/<USERNAME>/dl_cw_pyenv
```

If you want your bash environment to use this Python environment, then every time you log in or open a new terminal window you need to run:
```bash
source /vol/bitbucket/<USERNAME>/dl_cw_pyenv/bin/activate
```
However, you can instead **append this line to your ```~/.bashrc``` file**, which is automatically loaded whenever a new bash instance is created.

### Using GPUs
In order for PyTorch (and other libraries) to detect the CUDA installation on lab machines, you will need to add the CUDA library directories to the ```LD_LIBRARY_PATH```. This means adding the following lines to your ```~/.bashrc```:
```bash
. /vol/cuda/11.4.120-cudnn8.2.4/setup.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/TensorRT-6.0.1.5/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/11.4.120-cudnn8.2.4/targets/x86_64-linux/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/11.4.120-cudnn8.2.4/x86_64-linux-gnu
```
You may wish to use a newer version of CUDA, and you can check what is available by listing ```/vol/cuda*``` directories. However, this version should suffice.

### VSCode Python Environment
You may find that if you are trying to select a Python Kernel in a VSCode .ipynb, the environment we just created isn't listed. In this case:

1. Create a temporary Python file in your VSCode workspace
2. Select the interpreter at the bottom left. You __should__ see the environment we just created. If not, then just navigate to the python binary manually (this should be at e.g. ```/vol/bitbucket/<USERNAME>/dl_cw_pyenv/bin/python```)
3. Once you have selected your virtual environment as the interpreter, you should also see it listed as a kernel in the .ipynb file.
(you will be asked to install Jupyter etc. the first time). You can delete that Python file now.

## Finding Free Devices
### GPU Machines
During my MSc I wrote a brittle script which can be used to check which GPU machines are being utilized - https://github.com/afspies/ssh_gpu_checker. I will endeavour to update this in the coming weeks.

### General Machines
Non-"GPUXX" lab machines still have reasonable GPUs (comparable to the cheapest paid GPUs on Paperspace). [Nuri Cingillioglu](https://www.doc.ic.ac.uk/~nuric/) wrote some Python scripts for the department which allocate machines:

```/vol/linux/bin/freelabmachine``` -> a Python script that picks out a free lab machine based on Condor usage status. It contacts Condor, asks for all the lab machines and picks based on simple rules such as usage and load. It randomises the picks.

```/vol/linux/bin/sshtolab``` -> a wrapper over the freelabmachine script that in the basic case just runs "ssh -Y `freelabmachine`".

```/vol/bitbucket/nuric/bin/runjupyter``` -> this script runs Jupyter notebooks. It activates the virtual environment and then runs "jupyter lab" with appropriate flags and port forwarding so that students can click and connect back on their laptops. The port forwarding is done twice, once from the shell servers and then from the lab machine. It can take different virtual environments as arguments.

To use all of these scripts you should first create a Python virtual environment as above, and then install the prerequisites with:
```bash
pip3 install --no-cache-dir --upgrade -r /vol/bitbucket/nuric/docenv_requirements.txt
```
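
The two steps in the paragraph above (create or reuse the virtual environment, then install the prerequisites) could be wrapped in a small shell function. This is only a sketch: ```venv_path``` and ```setup_doc_env``` are hypothetical helper names, not scripts that exist on the lab machines, and the paths are simply the ones used earlier in this guide.

```bash
# Hypothetical helpers for illustration; not department tooling.

# Build the bitbucket path of a virtual environment.
# $1 = username, $2 = environment name (defaults to dl_cw_pyenv)
venv_path() {
    printf '/vol/bitbucket/%s/%s\n' "$1" "${2:-dl_cw_pyenv}"
}

# Create the environment if it is missing, activate it in the current
# shell, then install the prerequisites for the department scripts.
setup_doc_env() {
    if [ -z "$1" ]; then
        echo "usage: setup_doc_env <USERNAME> [ENV_NAME]" >&2
        return 1
    fi
    local venv
    venv="$(venv_path "$1" "$2")"
    if [ ! -d "$venv" ]; then
        python3 -m venv "$venv" || return 1
    fi
    . "$venv/bin/activate" || return 1
    pip3 install --no-cache-dir --upgrade \
        -r /vol/bitbucket/nuric/docenv_requirements.txt
}
```

With these functions defined in your current shell (for example by pasting them into ```~/.bashrc```), running ```setup_doc_env afs219``` would create ```/vol/bitbucket/afs219/dl_cw_pyenv``` if needed, activate it, and install the requirements. The functions have to be defined in the current shell rather than run as a separate script, otherwise the activation would not persist.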