DoC Environment Setup

SSH

You should create an SSH key pair to allow password-free login from your device to the lab machines (and the jump shell). I recommend following this guide.

On your local computer, set up ~/.ssh/config as follows (replacing <USERNAME> with your own, e.g. afs219):

# This is the Jump box which allows us to access doc machines
# from outside of Uni (without the VPN) 
Host uniJumpShell
    User <USERNAME>
    # Sometimes a shell may go down - can choose from 5
    HostName shell5.doc.ic.ac.uk  
    # ssh key location - You may change this
    IdentityFile ~/.ssh/id_rsa    

Host "edge*" "gpu*" "ray*"
    User <USERNAME>
    Port 22
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh -q -W %h:%p uniJumpShell

This lets you ssh into any lab machine directly, without a password and without the VPN, e.g. ssh gpu07 or ssh ray22. Similarly, you should be able to connect directly with VSCode.
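
To confirm that the config is being picked up before attempting a real login, one option is to ask OpenSSH to print the settings it would use for a host. The sketch below assumes a reasonably recent OpenSSH client (one that supports the -G flag) and uses a hypothetical helper name, ssh_effective. It does not open a connection.

```shell
# Print the settings OpenSSH would apply to a host, without connecting.
# ssh_effective is a hypothetical helper name, not part of OpenSSH.
#   $1: host alias (e.g. gpu07)
#   $2: optional config file (defaults to ~/.ssh/config)
ssh_effective() {
    ssh -G -F "${2:-$HOME/.ssh/config}" "$1" \
        | grep -E '^(user|hostname|port|proxycommand) '
}

# Example (run on your local machine):
# ssh_effective gpu07
```

If the user line shows your DoC username and a proxycommand line mentions uniJumpShell, the Host pattern matched as intended.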

VSCode

On some devices you may encounter timeouts when SSHing to lab machines from VSCode (via Remote Development). This can be fixed by changing the settings.

  1. Open the command palette (e.g. CTRL+SHIFT+P)
  2. Select Preferences: Open Settings (JSON) - NOT the default settings (these can't be changed)
  3. Append "remote.SSH.useLocalServer": false as the last entry, and remember to add a comma to the end of the previous line.

Additionally, you should add the line

export VSCODE_DISABLE_PROC_READING=true

to your ~/.bashrc to prevent the VSCode Server from clogging up lab machines when you log off.
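
Appending by hand is easy to do twice. A small sketch of an idempotent append follows. append_once is a hypothetical helper name, not an existing command.

```shell
# Append a line to a file only if an identical line is not already there.
# append_once is a hypothetical helper name.
#   $1: the exact line to add
#   $2: the file to add it to
append_once() {
    touch "$2"
    # -x matches whole lines only, -F treats the line as a literal string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Example (uncomment to apply to your own ~/.bashrc):
# append_once 'export VSCODE_DISABLE_PROC_READING=true' "$HOME/.bashrc"
```

Running it a second time leaves the file unchanged, so it is safe to keep in a setup script.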


Lab Machines

Quota

Your quota is very small by default, so you will want to store non-essential files on the bitbucket (/vol/bitbucket/). See the department info site for more details.

If you haven't used the bitbucket before, you'll need to create a directory there under your username, e.g.

mkdir /vol/bitbucket/afs219

Python Environment

Create a Python virtual environment on the bitbucket (here we call it "dl_cw_pyenv"):

python3 -m venv /vol/bitbucket/<USERNAME>/dl_cw_pyenv

If you want your bash environment to use this Python environment, then every time you log in or open a new terminal window you need to run:

source /vol/bitbucket/<USERNAME>/dl_cw_pyenv/bin/activate

However, you can instead append this line to your ~/.bashrc file, which is automatically loaded whenever a new bash instance is created.
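
If the bitbucket is ever unavailable at login, an unguarded source line prints an error in every new shell. A guarded variant for ~/.bashrc might look like the following. activate_if_present is a hypothetical helper name, and the sketch assumes $USER matches your DoC username.

```shell
# Activate a virtual environment only if its activate script exists.
# activate_if_present is a hypothetical helper name.
#   $1: path to the virtual environment directory
activate_if_present() {
    if [ -f "$1/bin/activate" ]; then
        . "$1/bin/activate"
    else
        return 1
    fi
}

# Assumes $USER is your DoC username; stays silent if the environment is missing.
activate_if_present "/vol/bitbucket/${USER:-}/dl_cw_pyenv" || true
```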

Using GPUs

In order for PyTorch (and other libraries) to detect the CUDA installation on lab machines, you will need to add the CUDA library directories to LD_LIBRARY_PATH. This means adding the following lines to your ~/.bashrc:

. /vol/cuda/11.4.120-cudnn8.2.4/setup.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/TensorRT-6.0.1.5/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/11.4.120-cudnn8.2.4/targets/x86_64-linux/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/vol/cuda/11.4.120-cudnn8.2.4/x86_64-linux-gnu

You may wish to use a newer version of CUDA, and you can check what is available by listing /vol/cuda* directories. However, this version should suffice.
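
A mistyped or outdated directory in LD_LIBRARY_PATH fails silently, so it can be worth checking that every entry exists once you have opened a new shell. The following is a minimal sketch. check_ld_paths is a hypothetical helper name.

```shell
# Report every entry of a colon-separated path list that is not a directory.
# check_ld_paths is a hypothetical helper name.
#   $1: the path list to check, e.g. "$LD_LIBRARY_PATH"
# Returns 0 if all entries exist, 1 otherwise.
check_ld_paths() {
    missing=0
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        # Skip empty entries produced by leading or doubled colons
        [ -z "$dir" ] && continue
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    IFS=$old_ifs
    return "$missing"
}

# Example (run on a lab machine after opening a new shell):
# check_ld_paths "${LD_LIBRARY_PATH:-}"
```

No output means every listed directory was found.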

VSCode Python Environment

You may find that if you are trying to select a Python Kernel in a VSCode .ipynb, the environment we just created isn't listed. In this case:

  1. Create a temporary Python file in your VSCode workspace
  2. Select the interpreter at the bottom left. You should see the environment we just created. If not, navigate to the Python binary manually (this should be at e.g. /vol/bitbucket/<USERNAME>/dl_cw_pyenv/bin/python)
  3. Once you have selected your virtual environment as the interpreter, you should also see it listed as a kernel in the .ipynb file. (You will be asked to install Jupyter etc. the first time.) You can delete that Python file now.

Finding Free Devices

GPU Machines

During my MSc I wrote a brittle script which can be used to check which GPU machines are being utilized - https://github.com/afspies/ssh_gpu_checker. I will endeavour to update this in the coming weeks.
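
For a quick manual check of a handful of machines, something like the sketch below may be enough. It is a much simpler stand-in, not the linked script: gpu_usage is a hypothetical helper name, the host names are examples, and it assumes nvidia-smi is on the PATH of the remote machine and that the SSH config above is in place.

```shell
# Print GPU utilisation and memory use for each named host.
# gpu_usage is a hypothetical helper name, not the linked ssh_gpu_checker.
gpu_usage() {
    for host in "$@"; do
        # BatchMode avoids hanging on a password prompt;
        # ConnectTimeout bounds the wait for machines that are down.
        result=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
            nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
            --format=csv,noheader 2>/dev/null) || result="unreachable"
        printf '%s: %s\n' "$host" "$result"
    done
}

# Example (run on your local machine):
# gpu_usage gpu07 gpu08
```

Each line shows utilisation, memory used and total memory. A machine with low values for both is a reasonable candidate, although this only reflects the moment the query ran.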

General Machines

Non-"GPUXX" lab machines still have reasonable GPUs (comparable to the cheapest paid GPUs on paperspace). Nuri Cingillioglu wrote some python scripts for the department which allocate machines:

/vol/linux/bin/freelabmachine -> is a Python script that picks out a free lab machine based on Condor usage status. It contacts Condor, asks for all the lab machines and picks based on simple rules such as usage and load. It randomises the picks.

/vol/linux/bin/sshtolab -> wrapper over freelabmachine script that in the basic case just runs "ssh -Y freelabmachine".

/vol/bitbucket/nuric/bin/runjupyter -> this script runs jupyter notebooks. It activates the virtual environment and then runs "jupyter lab" with appropriate flags and port forwarding so the students can click and connect back on their laptops. The port forwarding is done twice, once from shell servers and then from the lab machine. It can take different virtual environments as arguments.

To use all of these scripts, first create a Python virtual environment as above, then install the prerequisites with:

pip3 install --no-cache-dir --upgrade -r /vol/bitbucket/nuric/docenv_requirements.txt