
Using the DGX Station A100

Last updated: April 7, 2022

Connecting to VJTI's VPN

VJTI runs an OpenVPN server that provides access to its private network over the public internet. If you want to access any service on VJTI's network, like the DGX station or the library system, you first need to connect to this VPN.

To be able to authenticate to the VPN server, you need to get credentials from the system administrator. Contact a professor for details.

Once you have VPN credentials, you can connect to the VPN using a VPN client. The DGX machine in the CE&IT Department has the IP address 172.18.33.4.

Linux

  1. Install the OpenVPN client. This can usually be done using your package manager. For example:
sudo yum install openvpn	#CentOS 8/7/6
sudo apt install openvpn	#Ubuntu/Debian
sudo dnf install openvpn	#Fedora 22+/CentOS 8

You can also install the graphical interface, though this document won't be using it.

sudo yum install network-manager-openvpn	#CentOS 8/7/6
sudo apt install network-manager-openvpn	#Ubuntu/Debian
sudo dnf install network-manager-openvpn	#Fedora 22+/CentOS 8
  2. Go to https://portal.vjti.ac.in:4430. Your browser might show a warning like “Your connection is not private” or “Connection not secure” (sigh). Go ahead anyway. A Sophos portal will greet you.
  3. Log in with the VPN username and password given to you by the sysad.
  4. Download the configuration for "other OSs". This will download a .ovpn file, which is an OpenVPN client config file.
  5. Connect to the VPN by running the OpenVPN client with this config file:
sudo openvpn --config <file.ovpn>

You will be asked to enter your VPN credentials. If, after a while, you see some ip route add log lines and a line with “Initialization Sequence Completed”, you’re connected to the VPN.
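If you'd rather not stare at the log, here is a minimal sketch of a check for that final message. The function name vpn_ready and the log path /tmp/dgx-vpn.log are made up for illustration; only the message text and the IP address come from this document.

```shell
# Succeeds if the given OpenVPN log file contains the line that
# signals a completed connection.
vpn_ready() {
  grep -q "Initialization Sequence Completed" "$1"
}

# Example usage (hypothetical log path), left commented out on purpose:
#   sudo openvpn --config <file.ovpn> 2>&1 | tee /tmp/dgx-vpn.log
# and then, in a second terminal:
#   vpn_ready /tmp/dgx-vpn.log && ping -c 3 172.18.33.4
```

The ping at the end is just a quick sanity check that the DGX is reachable; it assumes the machine answers ICMP, which this document doesn't confirm.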

Windows

  1. Log in to the Sophos portal as described in the Linux steps, then download the client and configuration for Windows.
  2. Install the Sophos client. You can then use its system tray icon to connect to or disconnect from the VPN.

SSH

Make sure that you're connected to the VPN. Now, to SSH into the DGX station, use:

ssh <username>@172.18.33.4

This will prompt you for the password for <username>, and you're in!

Note that this username and password are for your UNIX account on the DGX, not for the VPN; the two sets of credentials are different.

Running Docker containers

  1. Find an image that you want to run. For example, say you're working on a PyTorch project and need a PyTorch environment that's optimized for the DGX and its GPUs. You look up the NGC catalog and find a PyTorch image whose name and tag are nvcr.io/nvidia/pytorch:22.03-py3. To start a Docker container based on this image:
# -it: run an interactive shell; --name: name of the container
# --dns, --net=host: allow internet access, use the host network configuration
# --gpus all: use all GPUs; -v: bind mount your home directory
docker run -it \
  --name torch-container \
  --dns 8.8.8.8 --net=host \
  --gpus all \
  -v "$HOME:$HOME" \
  nvcr.io/nvidia/pytorch:22.03-py3

This will start up a container with Python 3 and PyTorch all set up. Go crazy.

Notes:

  1. If you exit from the container's shell, the container will stop. If you don't want to stop it, use the Ctrl+p, Ctrl+q shortcut to detach instead. The container will keep running.
  2. Although the above command specifies --gpus all, make sure that you don't use a GPU that someone else is using. To check, run nvidia-smi to see which GPUs are in use and which processes are using them.
  3. Your DGX user will be a part of the docker group, so you will be able to access all Docker containers, even those that someone else has created. Always remember: with great power comes great responsibility.
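To go with the notes above, here is a rough sketch of a few helpers for sharing GPUs politely and for getting back into a detached container. The function names (idle_gpus, gpus_arg, reenter_container) and the 100 MiB threshold are made up for illustration. Low memory use is only a heuristic for "nobody is using this GPU", so still eyeball nvidia-smi before grabbing one.

```shell
# Print the indices of GPUs whose used memory (in MiB) is below a threshold.
# Reads "index, memory.used" CSV lines on stdin. The default threshold of
# 100 MiB is an assumption, not a rule.
idle_gpus() {
  awk -F',' -v max="${1:-100}" '($2 + 0) < (max + 0) { print $1 }'
}

# Build the value for docker's --gpus flag from a comma-separated index list.
# Docker wants the device list wrapped in literal double quotes when it
# contains commas.
gpus_arg() {
  printf '"device=%s"' "$1"
}

# Get a new shell in a container you detached from or stopped.
# Defaults to the container name used earlier; assumes the image ships bash.
reenter_container() {
  name="${1:-torch-container}"
  docker start "$name" >/dev/null && docker exec -it "$name" bash
}

# Example usage, left commented out on purpose:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | idle_gpus 100
#   docker run -it --gpus "$(gpus_arg 2,3)" nvcr.io/nvidia/pytorch:22.03-py3
#   reenter_container torch-container
```

Passing a device list instead of all means the container only sees the GPUs you picked, so you can't step on someone else's job by accident.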

GUI?

Here are some options:

  1. Don't use a GUI; go hard-core terminal and use vim!
  2. Use Jupyter. Most NGC images have Jupyter installed. Use jupyter-lab instead of notebook so that you can edit files and use a terminal inside Lab itself.
  3. Use VSCode's Remote Development extension pack. First connect to the DGX host using Remote - SSH, and then attach to the container using Remote - Containers.
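For the Jupyter option, one way to reach the server from your own machine is an SSH tunnel. This is a sketch under a few assumptions: the container was started with --net=host as shown earlier (so a port opened in the container is a port on the DGX), JupyterLab is installed in your image, and port 8888 is free. The function name dgx_tunnel is made up.

```shell
# Forward a local port to the same port on the DGX over SSH.
# $1 is your DGX username, $2 the port (defaults to 8888).
dgx_tunnel() {
  port="${2:-8888}"
  ssh -N -L "${port}:localhost:${port}" "$1@172.18.33.4"
}

# Inside the container (NGC containers typically run as root, hence --allow-root):
#   jupyter lab --port=8888 --no-browser --allow-root
# On your own machine, with the VPN connected:
#   dgx_tunnel <username> 8888
# Then open http://localhost:8888 in your browser and paste the token
# that Jupyter printed in the container.
```

Since the DGX is shared, someone else may already be on 8888; pick another port in both commands if Jupyter complains.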

Monitoring and WTH is going on?

Use commands like ps, top, nvidia-smi, grep, less and their various flags to make sense of what's happening. Happy Linux-ing!
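As one example of combining these, here is a small sketch that maps the processes on the GPUs to the UNIX users that own them. The function name gpu_users is made up. Also note that anything running inside a container will most likely show up as root, so this won't always tell you who to ping.

```shell
# Reads "pid, used_memory" CSV lines on stdin and prints
# "user<TAB>pid<TAB>memory" for each GPU process.
gpu_users() {
  while IFS=',' read -r pid mem; do
    printf '%s\t%s\t%s\n' "$(ps -o user= -p "$pid")" "$pid" "${mem# }"
  done
}

# Example usage, left commented out on purpose:
#   nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader | gpu_users
```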

Bonus Stuff

Don't want to type in your VPN credentials every time you connect?

Warning: This method will leave the credentials in plaintext on the filesystem. Use it only if you’re sure that this is secure enough for you.

  1. In the directory with the .ovpn file, make a plaintext file dgx-vpn-credentials.txt with the username and password on two separate lines, like:
username
password
  2. Now in the .ovpn file, find the line with auth-user-pass. Append the name of the credentials file to that line. For example:
auth-user-pass dgx-vpn-credentials.txt
  3. Save and you're done. Next time, just run OpenVPN with the .ovpn file as before. You’ll still have to enter your password for sudo though.

Pro-tip: make a bash function or alias to really speed up the process of connecting to the VPN.
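A minimal sketch of such a function, to drop into your ~/.bashrc. The function name dgx_vpn and the default path $HOME/vpn/dgx.ovpn are assumptions; point DGX_OVPN at wherever you actually saved the config.

```shell
# Assumed location of the downloaded config; override by exporting DGX_OVPN.
DGX_OVPN="${DGX_OVPN:-$HOME/vpn/dgx.ovpn}"

dgx_vpn() {
  # cd into the config's directory first, so that a relative
  # auth-user-pass path (like dgx-vpn-credentials.txt) is found.
  # The subshell keeps your current directory unchanged.
  ( cd "$(dirname "$DGX_OVPN")" && sudo openvpn --config "$(basename "$DGX_OVPN")" )
}

# If you stored credentials in plaintext, at least restrict the file to yourself:
#   chmod 600 dgx-vpn-credentials.txt
```

After reloading your shell, connecting is just dgx_vpn.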

Setting up SSH keys for login

  1. Generate a key pair with ssh-keygen. If you don't want to type a passphrase on every login, leave the passphrase empty when prompted.
  2. When asked where to save the key, give it a name of its own, e.g. /home/user/.ssh/dgx. Do change the name, or you might end up overwriting an existing key pair.
  3. Use ssh-copy-id to copy the public key to the DGX. Make sure that you're connected to the VPN before running this command.
ssh-copy-id -i ~/.ssh/dgx <username>@172.18.33.4
  4. You'll be asked for the user's password; enter it.
  5. Done. Next time just use:
ssh -i ~/.ssh/dgx <username>@172.18.33.4
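If typing that gets old too, you can add an entry to your SSH client config. Here is a sketch of a helper that appends one; the function name add_dgx_host and the host alias dgx are made up, while the IP address and key path are the ones used above.

```shell
# Append a host entry for the DGX to an SSH client config file.
# $1 is your DGX username, $2 the config file (defaults to ~/.ssh/config).
# It appends blindly, so run it only once.
add_dgx_host() {
  dgx_user="$1"
  dgx_config="${2:-$HOME/.ssh/config}"
  cat >> "$dgx_config" <<EOF

Host dgx
    HostName 172.18.33.4
    User $dgx_user
    IdentityFile ~/.ssh/dgx
EOF
}

# Example usage, left commented out on purpose:
#   add_dgx_host <username>
#   ssh dgx
```

The same alias also works for ssh-based tools like scp and for VSCode's Remote - SSH.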

Authors

  • Akshat Shah, BTech IT
  • Shubhankar Gupta, BTech IT