# tcg-disc-wiki
This `.md` file contains some basic information about the `tcg-disc` workstation.
### Hardware description
* CPU: Intel i9-10980XE (18 cores / 36 threads with Hyper-Threading, 3.0 GHz base, 4.6 GHz turbo)
* GPU: NVIDIA GeForce RTX 3090 (24 GB, 10496 CUDA cores, 1.70 GHz boost)
* RAM: 256 GB DDR4-3000
* NVMe SSD: Samsung 970 EVO Plus M.2, 500 GB
* HDD: Seagate IronWolf NAS 2 TB, 64 MB cache, 5900 rpm
* Operating system: Fedora 33 Server
### Connecting to the workstation
In order to connect to the workstation you should open an `ssh` tunnel using the syntax:
```
ssh -f -N -L <PORT>:192.168.16.123:22 <USER>@147.162.63.10 -p 7000 -oKexAlgorithms=+diffie-hellman-group1-sha1
```
where `<PORT>` represents a local port dedicated to the tunnel (choose one between 2000 and 2080) and `<USER>` is a username authorized to access the gate server (if you get an error message, please look at the `Troubleshooting` section).
Once the `ssh` tunnel has been opened you should be able to log onto the machine using the command:
```
ssh -X tcg@localhost -p <PORT>
```
**WARNING:** Do not use the gate as an intermediate server when copying files. Data must be transferred through the `ssh` tunnel itself, as shown below.
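For example, with the tunnel on `<PORT>` open, files can be copied directly through it with `scp` (a minimal sketch; file names are placeholders):
```
# Copy a local file to the home directory on the workstation
scp -P <PORT> my_file.txt tcg@localhost:
# Copy a file back from the workstation to the current directory
scp -P <PORT> tcg@localhost:my_file.txt .
```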
### Troubleshooting
In some Linux distributions, the first attempt to open the `ssh` tunnel may fail with the following message:
```
Unable to negotiate with 147.162.63.10 port 7000: no matching host key type found. their offer: ssh-rsa,ssh-dss
```
To solve this issue, open (or create) the file
```
~/.ssh/config
```
and add the following lines to it:
```
Host 147.162.63.10
KexAlgorithms +diffie-hellman-group1-sha1
HostKeyAlgorithms +ssh-dss
PubkeyAcceptedKeyTypes +ssh-dss
```
### Quick guide to Environment Modules
In order to load software without manually exporting the required environment variables you can use [Environment Modules](http://modules.sourceforge.net/). To list the available modules, use the command:
```
module avail
```
To get a brief description of what a module does (if implemented) you can use:
```
module whatis <name_of_the_module>
```
More detailed information can be displayed (if implemented) using the command:
```
module help <name_of_the_module>
```
In order to load/unload a module you can use the commands:
```
module load <name_of_the_module>
module unload <name_of_the_module>
```
In order to check which modules are currently loaded you can use the command:
```
module list
```
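For example, a typical session with the `cuda/cuda-11.2` module described below might look like:
```
module load cuda/cuda-11.2      # make the CUDA toolchain available
module list                     # verify that the module is loaded
module unload cuda/cuda-11.2    # restore the original environment
```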
### Quick guide to Anaconda 3
If you need a `python3` virtual environment you can use the [Anaconda 3](https://www.anaconda.com/products/individual) package. The package is available by default, without loading any module. In order to activate/deactivate a virtual environment you can use the commands:
```
conda activate <environment_name>
conda deactivate
```
In order to list the installed packages you can use the command:
```
conda list
```
In order to list the available environments you can use the command:
```
conda env list
```
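If you need a new environment, it can be created and populated with the standard `conda` workflow (here `myenv`, the Python version and the package are just placeholders):
```
conda create --name myenv python=3.8   # create a new environment
conda activate myenv                   # switch to it
conda install numpy                    # install packages into it
```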
Please refer to the environment [reference page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) for more information about environment management.
### Quick guide to GPU management
In order to monitor the GPU status you can use the `nvidia-smi` utility. If everything works correctly you should see something similar to the following output:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:65:00.0 Off | N/A |
| 0% 54C P0 42W / 350W | 0MiB / 24265MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
If a job running on the machine is using GPU acceleration it should appear in the `Processes` section.
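To monitor the GPU continuously, for example while a job is running, `nvidia-smi` can refresh its output at a fixed interval:
```
nvidia-smi -l 2   # reprint the status every 2 seconds (Ctrl-C to stop)
```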
#### Compiling GPU-accelerated codes
The current machine configuration provides two independent CUDA 11.2 installations: one is loaded with the `cuda/cuda-11.2` module, the other with one of the NVIDIA [HPC SDK](https://developer.nvidia.com/hpc-sdk) modules (`nvhpc-byo-compiler/21.1`, `nvhpc-nompi/21.1` or `nvhpc/21.1`). The `cuda/cuda-11.2` installation has been compiled together with the `libcudnn` library and should be used for machine learning applications. The HPC SDK modules load the PGI compilers, which can be used to compile [OpenACC](https://www.openacc.org/) codes.
In order to compile C/C++ CUDA codes you can use the base syntax:
```
nvcc --gpu-architecture=sm_86 <your_code.cu> -o <compiled_code.exe>
```
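For example, a complete compile-and-run session combining Environment Modules and `nvcc` (the file names are placeholders) might look like:
```
module load cuda/cuda-11.2   # load the CUDA 11.2 toolchain
nvcc --version               # check that the compiler is available
nvcc --gpu-architecture=sm_86 main.cu -o main.exe
./main.exe
```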
In order to compile OpenACC-accelerated C++ codes you can try the base syntax:
```
nvc++ -acc -gpu=cc86 -Minfo=all <your_code.cpp> -o <compiled_code.exe>
```
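As with `nvcc`, the corresponding HPC SDK module must be loaded first; a minimal sketch (file names are placeholders):
```
module load nvhpc-nompi/21.1   # load the HPC SDK compilers
nvc++ -acc -gpu=cc86 -Minfo=all main.cpp -o main.exe
./main.exe
```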
### Quick guide to Task Spooler
In order to organize multiple jobs you can use the [Task Spooler](http://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html) job scheduler. To check the queue status, simply call `tsp` with no arguments. A job can be submitted using the syntax:
```
tsp <job>
```
If the job writes output to the terminal, the output is captured in the temporary file listed under the `output` column of the queue. If you wish to redirect the output you can use the syntax:
```
tsp sh -c "<job>"
```
for example:
```
tsp sh -c "echo $PATH >> MyPathEnvVar.txt"
```
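Note that with double quotes `$PATH` is expanded by your shell at submission time; use single quotes if you want the variable expanded when the job actually runs:
```
tsp sh -c 'echo $PATH >> MyPathEnvVar.txt'
```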
In order to kill a running job you can use the syntax:
```
tsp -k <tsp_job_id>
```
If you wish to remove a queued job you can use the syntax:
```
tsp -r <tsp_job_id>
```
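A couple of other options documented in the `tsp` manpage linked above can be handy for inspecting jobs:
```
tsp -c <tsp_job_id>   # print the captured output of a job
tsp -i <tsp_job_id>   # show detailed information about a job
```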
**WARNING: Keep in mind that Task Spooler does not manage or limit the resources used by jobs.**
**WARNING: Task Spooler has no per-user protection; be careful when deleting jobs.**
### Quick guide to the mail delivery system
A mail delivery system has been configured on the machine based on the [mutt](http://www.mutt.org/) mail client. If you wish to get an e-mail notification when your job has completed you can add the following line to the job script:
```
echo | mutt -s "<mail-subject>" -a <file_to_attach> -- <destination-email>
```
A simple example of a job script with email notification follows:
```
#!/bin/bash
# Load the CUDA toolchain, then compile and run the job
module load cuda/cuda-11.2
nvcc --gpu-architecture=sm_86 main.cu -o main.exe
./main.exe >> output.txt
# Send a notification email with the output attached
echo | mutt -s "Hurray Job done" -a output.txt -- my_mail@something.com
# Clean up
module unload cuda/cuda-11.2
rm main.exe
```
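Assuming the script above is saved as `job.sh` (a placeholder name), it can be queued with Task Spooler as described in the previous section:
```
tsp bash job.sh
```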