# 2020-04-22 <br> HPC0: Introduction to Linux
Welcome to the hack pad for today's HPC0 course from Research Computing at the University of Leeds!
**Contact Research Computing** - https://bit.ly/arc-help
**Presentation for today** - https://bit.ly/hpc0linux
**Exercises for today** - https://drive.google.com/file/d/1dV8fMS_n6GOFZO_rmFfUBwnuBFGj6C58/view?usp=sharing
**Approximate schedule**
| Time | Agenda |
| -------- | ------------------------------- |
| 1300 | Introduction, connecting to ARC |
| 1350 | Break |
| 1400 | Navigating the shell |
| 1450 | Break and Exercise 1 |
| 1500 | Data transformation in the shell |
| 1545 | Wrap up and questions |
## Further reading
Linux crib sheet that covers lots of ideas from today and beyond - https://drive.google.com/file/d/0B4hIpRJzq8DPVG5xdEJWcGlRTkU/view?usp=sharing
## What's your name and where do you come from? - <br> Make a new line below and write a bit about yourself
Manir Ali - work in the Leeds Institute of Molecular Medicine, based at St James's. Aim to analyse next generation sequencing data.
Mohamed Hasan- Research fellow at school of computing.
Ummey Hany- PhD student from School of Medicine and Health. Aiming to analyse DNA sequencing data
Alex Coleman - Research Software Engineer in the Research Computing Team at Leeds. Likes: doing reproducible machine learning stuff with Python
Mustapha Bashir G - PhD student at SCAPE
Martin McPhillie - School of Chemistry. Virtual screening on ARC.
John Hodrien - Research Software Engineer in the Research Computing Team at Leeds. Likes: hardware, HPC, Linux, OpenGL, MPI
Andrew Bayly - Prof in Chem Eng - group does lots of modelling in CFD, DEM, MD etc. Re-learning Linux so I can use HPC from home.
Anna-Grace Linton - PhD student at LIDA. There is a good chance I will need to use HPC in the future to handle large medical files, so I would like to get a better understanding of it.
Lena Kilian - PhD student at LIDA. I may need HPC to handle large expenditure and carbon footprint datasets (with spatial components/data).
Fran Pontin - PhD student from LIDA, aiming to use HPC to analyse large volumes of GPS data.
Scott Wiseman - PhD student on the Bioenergy CDT. Looking to use HPC to run combustion chemistry models
Onur KADEM - PhD student working on State of Charge estimation for lithium batteries. I need HPC to run my simulations.
Dylan Barker - PhD student starting in October. My advisor is Adam Sweetman so I may not be on the list as the booking is under his name.
Matthew Gaddes - Postdoc in Earth and Environment. Have previously trained a few deep learning models in Keras, but looking to do this somewhat more quickly!
Danah ALbuainain- PhD student.
Sarah Barr - PhD student in Earth and Environment (ICAS), using HPC to run FLEXPART particle dispersion model
Isabelle Pickles- PhD student in chemistry, biology and medicine. Hoping to use HPC to run docking simulations for drug design.
Jim McQuaid - Staff in Earth & Environment, using ARC4 to run FLEXPART dispersion model to track air masses arriving at field sites
Eulashini Chuntarpursat- School of Medicine
## Copies of some of the things Alex is doing in the terminal today, so you've got something to refer to.
### Getting started
Search for MobaXterm, then on its website go to Downloads, choose the Home Edition, and download the Portable edition (it works on all systems, including ones where you don't have admin rights).
Once it has downloaded, open your Downloads folder, use Extract All on the zip file, and double-click Mobaxterm_Personal_20_2.exe
(anytime a line starts with a $ symbol, I'm merely referring to the bash prompt, and you aren't expected to type this)
### To update mobaxterm settings
Go to Settings, and update the Persistent home directory to somewhere useful by clicking the folder icon and selecting a suitable directory. This is a location of your choice. Click OK when you're done, and this may restart Mobaxterm.
### Other ARC connection notes:
General connecting notes on connecting to ARC are here:
https://it.leeds.ac.uk/it?id=kb_article&sysparm_article=KB0013720
### Notes on a manual connection without using a config
You can just do this manually - connect to remote-access.leeds.ac.uk, and then once connected, connect to arc4.leeds.ac.uk
```
[username@laptop ~]$ ssh -Y username@remote-access.leeds.ac.uk
[username@euras01hv ~]$ ssh -Y username@arc4.leeds.ac.uk
```
### Suggested Mobaxterm config:
You can create a `config` file in your ~/.ssh directory to enable a 1-step login.
Within `config` file (replace USERNAME with your own username):
```
Host *.leeds.ac.uk !remote-access.leeds.ac.uk
ProxyJump USERNAME@remote-access.leeds.ac.uk
User USERNAME
```
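The `-Y` flag used in the commands below turns on trusted X11 forwarding, which lets graphical programs running on ARC open windows on your own machine. If you'd rather not type it every time, here is a sketch of the same `config` entry with the equivalent options added (assuming you want X11 forwarding for every host this entry matches):
```
Host *.leeds.ac.uk !remote-access.leeds.ac.uk
ProxyJump USERNAME@remote-access.leeds.ac.uk
User USERNAME
# Equivalent of ssh -Y: trusted X11 forwarding for matching hosts
ForwardX11 yes
ForwardX11Trusted yes
```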
Actually connect to a remote machine (by default it'll use the username you set in the config file, or you can specify another username, for example if you're using a training account):
```
$ ssh -Y arc4.leeds.ac.uk
$ ssh -Y exampleusername@arc4.leeds.ac.uk
```
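If you get warnings about locale settings (the LANG or LC_* variables), this is because SSH can pass your own machine's locale settings on to the remote machine, which may not recognise them. You can work around this with: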
```
$ export LANG=en_GB.UTF-8
```
If you want to make this permanent, add that line to the end of your ~/.bashrc file.
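One way to do that from the command line rather than in an editor is sketched below. It assumes your shell is bash, as the `~/.bashrc` above implies, and only adds the line if it isn't there already:

```shell
# Append the locale setting to ~/.bashrc only if that exact line is not
# already present, so running this more than once adds no duplicates
line='export LANG=en_GB.UTF-8'
grep -qxF "$line" ~/.bashrc 2>/dev/null || echo "$line" >> ~/.bashrc
# Show the end of the file to check the line is there
tail -n 3 ~/.bashrc
```

The change applies to shells started afterwards; to pick it up in your current session, run `source ~/.bashrc` (or just run the `export` line directly).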
### Simple commands
List files
```
$ ls
```
List files with more information
```
$ ls -l
```
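Files and directories whose names begin with a dot are hidden by default, which is why a `.ssh` directory won't show up in a plain `ls`. A few more combinations of standard `ls` options:

```shell
# Names starting with a dot (such as .ssh) are hidden from plain ls;
# -a lists them too, including . (this directory) and .. (its parent)
ls -a ~
# Options can be combined: a long listing that includes hidden entries
ls -la ~
# -h shows sizes in human-readable units (e.g. 4.0K rather than 4096)
ls -lh
```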
Make a directory (necessary if `.ssh` doesn't already exist in your home directory):
```
$ mkdir .ssh
```
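A slightly more robust version of this step, as a sketch: `-p` means `mkdir` won't complain if the directory already exists, and tightening the permissions helps because SSH can refuse to use configuration or key files that other users are able to modify.

```shell
# Create ~/.ssh in your home directory; -p means no error if it already exists
mkdir -p ~/.ssh
# 700: only you can read, write or enter the directory
chmod 700 ~/.ssh
# Check the result: the listing should start with drwx------
ls -ld ~/.ssh
```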
Change Directory (in this case into .ssh):
```
$ cd .ssh
```
Edit file (with whichever editor you want/have available - notepad is liable to try to create `config.txt` rather than `config`):
```
$ nano config
$ notepad config
$ vim config
```
Renaming a file - rename config.txt to config:
```
$ mv config.txt config
```
Clear the screen:
```
$ clear
```
Print working directory. This shows where we are on the filesystem, and prints the absolute path.
```
$ pwd
```
Moving files between directories uses exactly the same command as renaming files within a directory, and you can do the two together:
```
$ mv file somedirectory/
$ mv file somedirectory/newfilename
```
Deleting files. A simple deletion, followed by a recursive forced deletion of a directory and all of its contents:
```
$ rm somefile
$ rm -rf some-directory
```
Delete an empty directory:
```
$ rmdir some-directory
```
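Output the contents of a text file (`cat` prints the whole file at once; `less` shows it a screen at a time so you can scroll, and `q` quits):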
```
$ cat somefile
$ less somefile
```
See the top/bottom of a file:
```
$ head somefile
$ head -n 5 somefile
$ tail somefile
$ tail -n 5 somefile
```
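`head` and `tail` can also be chained with a pipe (`|`), which feeds the output of one command into the next. A sketch using a hypothetical file `numbers.txt`, created just for the example:

```shell
# Hypothetical example file: the numbers 1 to 100, one per line
seq 1 100 > numbers.txt
# head keeps the first 50 lines; tail then keeps the last of those,
# so together they print only line 50
head -n 50 numbers.txt | tail -n 1
# tail -n +N starts at line N instead, e.g. to skip a one-line header
tail -n +2 numbers.txt | head -n 3
```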
Sort a file (alphabetically or numerically, reverse order, or both):
```
$ sort somefile
$ sort -n somefile
$ sort -r somefile
$ sort -rn somefile
```
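The difference between plain `sort` and `sort -n` matters as soon as numbers have different numbers of digits. A small sketch with a made-up file:

```shell
# Hypothetical example file with numbers of different lengths
printf '10\n9\n100\n' > numbers_unsorted.txt
# Plain sort compares text character by character: 10, 100, 9
sort numbers_unsorted.txt
# -n compares the values as numbers: 9, 10, 100
sort -n numbers_unsorted.txt
# -rn: numerically, largest first: 100, 10, 9
sort -rn numbers_unsorted.txt
```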
Pulling out just the second field from a csv file:
```
$ cut -d ',' -f 2 somefile
```
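`cut` gets more useful when combined with the other commands above. A sketch using a small hypothetical file `animals.csv` (its contents are made up for illustration); `uniq -c`, which counts runs of identical adjacent lines and therefore needs sorted input, is an extra command not covered above:

```shell
# Hypothetical CSV file with a header row
cat > animals.csv <<'EOF'
name,class,legs
frog,amphibian,4
sparrow,bird,2
newt,amphibian,4
EOF
# Several fields can be listed at once (ranges such as 1-2 also work)
cut -d ',' -f 1,3 animals.csv
# Skip the header, take column 2, sort it, then count each distinct value
tail -n +2 animals.csv | cut -d ',' -f 2 | sort | uniq -c
```

Note that `cut` splits on every comma, so it will mis-handle CSV files where a quoted field itself contains a comma.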
### General shell notes
Tab completion is your friend. Anytime the shell can, it helps you complete words when you press tab. If you press tab twice, it'll show you options for when it doesn't have a unique answer.
### Documentation on commands:
There are several ways to get documentation, typically giving progressively more information:
```
$ ls --help
$ man ls
$ info ls
```
### Nano basics
Open a file: `nano somefile`
Save: Ctrl-O <press return>
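Quit: Ctrl-X (if there are unsaved changes, nano asks whether to save them: answer Y, then press return to confirm the file name, or N)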
### I'm stuck in a command, how do I get out?
It sadly depends, but as a rule, try these in turn:

- Escape
- q
- Ctrl-c
- Ctrl-z

The last one is a touch ugly, as it doesn't actually kill the process but only suspends it, so you would have to kill it afterwards.
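After a Ctrl-z, bash keeps track of the suspended command as a numbered job. A sketch of tidying up afterwards; so that the snippet runs on its own, `sleep 300 &` (a command that just waits 300 seconds, started in the background) stands in for the command you suspended:

```shell
# Stand-in for a stuck command: sleep for 300 seconds in the background
sleep 300 &
# List this shell's jobs with their numbers ([1], [2], ...)
jobs
# End job number 1 (fg %1 would instead bring it back to the foreground)
kill %1
# Wait for it to finish exiting
wait
```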
## Questions/Issues/Anything to be addressed by Alex/John:
## Lesson notes - <br> Use the space below to make notes throughout the lesson
> Are you in arc3/4?
No, Alex is demonstrating this from his own laptop. We'll get onto ARC3/4 in a minute using Mobaxterm.
> If we already have an arc account shall we use it now?
I'd advise you not to, as using a test account avoids the risk of deleting anything in your own account while you're still getting to know the commands. If you have nothing in your HPC account yet, there's no harm in using it. You can't do any damage in these test accounts, as they're throwaway accounts.
> I'm on windows and had to make a .ssh file, does that matter?
We covered that bit: you needed to make the .ssh directory if it didn't exist, and then create a file called config inside it. It's not a problem that the .ssh directory was missing.
> If I were SSH'd onto a university machine (eg FOE4), could I login to an arc machine without doing this?
Yes.
> So the config file allows you to miss out logging into remote access first
Exactly. It happens transparently to you. You may need to provide a password twice, but if you use ssh keys, you can remove that too, so it'll just log you straight in to the system you want to get to.
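A sketch of setting up key-based login. The key file name `demo_key` and the comment text are hypothetical; normally you'd accept the default location in `~/.ssh/` and set a passphrase. Because each hop authenticates separately, the key may need installing on both remote-access and arc4 (that depends on whether they share a home directory, which is an assumption to check). The `ssh-copy-id` lines are commented out as they need a live connection, and `USERNAME` is a placeholder as in the config above:

```shell
# Generate a key pair in the current directory. -N '' sets an empty
# passphrase so this runs unattended; for a real key, leave out -N ''
# and choose a passphrase when prompted
ssh-keygen -t ed25519 -f ./demo_key -N '' -C "demo key"
# demo_key is the private key: keep it secret
# demo_key.pub is the public key: this is the part you copy to servers
cat ./demo_key.pub
# Install the public key on each remote machine:
# ssh-copy-id -i ./demo_key.pub USERNAME@remote-access.leeds.ac.uk
# ssh-copy-id -i ./demo_key.pub arc4.leeds.ac.uk
```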
> I'm getting:
ssh: connect to host arc4.leeds.ac.uk port 22: Connection timed out
What should I do?
Either you have network issues, or you've been temporarily barred from remote-access after too many failed login attempts. Your options are to wait, or to use the VPN.
> To ssh to something like FOE04, I'd have to be using the Leeds VPN. Why is it (out of curiosity) that this isn't the case here?
Because we're using the magic of remote-access instead of the VPN. This has several advantages, not least that the VPN is limited by the number of licences. You can use this same method to log in to the foe-linux systems.
> For reference, on a Mac you get a couple of security confirmations; see below:
> Last login: Wed Apr 22 13:18:42 on ttys000
> CAPE-MAC-200634:~ preaeb$ cd ..
> CAPE-MAC-200634:Users preaeb$ ls
> Shared administrator preaeb
> CAPE-MAC-200634:Users preaeb$ cd preaeb
> CAPE-MAC-200634:~ preaeb$ ssh -Y issev019@arc4.leeds.ac.uk
> The authenticity of host 'remote-access.leeds.ac.uk (129.11.190.34)' can't be established.
> RSA key fingerprint is SHA256:SZN1IZ9rL0mhpnxhVG5uxbtVFMZAISg98X9ovHlh8Fg.
> Are you sure you want to continue connecting (yes/no)? y
> Please type 'yes' or 'no': yes
> Warning: Permanently added 'remote-access.leeds.ac.uk,129.11.190.34' (RSA) to the list of known hosts.
> issev019@remote-access.leeds.ac.uk's password:
> The authenticity of host 'arc4.leeds.ac.uk (<no hostip for proxy command>)' can't be established.
> ECDSA key fingerprint is SHA256:lPkw/7SrBqqQkS7lUm+tBN9JIGX9B8Gw7FdkK3MrpLM.
> Are you sure you want to continue connecting (yes/no)? yes
> Warning: Permanently added 'arc4.leeds.ac.uk' (ECDSA) to the list of known hosts.
This is all fine. The first time you connect to a host via SSH, you are shown the machine's fingerprint, which you can use to validate your connection and guard against man-in-the-middle attacks. Once you've accepted it, it's checked every time you connect in future to confirm that the host key hasn't changed.
> what does nano mean?
nano is an editor, used for modifying text files.
>Can you make a new file in a particular directory without having to cd in to that directory?
Yes: `nano somedir/somefile`
> Hi, I have managed to change the prompt to `[username@login2.arc4 /]$`, where I do not have permission to make directories
> I cannot cd .. up any higher
Indeed, you've managed to move yourself all the way up to `/`. If you want to get back to your home directory you can just type `cd` or `cd ~`
That will put you back into your home directory. There's another magic option of `cd -` which takes you back to where you were previously.
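A quick sketch of these in sequence, starting from wherever you happen to be; `cd ..` (move up one level) is included for comparison:

```shell
# Move all the way up to the root of the filesystem, as in the question
cd /
pwd
# cd - jumps back to the previous directory and prints its path
cd -
# cd .. moves up one level from wherever you are
cd ..
# cd on its own always returns you to your home directory
cd
pwd
```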
> How about using: sort amphibians.txt birds.txt insects.txt mammals.txt reptiles.txt > all_animals.txt ?
Yes, you can list the files explicitly like this. The wildcard just does it for you: the `*` is expanded before the command ever runs, so (provided those are the only `.txt` files in the directory) it runs exactly the same command.
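To see the expansion for yourself, here is a sketch using `echo`, which simply prints its arguments. The directory name `wildcard_demo` is made up for this example, and the files are created empty:

```shell
# Work in a fresh, hypothetical directory so no other .txt files interfere
mkdir -p wildcard_demo
cd wildcard_demo
# Create empty files with the names from the question
touch reptiles.txt amphibians.txt mammals.txt birds.txt insects.txt
# echo prints its arguments, so this shows what the shell turns *.txt into
# before running a command: the matching names, sorted alphabetically
echo *.txt
```

Putting `echo` in front of a command is a handy dry run: `echo rm *.txt` shows exactly which files `rm *.txt` would delete.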
> why do you need the cat *.txt part? why not sort *txt first
Ah sorry, you don't. There are lots of different ways of solving this, and normally I'd tell people off for excessive use of cat. You could equally do:
```
# Sort the lines of all the .txt files together, then print only line 50
sort *.txt | sed -n 50p
```
You're right: you can very often avoid using cat.