# 2023-05-09-NCL Unix Shell

### This document:
# https://hackmd.io/@rseteam-ncl/2023-05-09-NCL

### JupyterHub:
# https://jupyter.ncldata.dev

**We asked for your uni login when you registered. Please use that to log in - it is <span style="color:red;">case sensitive</span>. You don't have a password yet. The password you enter the first time you log in will become your password.**

### Introduction ###
- Instructors
- Helpers
- What is The Carpentries
- What is the RSE Team
- Coffee breaks and lunch
    - Morning break: 11:00
    - Lunch: 13:00
    - Afternoon break: 15:30

### Links:
- [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html)
- [Workshop website](https://nclrse-training.github.io/2023-05-09-NCL/)
- [Link to lesson website](https://swcarpentry.github.io/shell-novice/)
- [Link to data](https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip)
- [Pre-workshop survey](https://carpentries.typeform.com/to/wi32rS?slug=2023-05-09-NCL)
- [Post-workshop survey](https://carpentries.typeform.com/to/UgVdRQ?slug=2023-05-09-NCL)

### Attendance:
## Please sign in using your <span style="color:red">university email</span> and your <span style="color:red">name</span>:
1. jannetta.steyn@newcastle.ac.uk, Jannetta Steyn
2. richard.noble@newcastle.ac.uk, Richard Noble
3. s.lane4@newcastle.ac.uk, Stephen Lane
4. S.Saleem3@newcastle.ac.uk
5. k.p.lendoye-l'eyebe2@newcastle.ac.uk, Knectt Lendoye
6. l.a.bruce@newcastle.ac.uk, Lawrence Bruce
7. a.bell16@newcastle.ac.uk, Alexander Bell
8. M.Deivarajan-Suresh2@newcastle.ac.uk, Mukilan Suresh
9. bethany.little@ncl.ac.uk, Beth Little
10. b9036125@newcastle.ac.uk, Matthaios Charidis
11. c0034294@newcastle.ac.uk, Xudong Mao

### Notes:

## Episode 4 Exercise 5 Pipe Reading Comprehension
A file called `animals.csv` (in the `shell-lesson-data/exercise-data/animal-counts` folder) contains the following data:
```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
```
What text passes through each of the pipes and the final redirect in the pipeline below? Note: The `sort -r` command sorts in reverse order.

`$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt`

Hint: Build the pipeline up one command at a time to test your understanding.

## Episode 4 Exercise 6 Pipe Construction
For the file `animals.csv` from the previous exercise, consider the following command:

**BASH**

`$ cut -d , -f 2 animals.csv`

The `cut` command is used to remove or 'cut out' certain sections of each line in the file. By default, `cut` expects the lines to be separated into columns by a Tab character. A character used in this way is called a delimiter. In the example above we use the `-d` option to specify the comma as our delimiter character. We have also used the `-f` option to specify that we want to extract the second field (column). This gives the following output:

**OUTPUT**
```
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
```
The `uniq` command filters out adjacent matching lines in a file. How could you extend this pipeline (using `uniq` and another command) to find out what animals the file contains (without any duplicates in their names)?

## Episode 4 Exercise 7 Which Pipe?
The file `animals.csv` contains 8 lines of data formatted as follows:

**OUTPUT**
```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
```
The `uniq` command has a `-c` option which gives a count of the number of times a line occurs in its input.
Assuming your current directory is `shell-lesson-data/exercise-data/animal-counts`, what command would you use to produce a table that shows the total count of each type of animal in the file?
```
sort animals.csv | uniq -c
sort -t, -k2,2 animals.csv | uniq -c
cut -d, -f 2 animals.csv | uniq -c
cut -d, -f 2 animals.csv | sort | uniq -c
cut -d, -f 2 animals.csv | sort | uniq -c | wc -l
```

## Episode 5 Exercise 3 Limiting Sets of Files
What would be the output of running the following loop in the `shell-lesson-data/molecules` directory?
```
$ for filename in c*
> do
>     ls $filename
> done
```
1. No files are listed.
1. All files are listed.
1. Only `cubane.pdb`, `octane.pdb` and `pentane.pdb` are listed.
1. Only `cubane.pdb` is listed.

```
cubane.pdb  lengths.txt  octane.pdb   propane.pdb
ethane.pdb  methane.pdb  pentane.pdb  sorted-lengths.txt
```

How would the output differ from using this command instead?
```
$ for filename in *c*
> do
>     ls $filename
> done
```
1. The same files would be listed.
1. All the files are listed this time.
1. No files are listed this time.
1. The files `cubane.pdb` and `octane.pdb` will be listed.
1. Only the file `octane.pdb` will be listed.

## Nested loop
```
for fn1 in `ls`
do
    echo ----
    echo $fn1
    echo ----
    for fn2 in `ls $fn1`
    do
        echo $fn2
    done
done
```

## 7.2 Tracking a Species
Leah has several hundred data files saved in one directory, each of which is formatted like this:
```
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
```
She wants to write a shell script that takes a species as the first command-line argument and a directory as the second argument. The script should return one file called `<species>.txt` containing a list of dates and the number of that species seen on each date.
For example, using the data shown above, `rabbit.txt` would contain:
```
2013-11-05,22
2013-11-06,19
```
Put these commands and pipes in the right order to achieve this:
```
cut -d : -f 2
>
|
grep -w $1 -r $2
|
$1.txt
cut -d , -f 1,3
```
Hint: Use `man grep` to look for how to grep text recursively in a directory and `man cut` to select more than one field in a line. An example of such a file is provided in `shell-lesson-data/exercise-data/animal-counts/animals.csv`.
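
Before ordering the pieces, it can help to try each one on its own. The sketch below is not the exercise solution. It only shows what `grep -w -r` prints for a directory and what `cut` does with `:` and `,` as delimiters. The temporary directory, the file name `counts.csv` and the variable names are assumptions made up for this illustration; they are not part of the lesson data.

```shell
# Hypothetical demo data in a throwaway directory (not part of the lesson data).
demo_dir=$(mktemp -d)
printf '2013-11-05,deer,5\n2013-11-05,rabbit,22\n2013-11-06,rabbit,19\n' > "$demo_dir/counts.csv"

# grep -r searches every file under the directory and prefixes each match
# with "filename:"; -w matches whole words only.
grep -w rabbit -r "$demo_dir"

# Splitting on ":" and keeping field 2 drops the file name prefix.
stripped=$(grep -w rabbit -r "$demo_dir" | cut -d : -f 2)
echo "$stripped"

# With "," as the delimiter, -f 1,3 keeps the first and third columns.
dates_and_counts=$(cut -d , -f 1,3 "$demo_dir/counts.csv")
echo "$dates_and_counts"

# Clean up the throwaway directory.
rm -r "$demo_dir"
```

The `cut -d : -f 2` step relies on the file path itself containing no `:` character, which holds for the directory created here.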