<style>body { background-color: #eeeeee!important; } </style>

# NWO-I Software Carpentry, DIFFER 2023, Day 1

:::info
:information_source: On this page you will find notes for the first day of the NWO-I Software Carpentry workshop at DIFFER, held on December 5.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule, December 5, 2023

| | **Unix Shell**|
|------|------|
| 09:30 | Navigating and working with files and directories |
| 10:30 | Morning break |
| 10:45 | Automation (pipes, filters, loops & scripts) |
| 12:30 | Lunch break |
| 13:15 | Finding things |
| 14:00 | *END* |
| | **Git** |
| 14:15 | Setting up and working with Git |
| 15:45 | Afternoon break |
| 16:00 | Collaborating via Git |
| 17:30 | *END* |

## Unix Shell

### :link: Links

* Setup page: https://swcarpentry.github.io/shell-novice/setup.html
* Lesson material: https://swcarpentry.github.io/shell-novice/
* Reference page: https://swcarpentry.github.io/shell-novice/reference.html

### 1. Introducing the Shell

![Slide4](https://hackmd.io/_uploads/rkboj0nHa.jpg)

![Slide5](https://hackmd.io/_uploads/SJh2iR3Ha.jpg)

```
ls
ks    # deliberate typo: the shell reports that the command is not found
```

### 2. Navigating Files and Directories

![](https://i.imgur.com/jvHnHCx.png)

```
pwd
ls
ls -F
ls --help
man ls
```

:::success
:pencil: **Exploring More `ls` Flags**

You can also use two options at the same time. What does the command `ls` do when used with the `-l` option? What about if you use both the `-l` and the `-h` option?

Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
:::spoiler :eyes: ***Solution***
The `-l` option makes `ls` use a long listing format, showing not only the file/directory names but also additional information, such as the file size and the time of its last modification. If you use both the `-h` option and the `-l` option, this makes the file size ‘human readable’, i.e. displaying something like `5.3K` instead of `5369`.
:::

```
ls -l -h
ls -lhtr        # long listing, human-readable sizes, sorted by modification time, oldest first
ls -ltr
ls -F Desktop
ls -F Desktop/shell-lesson-data
cd Desktop
cd shell-lesson-data
cd exercise-data
pwd
ls -F
cd shell-lesson-data    # fails: there is no shell-lesson-data inside exercise-data
cd ..
pwd
ls -F -a        # -a also shows hidden entries, such as . and ..
cd              # no argument: go to the home directory
pwd
cd Desktop/shell-lesson-data/exercise-data
pwd
cd /Users/reinder/Desktop/shell-lesson-data
cd ~/Desktop/shell-lesson-data
cd exercise-data/creatures
cd -            # jump back to the previous directory
cd -
cd -
```

:::success
:pencil: **Absolute vs Relative Paths**

Starting from `/Users/amanda/data`, which of the following commands could Amanda use to navigate to her home directory, which is `/Users/amanda`?

1. `cd .`
1. `cd /`
1. `cd /home/amanda`
1. `cd ../..`
1. `cd ~`
1. `cd home`
1. `cd ~/data/..`
1. `cd`
1. `cd ..`

:::spoiler :eyes: ***Solution***
1. No: `.` stands for the current directory.
1. No: `/` stands for the root directory.
1. No: Amanda’s home directory is `/Users/amanda`.
1. No: this command goes up two levels, i.e. ends in `/Users`.
1. Yes: `~` stands for the user’s home directory, in this case `/Users/amanda`.
1. No: this command would navigate into a directory `home` in the current directory if it exists.
1. Yes: unnecessarily complicated, but correct.
1. Yes: shortcut to go back to the user’s home directory.
1. Yes: goes up one level.
:::

![](https://i.imgur.com/2ZuhlAs.png)

```
ls -F /
pwd
ls north-pacific-gyre
ls north-pacific-gyre
ls north-pacific-gyre/goostats.sh
ls north-pacific-gyre/goodiff.sh
```

### 3. Working With Files and Directories

```
pwd
cd exercise-data/writing
ls -F
mkdir thesis
ls -F
ls -F thesis
mkdir -p ../project/data ../project/results    # -p also creates missing parent directories
ls -FR ../project                              # -R lists subdirectories recursively
cd thesis
nano draft.txt
ls
cat draft.txt
cd ~/Desktop/shell-lesson-data/exercise-data/writing
pwd
mv thesis/draft.txt thesis/quotes.txt
ls thesis
mv thesis/quotes.txt .     # . = move into the current directory, keeping the name
ls thesis
ls
cp quotes.txt thesis/quotation.txt
ls quotes.txt thesis/quotation.txt
cp -r thesis thesis_backup # -r copies a directory and everything in it
ls thesis thesis_backup
```

:::success
:pencil: **Renaming Files**

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it `statstics.txt`.

After creating and saving this file you realize you misspelled the filename! To correct the mistake, which of the following commands could you use?

1. `cp statstics.txt statistics.txt`
1. `mv statstics.txt statistics.txt`
1. `mv statstics.txt .`
1. `cp statstics.txt .`

:::spoiler :eyes: ***Solution***
1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
1. Yes, this would work to rename the file.
1. No, the period (`.`) indicates where to move the file, but does not provide a new file name; identical file names cannot be created.
1. No, the period (`.`) indicates where to copy the file, but does not provide a new file name; identical file names cannot be created.
:::

```
rm quotes.txt
ls
rm quotes.txt        # fails: the file is already gone
rm thesis            # fails: rm needs -r to remove a directory
rm -r thesis
ls
rm -ri thesis_backup # -i asks for confirmation before each removal
cd ..
cd alkanes
ls
ls *.pdb             # * matches zero or more characters
ls p*.pdb
ls ?ethane.pdb       # ? matches exactly one character
ls ???ane.pdb
```

### 4. Pipes and Filters

```
cd alkanes
ls
wc cubane.pdb
wc *.pdb
wc -l *.pdb
wc -l        # no file name: wc waits for keyboard input; press Ctrl+C to cancel
ls
wc -l *.pdb > lengths.txt
ls
cat lengths.txt
```

:::success
:pencil: **What Does `sort -n` Do?**

The file `shell-lesson-data/exercise-data/numbers.txt` contains the following lines:

```
10
2
19
22
6
```

If we run `sort` on this file, the output is:

```
10
19
2
22
6
```

If we run `sort -n` on the same file, we get this instead:

```
2
6
10
19
22
```

Explain why `-n` has this effect.

:::spoiler :eyes: ***Solution***
The `-n` option specifies a numerical rather than an alphanumerical sort.
:::

```
sort -n lengths.txt > sorted_lengths.txt
head -n 1 sorted_lengths.txt
```

:::success
:pencil: **What Does `>>` Mean?**

We have seen the use of `>`, but there is a similar operator `>>` which works slightly differently. We’ll learn about the differences between these two operators by printing some strings. We can use the `echo` command to print strings, e.g.

```
$ echo The echo command prints text
The echo command prints text
```

Now test the commands below to reveal the difference between the two operators:

```
$ echo hello > testfile01.txt
```

and:

```
$ echo hello >> testfile02.txt
```

Hint: Try executing each command twice in a row and then examining the output files.

:::spoiler :eyes: ***Solution***
In the first example with `>`, the string ‘hello’ is written to `testfile01.txt`, but the file gets overwritten each time we run the command.

We see from the second example that the `>>` operator also writes ‘hello’ to a file (in this case `testfile02.txt`), but appends the string to the file if it already exists (i.e. when we run it for the second time).
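To see the difference in one go, here is a minimal bash sketch that runs each command twice and then prints both files. It assumes a throwaway directory created with `mktemp -d` rather than the lesson's data folder, so nothing in `shell-lesson-data` is touched.

```shell
#!/usr/bin/env bash
# Sketch: compare ">" (overwrite) and ">>" (append) by running each twice.
# Assumption: a scratch directory from mktemp, not the lesson's data folder.
set -eu

scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"

echo hello > testfile01.txt    # run 1: create the file
echo hello > testfile01.txt    # run 2: truncate it, then write again

echo hello >> testfile02.txt   # run 1: create the file
echo hello >> testfile02.txt   # run 2: add a second line at the end

cat testfile01.txt             # one "hello"
cat testfile02.txt             # two "hello" lines
```

`testfile01.txt` ends up with a single line and `testfile02.txt` with two. The scratch directory is removed when the script exits.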
:::

```
sort -n lengths.txt | head -n 1
wc -l *.pdb | sort -n
wc -l *.pdb | sort -n | head -n 1
```

![](https://i.imgur.com/7FFrAeB.png)

:::success
:pencil: **Pipe Reading Comprehension**

A file called `animals.csv` (in the `shell-lesson-data/exercise-data/animal-counts` folder) contains the following data:

```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
```

What text passes through each of the pipes and the final redirect in the pipeline below? Note that the `sort -r` command sorts in reverse order.

```
$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt
```

Hint: build the pipeline up one command at a time to test your understanding.

:::spoiler :eyes: ***Solution***
The `head` command extracts the first 5 lines from `animals.csv`. Then, the last 3 lines are extracted from the previous 5 by using the `tail` command. With the `sort -r` command those 3 lines are sorted in reverse order, and finally the output is redirected to a file `final.txt`. The content of this file can be checked by executing `cat final.txt`. The file should contain the following lines:

```
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-05,raccoon,7
```
:::

```
pwd
cd ../..
ls
cd north-pacific-gyre
ls
wc -l *.txt
wc -l *.txt | sort -n | head -n 5
wc -l *.txt | sort -n | tail -n 5
ls *Z.txt
```

### 5. Loops

```
cd exercise-data/
cd creatures/
ls
head -n 5 basilisk.dat minotaur.dat unicorn.dat
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    echo $filename
    head -n 2 $filename | tail -n 1
done
for filename in *.dat
do
    echo $filename
    head -n 2 $filename | tail -n 1
done
for x in *.dat       # the loop variable name is arbitrary
do
    echo $x
    head -n 2 $x | tail -n 1
done
```

:::success
:pencil: **Write your own loop**

How would you write a loop that echoes all 10 numbers from 0 to 9?
:::spoiler :eyes: ***Solution***

```
$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
>     echo $loop_variable
> done
```

```
0
1
2
3
4
5
6
7
8
9
```
:::

```
for filename in *.dat
do
    echo $filename
    sed -n '1p' $filename    # -n '1p': print only line 1
done
for filename in *.dat
do
    echo $filename
    head -n 100 $filename | tail -n 20
done
for filename in *.dat
do
    $filename    # missing echo: the shell tries to run the file name as a command
    head -n 100 $filename | tail -n 20
done
for filename in "red dragon.dat" "purple unicorn.dat"
do
    head -n 100 "$filename" | tail -n 20
done
for filename in "red dragon.dat" "purple unicorn.dat"
do
    head -n 100 $filename | tail -n 20    # unquoted: each name is split at the space
done
cp *.dat original-*.dat    # fails: with several source files, the last argument must be a directory
for filename in *.dat
do
    cp $filename original-$filename
done
```

![](https://i.imgur.com/xCAkxes.png)

```
ls
cd ../..
cd north-pacific-gyre
for datafile in NENE*A.txt NENE*B.txt
do
    echo $datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
    echo $datafile stats-$datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
    bash goostats.sh $datafile stats-$datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
    echo $datafile
    bash goostats.sh $datafile stats-$datafile
done
```

### 6. Shell Scripts

```
pwd
nano middle.sh
# typed into middle.sh:
head -n 15 octane.pdb | tail -n 5
bash middle.sh
nano middle.sh
# typed into middle.sh ($1 = first argument):
head -n 15 "$1" | tail -n 5
bash middle.sh octane.pdb
bash middle.sh cubane.pdb
nano middle.sh
# typed into middle.sh:
head -n "$2" "$1" | tail -n "$3"
bash middle.sh octane.pdb 15 5
bash middle.sh octane.pdb 20 5
nano middle.sh
# typed into middle.sh:
# select lines from the middle of a file
# usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
nano sorted.sh
# typed into sorted.sh ("$@" = all arguments):
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
bash sorted.sh *.pdb ../creatures/*.dat
history
!631    # re-run command number 631 from the history
!!      # re-run the previous command
cd ../../north-pacific-gyre/
nano do-stats.sh
# typed into do-stats.sh:
# Calculate stats for data files.
for datafile in "$@"
do
    echo $datafile
    bash goostats.sh $datafile stats-$datafile
done
bash do-stats.sh NENE*A.txt NENE*B.txt
bash do-stats.sh NENE*A.txt NENE*B.txt | wc -l
```

:::success
:pencil: **List Unique Species**

Leah has several hundred data files, each of which is formatted like this:

```
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
```

An example of this type of file is given in `shell-lesson-data/exercise-data/animal-counts/animals.csv`.

We can use the command `cut -d , -f 2 animals.csv | sort | uniq` to produce the unique species in `animals.csv`. To avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.

Write a shell script called `species.sh` that takes any number of filenames as command-line arguments, and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.

:::spoiler :eyes: ***Solution***

```
# Script to find unique species in csv files where species is the second data field
# This script accepts any number of file names as command line arguments

# Loop over all files ("$@" keeps names containing spaces intact)
for file in "$@"
do
    echo "Unique species in $file:"
    # Extract species names
    cut -d , -f 2 "$file" | sort | uniq
done
```
:::

### 7. Finding Things

```
cd
cd Desktop/shell-lesson-data/exercise-data/writing
ls
cat haiku.txt
grep not haiku.txt
grep The haiku.txt
grep -w The haiku.txt          # -w: match whole words only
grep -w "is not" haiku.txt
grep -n "it" haiku.txt         # -n: show line numbers
grep -nw "the" haiku.txt
grep -nwi "the" haiku.txt      # -i: ignore case
grep -nwv "the" haiku.txt      # -v: show lines that do NOT match
grep -r Yesterday .            # -r: search all files below the current directory
grep --help
man grep
grep -E "^.o" haiku.txt        # lines whose second character is "o"
find .
find . -type d
find . -type f
cd ..
find . -name *.txt             # unquoted: the shell expands *.txt before find runs
find . -type f
find . -name "*.txt"
wc -l $(find . -name "*.txt")  # $(...) passes find's output as arguments
grep "searching" $(find . -name "*.txt")
```

## Version Control with Git

### :link: Links

* Setup page: https://swcarpentry.github.io/git-novice/setup.html
* Lesson material: https://swcarpentry.github.io/git-novice/
* Reference page: https://swcarpentry.github.io/git-novice/reference.html
* The Turing Way chapter: https://the-turing-way.netlify.app/reproducible-research/vcs.html
* List of Git GUIs: https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs

### 1. Automated Version Control

![Slide11](https://hackmd.io/_uploads/ryr-2A2S6.jpg)

![Slide12](https://hackmd.io/_uploads/r18f30nr6.jpg)

![Slide13](https://hackmd.io/_uploads/r1rmnA3Bp.jpg)

### 2. Setting Up Git

```
pwd
cd ~
git config --global user.name "First_name Last_name"
git config --global user.email "your_email@email.nl"
git config --global core.autocrlf input    # macOS/Linux; the lesson uses "true" on Windows
git config --global core.editor "nano -w"
git config --global init.defaultBranch main
git config --global --edit
git config --list
git config -h
git config --help
git help
```

### 3. Creating a Repository

![Slide14](https://hackmd.io/_uploads/rkuVnR3BT.jpg)

```
cd ~/Desktop
mkdir planets
cd planets
git init
ls
ls -a                 # shows the hidden .git directory
git checkout -b main  # only needed if git init did not already name the branch main
git status
```

### 4. Tracking Changes

![Slide15](https://hackmd.io/_uploads/SkRrh03HT.jpg)

```
pwd
nano mars.txt
cat mars.txt
git status
git add mars.txt
git status
git commit -m "Start notes on Mars as a base"
git status
git log
nano mars.txt
cat mars.txt
git status
git diff
git commit -m "Add concerns about effects of Mars' moons on Wolfman"   # nothing staged yet, so nothing is committed
git add mars.txt
git commit -m "Add concerns about effects of Mars' moons on Wolfman"
nano mars.txt
cat mars.txt
git diff
git add mars.txt
git diff            # empty: the changes are now staged
git diff --staged   # shows the staged changes
git commit -m "Discuss concerns about Mars' climate for Mummy"
git status
git log
git log -1
git log --oneline
git log --oneline --graph
mkdir spaceships
git status          # Git does not track empty directories
git add spaceships
git status
touch spaceships/apollo-11 spaceships/sputnik-1
git status
git add spaceships
git status
git commit -m "Add some initial thoughts on spaceships"
```

:::success
:pencil: **Choosing a Commit Message**

Which of the following commit messages would be most appropriate for the last commit made to `mars.txt`?

1. “Changes”
2. “Added line ‘But the Mummy will appreciate the lack of humidity’ to mars.txt”
3. “Discuss effects of Mars’ climate on the Mummy”

:::spoiler :eyes: ***Solution***
Answer 1 is not descriptive enough, and the purpose of the commit is unclear; answer 2 is redundant with using `git diff` to see what changed in this commit; but answer 3 is good: short, descriptive, and imperative.
:::

### 5. Exploring History

![Slide16](https://hackmd.io/_uploads/rkm_3RnHp.jpg)

```
cat mars.txt
git diff HEAD mars.txt
git diff HEAD~1 mars.txt      # compare with the commit before HEAD
git diff HEAD~3 mars.txt
git show HEAD~3 mars.txt
git diff 9ee432d21f87ea840abf44f3c1e3bea7c19b1bf9 mars.txt
git diff 9ee432d mars.txt     # an abbreviated commit ID also works (IDs differ per repository)
git status
git checkout HEAD mars.txt    # restore the last committed version of mars.txt
cat mars.txt
git checkout 9ee432d21f87ea840abf44f3c1e3bea7c19b1bf9 mars.txt
cat mars.txt
ls
git status
git checkout HEAD mars.txt
cat mars.txt
git log --oneline --graph
git log
```

### 6. Ignoring Things

```
mkdir results
touch a.csv b.csv c.csv results/a.out results/b.out
ls
git status
nano .gitignore    # add the lines *.csv and results/
cat .gitignore
git status
git add .gitignore
git commit -m "Ignore data files and the results folder"
git status
git add a.csv      # refused: a.csv matches a pattern in .gitignore
```

:::success
:pencil: **Including Specific Files**

How would you ignore all `.csv` files in your root directory except for `final.csv`? Hint: Find out what `!` (the exclamation point operator) does.

:::spoiler :eyes: ***Solution***
You would add the following lines to your `.gitignore`:

```
# ignore all data files
*.csv
# except final.csv
!final.csv
```

Comments in `.gitignore` must be on their own line, starting with `#`; text after a pattern on the same line becomes part of the pattern.

The exclamation point operator will include a previously excluded entry.

Note also that `.gitignore` only affects untracked files: any `.csv` files committed before this rule was added stay under version control, and only `.csv` files added from now on will be ignored.
:::

### 7. Remotes in GitHub

![Slide17](https://hackmd.io/_uploads/HJ4DARhSa.jpg)

```
git remote add origin git@github.com:your_user_name/planets.git
git remote -v
ls -al ~/.ssh
ssh-keygen -t ed25519 -C "your_email@email.nl"
ls -al ~/.ssh
ssh -T git@github.com          # fails until GitHub knows your public key
cat ~/.ssh/id_ed25519.pub      # copy this key into GitHub: Settings > SSH and GPG keys
ssh -T git@github.com
git push origin main
git pull origin main
```

### 8. Collaborating

![Slide21](https://hackmd.io/_uploads/ryE0RR2r6.jpg)

```
git pull origin main
ls
nano mars.txt
git add mars.txt
git commit -m "Add a line in my copy"
git push origin main
git pull origin main
cat mars.txt
```

### 9. Conflicts

![](https://i.imgur.com/YgKnel1.png)

```
nano mars.txt    # remove the conflict markers and keep the text you want
cat mars.txt
git add mars.txt
git status
git commit -m "merge changes from GitHub"
git push origin main
git log
cat mars.txt
nano mars.txt
git add mars.txt
git commit -m "new moon research"
git push origin main
git pull origin main
cat mars.txt
git log
```

![Slide23](https://hackmd.io/_uploads/r1IX1JaSp.jpg)

![Slide24](https://hackmd.io/_uploads/BJk4yypHa.jpg)

![Slide25](https://hackmd.io/_uploads/rJSNJ1TB6.jpg)
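To rehearse this conflict on your own machine, without GitHub or a second person, here is a minimal bash sketch. It is an illustration and not part of the lesson. A local bare repository (`remote.git`) stands in for GitHub, and two clones named `owner` and `collaborator` stand in for the two collaborators. These names, the demo identity, and the lines written to `mars.txt` are all hypothetical. The sketch uses `git pull --no-rebase` so that the pull merges regardless of your `pull.rebase` setting.

```shell
#!/usr/bin/env bash
# Sketch: simulate the "Conflicts" episode locally (no GitHub needed).
# Assumption: a bare repository plays GitHub; two clones play the two people.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Throwaway identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

# "GitHub": a bare repository whose default branch is main.
git init --quiet --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Owner: first version of mars.txt, pushed to the remote.
git init --quiet owner
cd owner
git symbolic-ref HEAD refs/heads/main
git remote add origin ../remote.git
echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars as a base"
git push --quiet origin main
cd ..

# Collaborator: clones the remote, adds a line and pushes first.
git clone --quiet remote.git collaborator
cd collaborator
echo "This line added to the collaborator's copy" >> mars.txt
git commit --quiet -am "Add a line in the collaborator's copy"
git push --quiet origin main
cd ..

# Owner: changes the same spot without pulling first.
cd owner
echo "We added a different line in the other copy" >> mars.txt
git commit --quiet -am "Add a line in my copy"
if ! git push --quiet origin main 2>/dev/null; then
    echo "Push rejected: the remote has commits we do not have yet."
fi

# Pull with a merge; Git stops and writes conflict markers into mars.txt.
git pull --quiet --no-rebase origin main || true
cat mars.txt

# Resolve by writing the wanted version, then commit and push the merge.
printf '%s\n' "Cold and dry, but everything is my favorite color" \
    "We removed the conflict on this line" > mars.txt
git add mars.txt
git commit --quiet -m "Merge changes from GitHub"
git push --quiet origin main

# Collaborator: pulling now simply fast-forwards to the merged version.
git -C ../collaborator pull --quiet --no-rebase origin main
cat ../collaborator/mars.txt
```

The first `cat` shows the two competing lines between the `<<<<<<<`, `=======` and `>>>>>>>` markers. The final `cat` shows the resolved file as the collaborator receives it. The scratch directory is deleted when the script exits.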