<style>body { background-color: #eeeeee!important; } </style>

# NWO-I Software Carpentry 2022

:::info
:information_source: On this page you will find notes for the NWO-I Software Carpentry workshop organized on November 28 and December 13.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule November 28

| | **Unix Shell**|
|------|------|
| 09:30 | Navigating and working with files and directories |
| 10:30 | Morning break |
| 10:45 | Automation (pipes, filters, loops & scripts) |
| 12:30 | Lunch break |
| 13:15 | Finding things |
| 14:00 | *END* |
| | **Git** |
| 14:15 | Setting up and working with Git |
| 15:45 | Afternoon break |
| 16:00 | Collaborating via Git |
| 17:30 | *END* |

## Unix shell

### :link: Links

* Setup page: https://swcarpentry.github.io/shell-novice/setup.html
* Lesson material: https://swcarpentry.github.io/shell-novice/
* Reference page: https://swcarpentry.github.io/shell-novice/reference.html

![](https://i.imgur.com/XpI1s82.png)
![](https://i.imgur.com/ULDLRCF.png)

### 1. Introducing the Shell

![](https://i.imgur.com/hmTs0TM.png)
![](https://i.imgur.com/pUuvYlr.png)

```
$ ls
$
```

### 2. Navigating Files and Directories

```
$ pwd
```

![](https://i.imgur.com/jvHnHCx.png)

```
$ ls
$ ls -F
$ clear
$ ls --help
$ man ls
$ ls -j (-j invalid option)
```

:::success
:pencil: **Exploring More `ls` Flags**

You can also use two options at the same time. What does the command `ls` do when used with the `-l` option? What about if you use both the `-l` and the `-h` option?

Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
:::spoiler :eyes: ***Solution***
The `-l` option makes `ls` use a long listing format, showing not only the file/directory names but also additional information, such as the file size and the time of its last modification. If you use both the `-h` option and the `-l` option, this makes the file size ‘human readable’, i.e. displaying something like `5.3K` instead of `5369`.
:::

:::success
:pencil: ***Optional:* Listing in Reverse Chronological Order**

By default, `ls` lists the contents of a directory in alphabetical order by name. The command `ls -t` lists items by time of last change instead of alphabetically. The command `ls -r` lists the contents of a directory in reverse order. Which file is displayed last when you combine the `-t` and `-r` options? Hint: You may need to use the `-l` option to see the last changed dates.
:::spoiler :eyes: ***Solution***
The most recently changed file is listed last when using `-rt`. This can be very useful for finding your most recent edits or checking to see if a new output file was written.
:::

```
$ ls -l
$ ls -h
$ ls -lh
$ ls -t
$ ls -lt
$ ls -lrt
$ ls -F Desktop
$ ls -F Desktop/shell-lesson-data
$ clear
$ cd Desktop
$ ls
$ cd shell-lesson-data
$ cd exercise-data
$ pwd
$ ls -F
$ clear
$ cd shell-lesson-data (error)
$ cd ..
$ pwd
$ ls -F
$ ls -F -a
$ ls -Fa
$ cd
$ pwd
$ cd Desktop/shell-lesson-data/exercise-data
$ pwd
$ ls -F
```

:::success
:pencil: **Absolute vs Relative Paths**

Starting from `/Users/amanda/data`, which of the following commands could Amanda use to navigate to her home directory, which is `/Users/amanda`?

1. `cd .`
1. `cd /`
1. `cd /home/amanda`
1. `cd ../..`
1. `cd ~`
1. `cd home`
1. `cd ~/data/..`
1. `cd`
1. `cd ..`

:::spoiler :eyes: ***Solution***
1. No: `.` stands for the current directory.
1. No: `/` stands for the root directory.
1. No: Amanda’s home directory is `/Users/amanda`.
1. No: this command goes up two levels, i.e. ends in `/Users`.
1. Yes: `~` stands for the user’s home directory, in this case `/Users/amanda`.
1. No: this command would navigate into a directory `home` in the current directory if it exists.
1. Yes: unnecessarily complicated, but correct.
1. Yes: shortcut to go back to the user’s home directory.
1. Yes: goes up one level.
:::

![](https://i.imgur.com/2ZuhlAs.png)

```
$ clear
$ cd ~/
$ cd ~/Desktop/shell-lesson-data
$ ls -s exercise-data
$ ls -S exercise-data
$ ls -F /
$ ls north-pacific-gyre
$ ls nor[tab] (autocomplete)
$ ls nor[tab]/g[tab] [tab]
$ ls north-pacific-gyre/goodiff.sh
```

### 3. Working With Files and Directories

```
$ pwd
$ cd exercise-data/writing
$ ls -F
$ mkdir thesis
$ ls -F
$ mkdir -p ../project/data ../project/results
$ ls -FR ../project
$ cd thesis
$ ls -F
$ nano draft.txt
$ ls
$ cat draft.txt
$ cd ~/Desktop/shell-lesson-data/exercise-data/writing
$ mv thesis/draft.txt thesis/quotes.txt
$ ls thesis
$ mv thesis/quotes.txt .
$ ls thesis
$ ls thesis/quotes.txt
$ ls quotes.txt
$ ls -F
$ cp quotes.txt thesis/quotations.txt
$ ls thesis
$ ls
$ cp -r thesis thesis_backup
$ ls thesis thesis_backup
$ rm quotes.txt
$ ls
$ rm -i haiku.txt
$ rm thesis (error: thesis is a directory)
$ rm -r thesis
```

### 4. Pipes and Filters

```
$ cd exercise-data/
$ ls proteins
$ cd proteins
$ wc cubane.pdb
$ wc *.pdb
$ wc -l *.pdb
$ wc -l
ctrl-C
$ wc -l *.pdb > lengths.txt
$ ls
$ cat lengths.txt
$ less lengths.txt
q
```

:::success
:pencil: **What Does `sort -n` Do?**

The file `shell-lesson-data/exercise-data/numbers.txt` contains the following lines:
```
10
2
19
22
6
```
If we run `sort` on this file, the output is:
```
10
19
2
22
6
```
If we run `sort -n` on the same file, we get this instead:
```
2
6
10
19
22
```
Explain why `-n` has this effect.
:::spoiler :eyes: ***Solution***
The `-n` option specifies a numerical rather than an alphanumerical sort.
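
A quick way to check this without the lesson data is to generate the same five values on the fly. The sketch below assumes only the standard `printf` and `sort` commands; the variable names are made up for illustration and are not part of the lesson.

```shell
# Stand-in for numbers.txt (assumption: the same five values as in the exercise).
numbers=$(printf '10\n2\n19\n22\n6\n')

# Plain sort compares the lines character by character,
# so "10" and "19" end up before "2".
alphabetical=$(echo "$numbers" | sort)

# sort -n compares the lines as numbers.
numerical=$(echo "$numbers" | sort -n)

echo "$alphabetical"
echo "---"
echo "$numerical"
```

The part above the `---` separator should match the plain `sort` output shown in the exercise, and the part below it the `sort -n` output.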
:::

```
$ sort lengths.txt
$ sort -n lengths.txt
$ cat numbers.txt
$ sort numbers.txt
$ sort -n numbers.txt
$ cd proteins
$ ls
$ sort -n lengths.txt > sorted-lengths.txt
$ head -n 1 sorted-lengths.txt
```

:::success
:pencil: **What Does `>>` Mean?**

We have seen the use of `>`, but there is a similar operator `>>` which works slightly differently. We’ll learn about the differences between these two operators by printing some strings. We can use the `echo` command to print strings, e.g.
```
$ echo The echo command prints text
The echo command prints text
```
Now test the commands below to reveal the difference between the two operators:
```
$ echo hello > testfile01.txt
```
and:
```
$ echo hello >> testfile02.txt
```
Hint: Try executing each command twice in a row and then examining the output files.
:::spoiler :eyes: ***Solution***
In the first example with `>`, the string ‘hello’ is written to `testfile01.txt`, but the file gets overwritten each time we run the command. We see from the second example that the `>>` operator also writes ‘hello’ to a file (in this case `testfile02.txt`), but appends the string to the file if it already exists (i.e. when we run it for the second time).
:::

```
$ sort -n lengths.txt >> sorted-lengths.txt
$ cat sorted-lengths.txt
$ sort -n lengths.txt | head -n 1
$ head -n 1 sorted-lengths.txt
$ wc -l *.pdb | sort -n
$ wc -l *.pdb | sort -n | head -n 1 > test.txt
```

![](https://i.imgur.com/7FFrAeB.png)

:::success
:pencil: **Pipe Construction**

For the file `animals.csv` from the previous exercise, consider the following command:
```
$ cut -d , -f 2 animals.csv
```
The `cut` command is used to remove or ‘cut out’ certain sections of each line in the file. By default, `cut` expects the lines to be separated into columns by a `Tab` character. A character used in this way is called a **delimiter**. In the example above we use the `-d` option to specify the comma as our delimiter character. We have also used the `-f` option to specify that we want to extract the second field (column). This gives the following output:
```
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
```
The `uniq` command filters out adjacent matching lines in a file. How could you extend this pipeline (using `uniq` and another command) to find out what animals the file contains (without any duplicates in their names)?
:::spoiler :eyes: ***Solution***
```
$ cut -d , -f 2 animals.csv | sort | uniq
```
:::

```
$ cd north-pacific-gyre/
$ ls
$ wc -l *.txt
$ wc -l *.txt | sort -n | head -n 5
$ wc -l *.txt | sort -n | tail -n 5
$ ls *Z.txt
```

### 5. Loops

```
$ cd exercise-data/
$ cd creatures/
$ head -n 5 basilisk.dat minotaur.dat unicorn.dat

PSEUDO CODE:
for thing in list_of_things
do
    operation
done

$ for filename in basilisk.dat minotaur.dat unicorn.dat
> do
>     head -n 2 $filename | tail -n 1
> done
```

:::success
:pencil: **Write your own loop**

How would you write a loop that echoes all 10 numbers from 0 to 9?
:::spoiler :eyes: ***Solution***
```
$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
>     echo $loop_variable
> done
```
```
0
1
2
3
4
5
6
7
8
9
```
:::

:::success
:pencil: **Variables in Loops**

This exercise refers to the `shell-lesson-data/exercise-data/proteins` directory. `ls *.pdb` gives the following output:
```
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
```
What is the output of the following code?
```
$ for datafile in *.pdb
> do
>     ls *.pdb
> done
```
Now, what is the output of the following code?
```
$ for datafile in *.pdb
> do
>     ls $datafile
> done
```
Why do these two loops give different outputs?
:::spoiler :eyes: ***Solution***
The first code block gives the same output on each iteration through the loop. Bash expands the wildcard `*.pdb` within the loop body (as well as before the loop starts) to match all files ending in `.pdb` and then lists them using `ls`.
The expanded loop would look like this:
```
$ for datafile in cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
> do
>     ls cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
> done
```
```
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
```
The second code block lists a different file on each loop iteration. The value of the `datafile` variable is evaluated using `$datafile`, and then listed using `ls`.
```
cubane.pdb
ethane.pdb
methane.pdb
octane.pdb
pentane.pdb
propane.pdb
```
:::

```
$ for filename in *.dat
> do
>     echo $filename
>     head -n 100 $filename | tail -n 20
> done
$ for filename in red dragon.dat purple unicorn.dat
> do
>     head -n 100 $filename | tail -n 20
> done
$ for filename in "red dragon.dat" "purple unicorn.dat"
> do
>     head -n 100 "$filename" | tail -n 20
> done
$ cp basilisk.dat minotaur.dat unicorn.dat *-original
$ for filename in *.dat
> do
>     cp $filename original-$filename
> done
```

![](https://i.imgur.com/xCAkxes.png)

```
$ ls
$ for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile stats-$datafile
> done
$ for datafile in NENE*A.txt NENE*B.txt
> do
>     bash goostats.sh $datafile stats-$datafile
> done
$ for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile
>     bash goostats.sh $datafile stats-$datafile
> done
```

### 6. Shell Scripts

```
$ cd exercise-data/
$ cd proteins/
$ nano middle.sh
> head -n 15 octane.pdb | tail -n 5
ctrl-X
$ bash middle.sh
$ nano middle.sh
> head -n 15 "$1" | tail -n 5
ctrl-X
$ bash middle.sh octane.pdb
$ ls
$ bash middle.sh pentane.pdb
$ nano middle.sh
> head -n "$2" "$1" | tail -n "$3"
ctrl-X
$ bash middle.sh propane.pdb 20 5
$ bash middle.sh propane.pdb 11 5
$ wc -l *.pdb | sort -n
$ nano sorted.sh
> # Sort files by their length.
> # Usage: bash sorted.sh one_or_more_filenames
> wc -l "$@" | sort -n
ctrl-X
$ bash sorted.sh *.pdb ../creatures/*.dat
```

## Version Control with Git

### :link: Links

* Setup page: https://swcarpentry.github.io/git-novice/setup.html
* Lesson material: https://swcarpentry.github.io/git-novice/
* Reference page: https://swcarpentry.github.io/git-novice/reference.html
* The Turing Way chapter: https://the-turing-way.netlify.app/reproducible-research/vcs.html
* List of Git GUIs: https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs

### 1. Automated Version Control

### 2. Setting Up Git

```
$ git
$ git config --global user.name "<Name...>"
$ git config --global user.email "name@something.nl"
$ git config --global core.autocrlf input (Linux)
$ git config --global core.autocrlf false (Windows)
$ git config --global core.editor "nano"
$ git config --global init.defaultBranch main
$ git config --list
```

### 3. Creating a Repository

![](https://swcarpentry.github.io/git-novice/fig/motivatingexample.png)

### 4. Tracking Changes

```
$ cd
$ cd Desktop
$ mkdir planets
$ cd planets
$ ls
$ git init
$ ls
$ ls -a
$ ls -aF
$ git status
```

```
$ nano mars.txt
$ cat mars.txt
$ git status
$ git commit -m "Start notes on Mars as a base"
$ git add mars.txt
$ git status
$ git commit -m "Start notes on Mars as a base"
$ ls
$ cat mars.txt
$ ls -aF
$ git log
$ nano mars.txt
$ cat mars.txt
$ git diff
$ git commit -m "Add concerns about effects of Mars' moons on Wolfman"
$ git add mars.txt
$ git status
$ git commit -m "Add concerns about effects of Mars' moons on Wolfman"
$
```

![](https://swcarpentry.github.io/git-novice/fig/git-staging-area.svg)
![](https://swcarpentry.github.io/git-novice/fig/git-committing.svg)

```
$ nano mars.txt
$ git diff
$ git add mars.txt
$ git diff
$ git diff --staged
$ git commit -m "Discuss concerns about Mars' climate for Mummy"
```

:::success
:pencil: Choosing a Commit Message

Which of the following commit messages would be most appropriate for the last commit made to `mars.txt`?

1. “Changes”
2. “Added line ‘But the Mummy will appreciate the lack of humidity’ to mars.txt”
3. “Discuss effects of Mars’ climate on the Mummy”

:::spoiler :eyes: ***Solution***
Answer 1 is not descriptive enough, and the purpose of the commit is unclear. Answer 2 merely repeats what `git diff` already shows for this commit. Answer 3 is good: short, descriptive, and imperative.
:::

```
$ git status
$ git log
$ git diff --color-words
$ git log
$ git log -1
$ git log -2
$ git log --oneline
$ pwd
$ mkdir spaceships
$ touch spaceships/apollo-11
$ git status
$ git add spaceships
$ git status
$ git commit -m "Add some initial thoughts "
```

:::success
:pencil: Committing Changes to Git

Which command(s) below would save the changes of `myfile.txt` to my local Git repository?

1. `$ git commit -m "my recent changes"`
2. `$ git init myfile.txt` `$ git commit -m "my recent changes"`
3. `$ git add myfile.txt` `$ git commit -m "my recent changes"`
4. `$ git commit -m myfile.txt "my recent changes"`

:::spoiler :eyes: ***Solution***
1. Would only create a commit if files have already been staged.
2. Would try to create a new repository.
3. Is correct: first add the file to the staging area, then commit.
4. Would try to commit a file “my recent changes” with the message `myfile.txt`.
:::

:::success
:pencil: Committing Multiple Files

The staging area can hold changes from any number of files that you want to commit as a single snapshot.

1. Add some text to `mars.txt` noting your decision to consider Venus as a base
2. Create a new file `venus.txt` with your initial thoughts about Venus as a base for you and your friends
3. Add changes from both files to the staging area, and commit those changes.
:::

### 5. Exploring history

```
$ nano mars.txt
$ cat mars.txt
$ git diff HEAD mars.txt
$ git diff HEAD~1 mars.txt
$ git diff HEAD~2 mars.txt
$ git show HEAD~3 mars.txt
$ git show HEAD
$ git show HEAD .txt
$ git log
$ git show c428...
$ git diff c428...
$ git status
(restore vs. checkout)
$ git restore mars.txt
$ cat mars.txt
$ git checkout c428... mars.txt
$ git status
$ cat mars.txt
$ git restore --staged mars.txt
$ git status
$ cat mars.txt
$ git restore mars.txt
```

:::success
:pencil: Recovering Older Versions of a File

Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning "broke" the script and it no longer runs. She has spent ~1hr trying to fix it, with no luck...

Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called `data_cruncher.py`?

1. `$ git checkout HEAD`
2. `$ git checkout HEAD data_cruncher.py`
3. `$ git checkout HEAD~1 data_cruncher.py`
4. `$ git checkout <unique ID of last commit> data_cruncher.py`
5. Both 2 and 4

:::spoiler :eyes: ***Solution***
The answer is (5): both 2 and 4.
The `checkout` command restores files from the repository, overwriting the files in your working directory. Answers 2 and 4 both restore the latest version in the repository of the file `data_cruncher.py`. Answer 2 uses `HEAD` to indicate the latest, whereas answer 4 uses the unique ID of the last commit, which is what `HEAD` means. Answer 3 gets the version of `data_cruncher.py` from the commit before `HEAD`, which is NOT what we wanted. Answer 1 does nothing.
:::

:::success
:pencil: Understanding Workflow and History

What is the output of the last command in
```
$ cd planets
$ echo "Venus is beautiful and full of love" > venus.txt
$ git add venus.txt
$ echo "Venus is too hot to be suitable as a base" >> venus.txt
$ git commit -m "Comment on Venus as an unsuitable base"
$ git checkout HEAD venus.txt
$ cat venus.txt #this will print the contents of venus.txt to the screen
```
1. `Venus is too hot to be suitable as a base`
2. `Venus is beautiful and full of love`
3. `Venus is beautiful and full of love` `Venus is too hot to be suitable as a base`
4. Error because you have changed venus.txt without committing the changes

:::spoiler :eyes: ***Solution***
The answer is 2.

The command `git add venus.txt` places the current version of `venus.txt` into the staging area. The changes to the file from the second `echo` command are only applied to the working copy, not the version in the staging area.

So, when `git commit -m "Comment on Venus as an unsuitable base"` is executed, the version of `venus.txt` committed to the repository is the one from the staging area and has only one line.

At this time, the working copy still has the second line (and `git status` will show that the file is modified). However, `git checkout HEAD venus.txt` replaces the working copy with the most recently committed version of `venus.txt`.

So, `cat venus.txt` will output `Venus is beautiful and full of love`.
:::

### 6. Ignoring Things

```
$ mkdir results
$ touch a.dat b.dat c.dat results/a.out results/b.out
$ git status
$ nano .gitignore
$ ls
$ ls
$ git add a.dat
$ git add -f a.dat
$ git status
$ git restore --staged a.dat
$ git status --ignored
```

:::success
:pencil: Including Specific Files

How would you ignore all `.dat` files in your root directory except for `final.dat`? Hint: Find out what `!` (the exclamation point operator) does.
:::
:::spoiler :eyes: ***Solution***
You would add the following lines to your `.gitignore` (comments go on their own lines, because Git only treats a line that starts with `#` as a comment):
```
# ignore all data files
*.dat
# except final.dat
!final.dat
```
The exclamation point operator will include a previously excluded entry.

Note also that because you’ve previously committed `.dat` files in this lesson they will not be ignored with this new rule. Only future additions of `.dat` files added to the root directory will be ignored.
:::

### 7. Remotes in GitHub

![](https://i.imgur.com/ktdZ75W.png)
![](https://i.imgur.com/OSo32S3.png)

```
$ cd Desktop
$ cd planets
$ ls -a
$ git remote add origin <copied address repository>
$ git remote -v
$ ls -al ~/.ssh
$ ssh-keygen -t ed25519 -C "<emailaddress>"
$ ls -al ~/.ssh
$ ssh -T git@github.com
$ cat ~/.ssh/id_ed25519.pub
[copy resulting line, go to GitHub]
$ ssh -T git@github.com
$ git push origin main
```

![](https://i.imgur.com/T6dGUTq.png)

```
$ git pull origin main
```

### 8. Collaborating

```
$ git clone git@github.com:<collaborator username>/planets.git ~/Desktop/collaborator-planets
```

![](https://i.imgur.com/Avh5Uoc.png)

### 9. Conflicts

```
$ cd ~/Desktop/collaborator-planets
$ nano pluto.txt
$ cat pluto.txt
$ git add pluto.txt
$ git commit -m "Add notes about Pluto"
$ git push origin main
(error when called master, do:)
$ git push origin master
$ cd ~/Desktop/planets
$ git pull origin main
$ ls
$ pwd
$ nano mars.txt
$ cat mars.txt
(owner:)
$ git add mars.txt
$ git commit -m "Add a line in our home copy"
$ git push origin main
(collaborator:)
$ cd ~/Desktop/collaborator-planets
$ git add mars.txt
$ git commit -m "Add a line in my copy"
$ git push origin main (or master)
(error, conflict)
```

![](https://i.imgur.com/YgKnel1.png)

```
$ cat mars.txt
```

### 10. Open Science

![](https://i.imgur.com/fcgsupP.png)

### 13. Hosting

![](https://i.imgur.com/Ryapm8u.png)
![](https://i.imgur.com/2qkzdTQ.png)
![](https://i.imgur.com/5f8tdvI.png)
![](https://i.imgur.com/3HVoYQf.png)