<style>body { background-color: #eeeeee!important; } </style>

# NWO-I Software Carpentry 2022

:::info
:information_source: On this page you will find notes for the NWO-I Software Carpentry workshop organized on November 28 and December 13.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule November 28

|       | **Unix Shell** |
|-------|----------------|
| 09:30 | Navigating and working with files and directories |
| 10:30 | Morning break |
| 10:45 | Automation (pipes, filters, loops & scripts) |
| 12:30 | Lunch break |
| 13:15 | Finding things |
| 14:00 | *END* |
|       | **Git** |
| 14:15 | Setting up and working with Git |
| 15:45 | Afternoon break |
| 16:00 | Collaborating via Git |
| 17:30 | *END* |

## Unix shell

### :link: Links

* Setup page: https://swcarpentry.github.io/shell-novice/setup.html
* Lesson material: https://swcarpentry.github.io/shell-novice/
* Reference page: https://swcarpentry.github.io/shell-novice/reference.html

![](https://i.imgur.com/XpI1s82.png)
![](https://i.imgur.com/ULDLRCF.png)

### 1. Introducing the Shell

![](https://i.imgur.com/hmTs0TM.png)
![](https://i.imgur.com/pUuvYlr.png)

```
$ ls
$
```

### 2. Navigating Files and Directories

```
$ pwd
```

![](https://i.imgur.com/jvHnHCx.png)

```
$ ls
$ ls -F
$ clear
$ ls --help
$ man ls            # press q to quit the manual page
$ ls -j             # error: -j is an invalid option
```

:::success
:pencil: **Exploring More `ls` Flags**

You can also use two options at the same time. What does the command `ls` do when used with the `-l` option? What about if you use both the `-l` and the `-h` option?

Some of its output is about properties that we do not cover in this lesson (such as file permissions and ownership), but the rest should be useful nevertheless.
:::spoiler :eyes: ***Solution***
The `-l` option makes `ls` use a long listing format, showing not only the file/directory names but also additional information, such as the file size and the time of its last modification. If you use both the `-h` option and the `-l` option, this makes the file size ‘human readable’, i.e. displaying something like `5.3K` instead of `5369`.
:::

:::success
:pencil: ***Optional:* Listing in Reverse Chronological Order**

By default, `ls` lists the contents of a directory in alphabetical order by name. The command `ls -t` lists items by time of last change instead of alphabetically. The command `ls -r` lists the contents of a directory in reverse order. Which file is displayed last when you combine the `-t` and `-r` options? Hint: You may need to use the `-l` option to see the last changed dates.

:::spoiler :eyes: ***Solution***
The most recently changed file is listed last when using `-rt`. This can be very useful for finding your most recent edits or checking to see if a new output file was written.
:::

```
$ ls -l
$ ls -h
$ ls -lh
$ ls -t
$ ls -lt
$ ls -lrt
$ ls -F Desktop
$ ls -F Desktop/shell-lesson-data
$ clear
$ cd Desktop
$ ls
$ cd shell-lesson-data
$ cd exercise-data
$ pwd
$ ls -F
$ clear
$ cd shell-lesson-data        # error: there is no shell-lesson-data directory inside exercise-data
$ cd ..                       # move up to the parent directory
$ pwd
$ ls -F
$ ls -F -a                    # -a also shows hidden entries, such as . and ..
$ ls -Fa                      # same as above: options can be combined
$ cd                          # with no argument, cd goes to your home directory
$ pwd
$ cd Desktop/shell-lesson-data/exercise-data
$ pwd
$ ls -F
```

:::success
:pencil: **Absolute vs Relative Paths**

Starting from `/Users/amanda/data`, which of the following commands could Amanda use to navigate to her home directory, which is `/Users/amanda`?

1. `cd .`
1. `cd /`
1. `cd /home/amanda`
1. `cd ../..`
1. `cd ~`
1. `cd home`
1. `cd ~/data/..`
1. `cd`
1. `cd ..`

:::spoiler :eyes: ***Solution***
1. No: `.` stands for the current directory.
1. No: `/` stands for the root directory.
1. No: Amanda’s home directory is `/Users/amanda`.
1. No: this command goes up two levels, i.e. ends in `/Users`.
1. Yes: `~` stands for the user’s home directory, in this case `/Users/amanda`.
1. No: this command would navigate into a directory called `home` in the current directory, if it exists.
1. Yes: unnecessarily complicated, but correct.
1. Yes: shortcut to go back to the user’s home directory.
1. Yes: goes up one level.
:::

![](https://i.imgur.com/2ZuhlAs.png)

```
$ clear
$ cd ~/
$ cd ~/Desktop/shell-lesson-data
$ ls -s exercise-data               # -s shows the size of each entry (in blocks)
$ ls -S exercise-data               # -S sorts the entries by file size
$ ls -F /
$ ls north-pacific-gyre
$ ls nor[Tab]                       # Tab autocompletes the name
$ ls nor[Tab]/g[Tab][Tab]           # pressing Tab twice lists the possible completions
$ ls north-pacific-gyre/goodiff.sh
```

### 3. Working With Files and Directories

```
$ pwd
$ cd exercise-data/writing
$ ls -F
$ mkdir thesis
$ ls -F
$ mkdir -p ../project/data ../project/results
$ ls -FR ../project
$ cd thesis
$ ls -F
$ nano draft.txt
$ ls
$ cat draft.txt
$ cd ~/Desktop/shell-lesson-data/exercise-data/writing
$ mv thesis/draft.txt thesis/quotes.txt
$ ls thesis
$ mv thesis/quotes.txt .
$ ls thesis
$ ls thesis/quotes.txt              # error: the file has been moved out of thesis/
$ ls quotes.txt
$ ls -F
$ cp quotes.txt thesis/quotations.txt
$ ls thesis
$ ls
$ cp -r thesis thesis_backup
$ ls thesis thesis_backup
$ rm quotes.txt
$ ls
$ rm -i haiku.txt                   # -i asks for confirmation before deleting
$ rm thesis                         # error: thesis is a directory
$ rm -r thesis
```

<!-- :::success
:pencil: **Moving Files to a new folder**

After running the following commands, Jamie realizes that she put the files `sucrose.dat` and `maltose.dat` into the wrong folder. The files should have been placed in the `raw` folder.

```
$ ls -F
analyzed/ raw/
$ ls -F analyzed
fructose.dat glucose.dat maltose.dat sucrose.dat
$ cd analyzed
```

Fill in the blanks to move these files to the `raw/` folder (i.e. the one she forgot to put them in):

```
$ mv sucrose.dat maltose.dat ____/____
```

<!-- :::spoiler :eyes: ***Solution***
```
$ mv sucrose.dat maltose.dat ../raw
```

Recall that `..` refers to the parent directory (i.e. one above the current directory) and that `.` refers to the current directory.
:::

<!-- :::success
:pencil: **Renaming Files**

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it `statstics.txt`.

After creating and saving this file you realize you misspelled the filename! You want to correct the mistake; which of the following commands could you use to do so?

1. `cp statstics.txt statistics.txt`
1. `mv statstics.txt statistics.txt`
1. `mv statstics.txt .`
1. `cp statstics.txt .`

<!-- :::spoiler :eyes: ***Solution***
1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
1. Yes, this would work to rename the file.
1. No, the period (`.`) indicates where to move the file, but does not provide a new file name; identical file names cannot be created.
1. No, the period (`.`) indicates where to copy the file, but does not provide a new file name; identical file names cannot be created.
:::

<!-- :::success
:pencil: **Copy with Multiple Filenames**

For this exercise, you can test the commands in the `shell-lesson-data/exercise-data` directory.

In the example below, what does `cp` do when given several filenames and a directory name?

```
$ mkdir backup
$ cp creatures/minotaur.dat creatures/unicorn.dat backup/
```

In the example below, what does `cp` do when given three or more file names?

```
$ cd creatures
$ ls -F
basilisk.dat minotaur.dat unicorn.dat
$ cp minotaur.dat unicorn.dat basilisk.dat
```

<!-- :::spoiler :eyes: ***Solution***
If given more than one file name followed by a directory name (i.e. the destination directory must be the last argument), `cp` copies the files to the named directory.

If given three file names, `cp` throws an error such as the one below, because it is expecting a directory name as the last argument.

```
cp: target 'basilisk.dat' is not a directory
```
::: -->

### 4. Pipes and Filters

```
$ cd exercise-data/
$ ls proteins
$ cd proteins
$ wc cubane.pdb
$ wc *.pdb
$ wc -l *.pdb
$ wc -l                        # no file given: wc waits for keyboard input; press Ctrl+C to cancel
$ wc -l *.pdb > lengths.txt
$ ls
$ cat lengths.txt
$ less lengths.txt             # press q to quit less
```

:::success
:pencil: **What Does `sort -n` Do?**

The file `shell-lesson-data/exercise-data/numbers.txt` contains the following lines:

```
10
2
19
22
6
```

If we run `sort` on this file, the output is:

```
10
19
2
22
6
```

If we run `sort -n` on the same file, we get this instead:

```
2
6
10
19
22
```

Explain why `-n` has this effect.

:::spoiler :eyes: ***Solution***
The `-n` option specifies a numerical rather than an alphanumerical sort. Without it, `sort` compares the lines as text, character by character, so `10` comes before `2` because the character `1` sorts before `2`.
:::

```
$ sort lengths.txt
$ sort -n lengths.txt
$ cd ..
$ cat numbers.txt
$ sort numbers.txt
$ sort -n numbers.txt
$ cd proteins
$ ls
$ sort -n lengths.txt > sorted-lengths.txt
$ head -n 1 sorted-lengths.txt
```

:::success
:pencil: **What Does `>>` Mean?**

We have seen the use of `>`, but there is a similar operator `>>` which works slightly differently. We’ll learn about the differences between these two operators by printing some strings. We can use the `echo` command to print strings, e.g.

```
$ echo The echo command prints text
The echo command prints text
```

Now test the commands below to reveal the difference between the two operators:

```
$ echo hello > testfile01.txt
```

and:

```
$ echo hello >> testfile02.txt
```

Hint: Try executing each command twice in a row and then examining the output files.

:::spoiler :eyes: ***Solution***
In the first example with `>`, the string ‘hello’ is written to `testfile01.txt`, but the file gets overwritten each time we run the command.

We see from the second example that the `>>` operator also writes ‘hello’ to a file (in this case `testfile02.txt`), but appends the string to the file if it already exists (i.e. when we run it for the second time).
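Following the hint, both commands can be run twice in a row from a small script and the two files compared. This is only a sketch: it works in a fresh temporary directory created with `mktemp -d` (our addition, not part of the lesson), so that repeated runs always start from files that do not exist yet.

```shell
#!/usr/bin/env bash
# Sketch: compare > (overwrite) with >> (append) by running each command twice.
# Work in a new, empty temporary directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

echo hello > testfile01.txt     # creates the file
echo hello > testfile01.txt     # overwrites it: still a single line

echo hello >> testfile02.txt    # creates the file
echo hello >> testfile02.txt    # appends to it: now two lines

echo "testfile01.txt (written with >):"
cat testfile01.txt
echo "testfile02.txt (written with >>):"
cat testfile02.txt
```

`testfile01.txt` ends up containing a single `hello`, while `testfile02.txt` contains it twice.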
:::

```
$ sort -n lengths.txt >> sorted-lengths.txt
$ cat sorted-lengths.txt
$ sort -n lengths.txt | head -n 1
$ head -n 1 sorted-lengths.txt
$ wc -l *.pdb | sort -n
$ wc -l *.pdb | sort -n | head -n 1 > test.txt
```

![](https://i.imgur.com/7FFrAeB.png)

<!-- :::success
:pencil: **Pipe Reading Comprehension**

A file called `animals.csv` (in the `shell-lesson-data/exercise-data/animal-counts` folder) contains the following data:

```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
```

What text passes through each of the pipes and the final redirect in the pipeline below? Note, the `sort -r` command sorts in reverse order.

```
$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt
```

Hint: build the pipeline up one command at a time to test your understanding.

<!-- :::spoiler :eyes: ***Solution***
The `head` command extracts the first 5 lines from `animals.csv`. Then, the last 3 lines are extracted from the previous 5 by using the `tail` command. With the `sort -r` command those 3 lines are sorted in reverse order and finally, the output is redirected to a file `final.txt`. The content of this file can be checked by executing `cat final.txt`. The file should contain the following lines:

```
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-05,raccoon,7
```
::: -->

:::success
:pencil: **Pipe Construction**

For the file `animals.csv` (in the `shell-lesson-data/exercise-data/animal-counts` folder), consider the following command:

```
$ cut -d , -f 2 animals.csv
```

The `cut` command is used to remove or ‘cut out’ certain sections of each line in the file, and by default `cut` expects the lines to be separated into columns by a `Tab` character. A character used in this way is called a **delimiter**. In the example above we use the `-d` option to specify the comma as our delimiter character. We have also used the `-f` option to specify that we want to extract the second field (column).
This gives the following output:

```
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
```

The `uniq` command filters out adjacent matching lines in a file. How could you extend this pipeline (using `uniq` and another command) to find out what animals the file contains (without any duplicates in their names)?

:::spoiler :eyes: ***Solution***
```
$ cut -d , -f 2 animals.csv | sort | uniq
```

`sort` is needed first because `uniq` only removes duplicates that are on adjacent lines.
:::

```
$ cd north-pacific-gyre/
$ ls
$ wc -l *.txt
$ wc -l *.txt | sort -n | head -n 5
$ wc -l *.txt | sort -n | tail -n 5
$ ls *Z.txt
```

### 5. Loops

```
$ cd exercise-data/
$ cd creatures/
$ head -n 5 basilisk.dat minotaur.dat unicorn.dat

# Pseudocode:
# for thing in list_of_things
# do
#     operation
# done

$ for filename in basilisk.dat minotaur.dat unicorn.dat
> do
>     head -n 2 $filename | tail -n 1
> done
```

:::success
:pencil: **Write your own loop**

How would you write a loop that echoes all 10 numbers from 0 to 9?

:::spoiler :eyes: ***Solution***
```
$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
>     echo $loop_variable
> done
```

```
0
1
2
3
4
5
6
7
8
9
```
:::

:::success
:pencil: **Variables in Loops**

This exercise refers to the `shell-lesson-data/exercise-data/proteins` directory. `ls *.pdb` gives the following output:

```
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
```

What is the output of the following code?

```
$ for datafile in *.pdb
> do
>     ls *.pdb
> done
```

Now, what is the output of the following code?

```
$ for datafile in *.pdb
> do
>     ls $datafile
> done
```

Why do these two loops give different outputs?

:::spoiler :eyes: ***Solution***
The first code block gives the same output on each iteration through the loop. Bash expands the wildcard `*.pdb` within the loop body (as well as before the loop starts) to match all files ending in `.pdb` and then lists them using `ls`.
The expanded loop would look like this:

```
$ for datafile in cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
> do
>     ls cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
> done
```

```
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
```

The second code block lists a different file on each loop iteration. The value of the `datafile` variable is evaluated using `$datafile`, and then listed using `ls`.

```
cubane.pdb
ethane.pdb
methane.pdb
octane.pdb
pentane.pdb
propane.pdb
```
:::

```
$ for filename in *.dat
> do
>     echo $filename
>     head -n 100 $filename | tail -n 20
> done

$ for filename in red dragon.dat purple unicorn.dat      # fails: unquoted, this is four words (red, dragon.dat, purple, unicorn.dat)
> do
>     head -n 100 $filename | tail -n 20
> done

$ for filename in "red dragon.dat" "purple unicorn.dat"   # quotes keep each name that contains a space together
> do
>     head -n 100 "$filename" | tail -n 20
> done

$ cp basilisk.dat minotaur.dat unicorn.dat *-original    # error: with several files, cp expects the last argument to be a directory
$ for filename in *.dat
> do
>     cp $filename original-$filename
> done
```

<!-- :::success
:pencil: **Saving to a File in a Loop - Part One**

In the `shell-lesson-data/exercise-data/proteins` directory, what is the effect of this loop?

```
for alkanes in *.pdb
do
    echo $alkanes
    cat $alkanes > alkanes.pdb
done
```

1. Prints `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, `pentane.pdb` and `propane.pdb`, and the text from `propane.pdb` will be saved to a file called `alkanes.pdb`.
1. Prints `cubane.pdb`, `ethane.pdb`, and `methane.pdb`, and the text from all three files would be concatenated and saved to a file called `alkanes.pdb`.
1. Prints `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, and `pentane.pdb`, and the text from `propane.pdb` will be saved to a file called `alkanes.pdb`.
1. None of the above.

<!-- :::spoiler :eyes: ***Solution***
The correct answer is 1. The text from each file in turn gets written to the `alkanes.pdb` file. However, the file gets overwritten on each loop iteration, so the final content of `alkanes.pdb` is the text from the `propane.pdb` file.
:::

<!-- :::success
:pencil: ***Optional:* Saving to a File in a Loop - Part Two**

Also in the `shell-lesson-data/exercise-data/proteins` directory, what would be the output of the following loop?

```
for datafile in *.pdb
do
    cat $datafile >> all.pdb
done
```

1. All of the text from `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, and `pentane.pdb` would be concatenated and saved to a file called `all.pdb`.
1. The text from `ethane.pdb` will be saved to a file called `all.pdb`.
1. All of the text from `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, `pentane.pdb` and `propane.pdb` would be concatenated and saved to a file called `all.pdb`.
1. All of the text from `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, `pentane.pdb` and `propane.pdb` would be printed to the screen and saved to a file called `all.pdb`.

<!-- :::spoiler :eyes: ***Solution***
3 is the correct answer. `>>` appends to a file, rather than overwriting it with the redirected output from a command. Given the output from the `cat` command has been redirected, nothing is printed to the screen.
::: -->

![](https://i.imgur.com/xCAkxes.png)

```
$ ls
$ for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile stats-$datafile
> done

$ for datafile in NENE*A.txt NENE*B.txt
> do
>     bash goostats.sh $datafile stats-$datafile
> done

$ for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile
>     bash goostats.sh $datafile stats-$datafile
> done
```

### 6. Shell Scripts

```
$ cd exercise-data/
$ cd proteins/
$ nano middle.sh
# middle.sh contains:
#     head -n 15 octane.pdb | tail -n 5
# (press Ctrl+X to exit nano and confirm saving the file)
$ bash middle.sh
$ nano middle.sh
# middle.sh now contains:
#     head -n 15 "$1" | tail -n 5
$ bash middle.sh octane.pdb
$ ls
$ bash middle.sh pentane.pdb
$ nano middle.sh
# middle.sh now contains:
#     head -n "$2" "$1" | tail -n "$3"
$ bash middle.sh propane.pdb 20 5
$ bash middle.sh propane.pdb 11 5
$ wc -l *.pdb | sort -n
$ nano sorted.sh
# sorted.sh contains:
#     # Sort files by their length.
#     # Usage: bash sorted.sh one_or_more_filenames
#     wc -l "$@" | sort -n
$ bash sorted.sh *.pdb ../creatures/*.dat
```

<!-- :::success
:pencil: **List Unique Species**

Leah has several hundred data files, each of which is formatted like this:

```
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
```

An example of this type of file is given in `shell-lesson-data/exercise-data/animal-counts/animals.csv`.

We can use the command `cut -d , -f 2 animals.csv | sort | uniq` to produce the unique species in `animals.csv`. In order to avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.

Write a shell script called `species.sh` that takes any number of filenames as command-line arguments, and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.
<!-- :::spoiler :eyes: ***Solution***
```
# Script to find unique species in csv files where species is the second data field
# This script accepts any number of file names as command line arguments

# Loop over all files (quoted so that names containing spaces stay intact)
for file in "$@"
do
    echo "Unique species in $file:"
    # Extract species names
    cut -d , -f 2 "$file" | sort | uniq
done
```
:::

<!-- :::success
:pencil: **Variables in Shell Scripts**

In the `proteins` directory, imagine you have a shell script called `script.sh` containing the following commands:

```
head -n $2 $1
tail -n $3 $1
```

While you are in the `proteins` directory, you type the following command:

```
bash script.sh '*.pdb' 1 1
```

Which of the following outputs would you expect to see?

1. All of the lines between the first and the last lines of each file ending in `.pdb` in the `proteins` directory
1. The first and the last line of each file ending in `.pdb` in the `proteins` directory
1. The first and the last line of each file in the `proteins` directory
1. An error because of the quotes around `*.pdb`

<!-- :::spoiler :eyes: ***Solution***
The correct answer is 2.

The special variables `$1`, `$2` and `$3` represent the command line arguments given to the script, such that the commands run are:

```
$ head -n 1 cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
$ tail -n 1 cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
```

The shell does not expand `'*.pdb'` on the command line because it is enclosed by quote marks. As such, the first argument to the script is the literal string `*.pdb`, which gets expanded within the script because `$1` is used there without quotes; `head` and `tail` then receive the list of matching file names.
:::

<!-- :::success
:pencil: ***Optional:* Find the Longest File With a Given Extension**

Write a shell script called `longest.sh` that takes the name of a directory and a filename extension as its arguments, and prints out the name of the file with the most lines in that directory with that extension.
For example:

```
$ bash longest.sh shell-lesson-data/exercise-data/proteins pdb
```

would print the name of the `.pdb` file in `shell-lesson-data/exercise-data/proteins` that has the most lines.

Feel free to test your script on another directory, e.g.:

```
$ bash longest.sh shell-lesson-data/exercise-data/writing txt
```

<!-- :::spoiler :eyes: ***Solution***
```
# Shell script which takes two arguments:
#    1. a directory name
#    2. a file extension
# and prints the name of the file in that directory
# with the most lines which matches the file extension.

wc -l $1/*.$2 | sort -n | tail -n 2 | head -n 1
```

The first part of the pipeline, `wc -l $1/*.$2 | sort -n`, counts the lines in each file and sorts them numerically (largest last). When there’s more than one file, `wc` also outputs a final summary line, giving the total number of lines across all files. We use `tail -n 2 | head -n 1` to throw away this last line.

With `wc -l $1/*.$2 | sort -n | tail -n 1` we’ll see the final summary line: we can build our pipeline up in pieces to be sure we understand the output.
:::
-->

### 7. Finding Things

<!-- :::success
:pencil: **Using grep**

Which command would result in the following output?

```
and the presence of absence:
```

1. `grep "of" haiku.txt`
1. `grep -E "of" haiku.txt`
1. `grep -w "of" haiku.txt`
1. `grep -i "of" haiku.txt`

<!-- :::spoiler :eyes: ***Solution***
The correct answer is 3, because the `-w` option looks only for whole-word matches. The other options will also match ‘of’ when part of another word.
:::

<!-- :::success
:pencil: **Tracking a Species**

Leah has several hundred data files saved in one directory, each of which is formatted like this:

```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
```

She wants to write a shell script that takes a species as the first command-line argument and a directory as the second argument.
The script should return one file called `<species>.txt` containing a list of dates and the number of that species seen on each date. For example, using the data shown above, `rabbit.txt` would contain:

```
2012-11-05,22
2012-11-06,19
2012-11-07,16
```

Below, each line contains an individual command, or pipe. Arrange their sequence in one command in order to achieve Leah’s goal:

```
cut -d : -f 2
>
|
grep -w $1 -r $2
|
$1.txt
cut -d , -f 1,3
```

Hint: use `man grep` to look for how to grep text recursively in a directory and `man cut` to select more than one field in a line.

An example of such a file is provided in `shell-lesson-data/exercise-data/animal-counts/animals.csv`.

<!-- :::spoiler :eyes: ***Solution***
```
grep -w $1 -r $2 | cut -d : -f 2 | cut -d , -f 1,3 > $1.txt
```

Actually, you can swap the order of the two `cut` commands and it still works. At the command line, try changing the order of the `cut` commands, and have a look at the output from each step to see why this is the case.

You would call the script above like this:

```
$ bash count-species.sh bear .
```
:::

<!-- :::success
:pencil: ***Optional:* Little Women**

You and your friend, having just finished reading *Little Women* by Louisa May Alcott, are in an argument. Of the four sisters in the book, Jo, Meg, Beth, and Amy, your friend thinks that Jo was the most mentioned. You, however, are certain it was Amy. Luckily, you have a file `LittleWomen.txt` containing the full text of the novel (`shell-lesson-data/exercise-data/writing/LittleWomen.txt`). Using a `for` loop, how would you tabulate the number of times each of the four sisters is mentioned?

Hint: one solution might employ the commands `grep` and `wc` and a `|`, while another might utilize `grep` options. There is often more than one way to solve a programming task, so a particular solution is usually chosen based on a combination of yielding the correct result, elegance, readability, and speed.
:::spoiler :eyes: ***Solution***
```
for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ow $sis LittleWomen.txt | wc -l
done
```

Alternative, slightly inferior solution:

```
for sis in Jo Meg Beth Amy
do
    echo $sis:
    grep -ocw $sis LittleWomen.txt
done
```

This solution is inferior because `grep -c` only reports the number of lines matched. The total number of matches reported by this method will be lower if there is more than one match per line.

Perceptive observers may have noticed that character names sometimes appear in all-uppercase in chapter titles (e.g. ‘MEG GOES TO VANITY FAIR’). If you wanted to count these as well, you could add the `-i` option for case-insensitivity (though in this case, it doesn’t affect the answer to which sister is mentioned most frequently).
::: -->

## Version Control with Git

### :link: Links

* Setup page: https://swcarpentry.github.io/git-novice/setup.html
* Lesson material: https://swcarpentry.github.io/git-novice/
* Reference page: https://swcarpentry.github.io/git-novice/reference.html
* The Turing Way chapter: https://the-turing-way.netlify.app/reproducible-research/vcs.html
* List of Git GUIs: https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs

### 1. Automated Version Control

### 2. Setting Up Git

```
$ git                                            # check that Git is installed (prints usage)
$ git config --global user.name "<Name...>"
$ git config --global user.email "name@something.nl"
$ git config --global core.autocrlf input        # macOS and Linux
$ git config --global core.autocrlf true         # Windows
$ git config --global core.editor "nano"
$ git config --global init.defaultBranch main
$ git config --list
```

### 3. Creating a Repository

![](https://swcarpentry.github.io/git-novice/fig/motivatingexample.png)

### 4. Tracking Changes

```
$ cd
$ cd Desktop
$ mkdir planets
$ cd planets
$ ls
$ git init
$ ls
$ ls -a
$ ls -aF
$ git status
```

```
$ nano mars.txt
$ cat mars.txt
$ git status
$ git commit -m "Start notes on Mars as a base"     # nothing is committed: the file has not been added yet
$ git add mars.txt
$ git status
$ git commit -m "Start notes on Mars as a base"
$ ls
$ cat mars.txt
$ ls -aF
$ git log
$ nano mars.txt
$ cat mars.txt
$ git diff
$ git commit -m "Add concerns about effects of Mars' moons on Wolfman"   # nothing is committed: the change is not staged
$ git add mars.txt
$ git status
$ git commit -m "Add concerns about effects of Mars' moons on Wolfman"
$
```

![](https://swcarpentry.github.io/git-novice/fig/git-staging-area.svg)
![](https://swcarpentry.github.io/git-novice/fig/git-committing.svg)

```
$ nano mars.txt
$ git diff
$ git add mars.txt
$ git diff                  # shows nothing: the change is now staged
$ git diff --staged         # compares the staged change with the last commit
$ git commit -m "Discuss concerns about Mars' climate for Mummy"
```

:::success
:pencil: **Choosing a Commit Message**

Which of the following commit messages would be most appropriate for the last commit made to `mars.txt`?

1. “Changes”
2. “Added line ‘But the Mummy will appreciate the lack of humidity’ to mars.txt”
3. “Discuss effects of Mars’ climate on the Mummy”

:::spoiler :eyes: ***Solution***
Answer 1 is not descriptive enough, and the purpose of the commit is unclear; answer 2 is redundant with using `git diff` to see what changed in this commit; answer 3 is good: short, descriptive, and imperative.
:::

```
$ git status
$ git log
$ git diff --color-words
$ git log
$ git log -1
$ git log -2
$ git log --oneline
$ pwd
$ mkdir spaceships
$ touch spaceships/apollo-11
$ git status
$ git add spaceships
$ git status
$ git commit -m "Add some initial thoughts"
```

:::success
:pencil: **Committing Changes to Git**

Which command(s) below would save the changes of `myfile.txt` to my local Git repository?

1. `$ git commit -m "my recent changes"`
2. `$ git init myfile.txt`
   `$ git commit -m "my recent changes"`
3. `$ git add myfile.txt`
   `$ git commit -m "my recent changes"`
4. `$ git commit -m myfile.txt "my recent changes"`

:::spoiler :eyes: ***Solution***
1. Would only create a commit if files have already been staged.
2. Would try to create a new repository.
3. Is correct: first add the file to the staging area, then commit.
4. Would try to commit a file “my recent changes” with the message myfile.txt.
:::

:::success
:pencil: **Committing Multiple Files**

The staging area can hold changes from any number of files that you want to commit as a single snapshot.

1. Add some text to `mars.txt` noting your decision to consider Venus as a base
2. Create a new file `venus.txt` with your initial thoughts about Venus as a base for you and your friends
3. Add changes from both files to the staging area, and commit those changes.
:::

### 5. Exploring History

```
$ nano mars.txt
$ cat mars.txt
$ git diff HEAD mars.txt
$ git diff HEAD~1 mars.txt
$ git diff HEAD~2 mars.txt
$ git show HEAD~3 mars.txt
$ git show HEAD
$ git show HEAD .txt
$ git log
$ git show c428...
$ git diff c428...
$ git status                      # note: restore vs. checkout
$ git restore mars.txt
$ cat mars.txt
$ git checkout c428... mars.txt
$ git status
$ cat mars.txt
$ git restore --staged mars.txt
$ git status
$ cat mars.txt
$ git restore mars.txt
```

:::success
:pencil: **Recovering Older Versions of a File**

Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning "broke" the script and it no longer runs. She has spent about an hour trying to fix it, with no luck...

Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called `data_cruncher.py`?

1. `$ git checkout HEAD`
2. `$ git checkout HEAD data_cruncher.py`
3. `$ git checkout HEAD~1 data_cruncher.py`
4. `$ git checkout <unique ID of last commit> data_cruncher.py`
5. Both 2 and 4

:::spoiler :eyes: ***Solution***
The answer is (5)-Both 2 and 4.
The checkout command restores files from the repository, overwriting the files in your working directory. Answers 2 and 4 both restore the latest version in the repository of the file `data_cruncher.py`. Answer 2 uses `HEAD` to indicate the latest, whereas answer 4 uses the unique ID of the last commit, which is what `HEAD` means.

Answer 3 gets the version of `data_cruncher.py` from the commit before `HEAD`, which is NOT what we wanted.

Answer 1 does nothing.
:::

:::success
:pencil: **Understanding Workflow and History**

What is the output of the last command in the following sequence?

```
$ cd planets
$ echo "Venus is beautiful and full of love" > venus.txt
$ git add venus.txt
$ echo "Venus is too hot to be suitable as a base" >> venus.txt
$ git commit -m "Comment on Venus as an unsuitable base"
$ git checkout HEAD venus.txt
$ cat venus.txt    # this will print the contents of venus.txt to the screen
```

1. `Venus is too hot to be suitable as a base`
2. `Venus is beautiful and full of love`
3. `Venus is beautiful and full of love`
   `Venus is too hot to be suitable as a base`
4. Error because you have changed `venus.txt` without committing the changes

:::spoiler :eyes: ***Solution***
The answer is 2.

The command `git add venus.txt` places the current version of `venus.txt` into the staging area. The changes to the file from the second `echo` command are only applied to the working copy, not the version in the staging area.

So, when `git commit -m "Comment on Venus as an unsuitable base"` is executed, the version of `venus.txt` committed to the repository is the one from the staging area and has only one line.

At this time, the working copy still has the second line (and `git status` will show that the file is modified). However, `git checkout HEAD venus.txt` replaces the working copy with the most recently committed version of `venus.txt`.

So, `cat venus.txt` will output `Venus is beautiful and full of love`.
:::

### 6. Ignoring Things

```
$ mkdir results
$ touch a.dat b.dat c.dat results/a.out results/b.out
$ git status
$ nano .gitignore             # e.g. add the patterns *.dat and results/
$ ls
$ ls
$ git add a.dat               # refused: a.dat matches a pattern in .gitignore
$ git add -f a.dat            # -f forces Git to add an ignored file
$ git status
$ git restore --staged a.dat
$ git status --ignored
```

:::success
:pencil: **Including Specific Files**

How would you ignore all `.dat` files in your root directory except for `final.dat`? Hint: find out what `!` (the exclamation point operator) does.
:::

:::spoiler :eyes: ***Solution***
You would add the following lines to your `.gitignore`:

```
# ignore all data files
*.dat
# except final.dat
!final.dat
```

Comments in `.gitignore` must be on their own line: a `#` after a pattern is treated as part of the pattern.

The exclamation point operator will include a previously excluded entry.

Note also that because you’ve previously committed `.dat` files in this lesson they will not be ignored with this new rule. Only future additions of `.dat` files added to the root directory will be ignored.
:::

### 7. Remotes in GitHub

![](https://i.imgur.com/ktdZ75W.png)
![](https://i.imgur.com/OSo32S3.png)

```
$ cd Desktop
$ cd planets
$ ls -a
$ git remote add origin <copied repository address>
$ git remote -v
$ ls -al ~/.ssh
$ ssh-keygen -t ed25519 -C "<your email address>"
$ ls -al ~/.ssh
$ ssh -T git@github.com            # fails until the public key has been added to GitHub
$ cat ~/.ssh/id_ed25519.pub        # copy the printed line and add it as a new SSH key on GitHub
$ ssh -T git@github.com
$ git push origin main
```

![](https://i.imgur.com/T6dGUTq.png)

```
$ git pull origin main
```

### 8. Collaborating

```
$ git clone git@github.com:<collaborator username>/planets.git ~/Desktop/collaborator-planets
```

![](https://i.imgur.com/Avh5Uoc.png)

### 9. Conflicts

```
$ cd ~/Desktop/collaborator-planets
$ nano pluto.txt
$ cat pluto.txt
$ git add pluto.txt
$ git commit -m "Add notes about Pluto"
$ git push origin main
$ git push origin master           # use this instead if the branch is called master rather than main
$ cd ~/Desktop/planets
$ git pull origin main
$ ls
$ pwd

# owner:
$ nano mars.txt
$ cat mars.txt
$ git add mars.txt
$ git commit -m "Add a line in our home copy"
$ git push origin main

# collaborator:
$ cd ~/Desktop/collaborator-planets
$ nano mars.txt                    # make a different change to mars.txt
$ git add mars.txt
$ git commit -m "Add a line in my copy"
$ git push origin main             # (or master) rejected: the remote contains work that you do not have locally
```

![](https://i.imgur.com/YgKnel1.png)

```
$ cat mars.txt
```

### 10. Open Science

![](https://i.imgur.com/fcgsupP.png)

### 13. Hosting

![](https://i.imgur.com/Ryapm8u.png)
![](https://i.imgur.com/2qkzdTQ.png)
![](https://i.imgur.com/5f8tdvI.png)
![](https://i.imgur.com/3HVoYQf.png)