<style>body { background-color: #eeeeee!important; } </style>
# NWO-I Software Carpentry, DIFFER 2023, Day 1
:::info
:information_source: On this page you will find notes for the first day of the NWO-I Software Carpentry workshop at DIFFER organized on December 5.
:::
## Code of Conduct
Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.
## :timer_clock: Schedule December 2023
| | **Unix Shell**|
|------|------|
| 09:30 | Navigating and working with files and directories |
| 10:30 | Morning break |
| 10:45 | Automation (pipes, filters, loops & scripts) |
| 12:30 | Lunch break |
| 13:15 | Finding things |
| 14:00 | *END* |
| | **Git** |
| 14:15 | Setting up and working with Git |
| 15:45 | Afternoon break |
| 16:00 | Collaborating via Git |
| 17:30 | *END* |
## Unix shell
### :link: Links
* Setup page: https://swcarpentry.github.io/shell-novice/setup.html
* Lesson material: https://swcarpentry.github.io/shell-novice/
* Reference page: https://swcarpentry.github.io/shell-novice/reference.html
### 1. Introducing the Shell


```
ls
ks      # typo on purpose: the shell reports that the command is not found
```
### 2. Navigating Files and Directories

```
pwd
ls
ls -F
ls --help
man ls
```
:::success
:pencil: **Exploring More `ls` Flags**
You can also use two options at the same time. What does the command `ls` do when used with the `-l` option?
What about if you use both the `-l` and the `-h` option?
Some of its output is about properties that we do not cover in this lesson (such as file permissions and
ownership), but the rest should be useful nevertheless.
:::spoiler :eyes: ***Solution***
The `-l` option makes `ls` use a long listing format, showing not only the file/directory names but also
additional information, such as the file size and the time of its last modification. If you use both the
`-h` option and the `-l` option, this makes the file size ‘human readable’, i.e. displaying something like
`5.3K` instead of `5369`.
:::
```
ls -l -h
ls -lhtr
ls -ltr
ls -F Desktop
ls -F Desktop/shell-lesson-data
cd Desktop
cd shell-lesson-data
cd exercise-data
pwd
ls -F
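cd shell-lesson-data    # error: there is no shell-lesson-data directory inside exercise-data
cd ..                   # move up one level, to the parent directory
pwd
ls -F -a                # -a also lists hidden entries, including . and ..
cd                      # cd without an argument returns to the home directory
pwd
cd Desktop/shell-lesson-data/exercise-data
pwd
cd /Users/reinder/Desktop/shell-lesson-data    # absolute path (use your own user name)
cd ~/Desktop/shell-lesson-data                 # the same directory, with ~ for the home directory
cd exercise-data/creatures
cd -                    # go back to the previous directory
cd -                    # ...and forth again
cd -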
```
:::success
:pencil: **Absolute vs Relative Paths**
Starting from `/Users/amanda/data`, which of the following commands could Amanda use to navigate to her home
directory, which is `/Users/amanda`?
1. `cd .`
1. `cd /`
1. `cd /home/amanda`
1. `cd ../..`
1. `cd ~`
1. `cd home`
1. `cd ~/data/..`
1. `cd`
1. `cd ..`
:::spoiler :eyes: ***Solution***
1. No: `.` stands for the current directory.
1. No: `/` stands for the root directory.
1. No: Amanda’s home directory is `/Users/amanda`.
1. No: this command goes up two levels, i.e. ends in `/Users`.
1. Yes: `~` stands for the user’s home directory, in this case `/Users/amanda`.
1. No: this command would navigate into a directory called `home` inside the current directory, if it exists.
1. Yes: unnecessarily complicated, but correct.
1. Yes: shortcut to go back to the user’s home directory.
1. Yes: goes up one level.
:::
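To try the answers on your own machine, here is a small sketch that rebuilds Amanda's layout inside a temporary directory and reports, for each candidate, whether it ends in her home directory. The temporary `Users/amanda/data` tree, the overridden `HOME` and the helper name `lands_home` are assumptions made only for this demo. Each attempt runs in a subshell `( ... )`, so its `cd` does not change the directory of the script itself.

```bash
#!/usr/bin/env bash
# Sketch: recreate Amanda's directories under a temporary root, so the
# answers can be tested without touching real directories.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT            # clean up when the script ends
root=$(cd "$root" && pwd)             # normalise the path (e.g. a doubled '/')
mkdir -p "$root/Users/amanda/data"
export HOME="$root/Users/amanda"      # pretend this is Amanda's home

# lands_home CMD: start in ~/data, run CMD in a subshell, and succeed
# only if the subshell ends up in the home directory.
lands_home() {
    ( cd "$HOME/data" && eval "$1" 2>/dev/null && [ "$PWD" = "$HOME" ] )
}

for cmd in 'cd .' 'cd /' 'cd /home/amanda' 'cd ../..' 'cd ~' \
           'cd home' 'cd ~/data/..' 'cd' 'cd ..'
do
    if lands_home "$cmd"; then
        echo "$cmd: yes"
    else
        echo "$cmd: no"
    fi
done
```

It should print `yes` for `cd ~`, `cd ~/data/..`, `cd` and `cd ..`, in line with the solution.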

```
ls -F /
pwd
ls north-pacific-gyre
ls north-pacific-gyre/goostats.sh
ls north-pacific-gyre/goodiff.sh
```
### 3. Working With Files and Directories
```
pwd
cd exercise-data/writing
ls -F
mkdir thesis
ls -F
ls -F thesis
mkdir -p ../project/data ../project/results
ls -FR ../project
cd thesis
nano draft.txt
ls
cat draft.txt
cd ~/Desktop/shell-lesson-data/exercise-data/writing
pwd
mv thesis/draft.txt thesis/quotes.txt
ls thesis
mv thesis/quotes.txt .
ls thesis
ls
cp quotes.txt thesis/quotation.txt
ls quotes.txt thesis/quotation.txt
cp -r thesis thesis_backup
ls thesis thesis_backup
```
:::success
:pencil: **Renaming Files**
Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it `statstics.txt`.
After creating and saving this file, you realize you misspelled the filename! Which of the following commands could you use to correct the mistake?
1. `cp statstics.txt statistics.txt`
1. `mv statstics.txt statistics.txt`
1. `mv statstics.txt .`
1. `cp statstics.txt .`
:::spoiler :eyes: ***Solution***
1. No. While this would create a file with the correct name, the incorrectly named file still exists in the directory and would need to be deleted.
1. Yes, this would work to rename the file.
1. No: the period (`.`) only says where to move the file (the current directory, where it already is) and provides no new name, so the misspelling remains.
1. No: the period (`.`) only says where to copy the file and provides no new name; a file cannot be copied onto itself.
:::
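The difference between answers 1 and 2 can be seen in a scratch directory (created with `mktemp -d`, an assumption of this sketch so that no real files are touched): `cp` leaves the misspelled file behind, while `mv` changes the name.

```bash
#!/usr/bin/env bash
# Sketch: compare cp and mv for fixing a misspelled name, in a throwaway directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

touch statstics.txt                  # the misspelled file
cp statstics.txt statistics.txt      # answer 1: a copy with the right name...
ls                                   # ...but statstics.txt is still there

rm statistics.txt                    # back to the starting state
mv statstics.txt statistics.txt      # answer 2: a real rename
ls                                   # only statistics.txt is left
```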
```
rm quotes.txt
ls
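rm quotes.txt          # error: the file has already been removed
rm thesis              # error: rm does not remove directories without -r
rm -r thesis           # remove the directory and everything in it
ls
rm -ri thesis_backup   # -i asks for confirmation before each removal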
cd ..
cd alkanes
ls
ls *.pdb
ls p*.pdb
ls ?ethane.pdb
ls ???ane.pdb
```
### 4. Pipes and Filters
```
cd alkanes
ls
wc cubane.pdb
wc *.pdb
wc -l *.pdb
wc -l                 # no file name: wc waits for input from the keyboard; press Ctrl+C to cancel
ls
wc -l *.pdb > lengths.txt
ls
cat lengths.txt
```
:::success
:pencil: **What Does `sort -n` Do?**
The file `shell-lesson-data/exercise-data/numbers.txt` contains the following lines:
```
10
2
19
22
6
```
If we run `sort` on this file, the output is:
```
10
19
2
22
6
```
If we run `sort -n` on the same file, we get this instead:
```
2
6
10
19
22
```
Explain why `-n` has this effect.
:::spoiler :eyes: ***Solution***
The `-n` option specifies a numerical rather than an alphanumerical sort.
:::
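The effect is easy to reproduce without the exercise file by feeding the same five numbers to both commands. In this sketch the numbers come from `printf` rather than `numbers.txt`, and `LC_ALL=C` is an assumption added so that the plain `sort` compares raw characters regardless of your language settings. `paste -sd ' ' -` only joins the output onto one line for readability.

```bash
#!/usr/bin/env bash
# The numbers from numbers.txt, generated with printf instead of read from disk.
numbers() { printf '%s\n' 10 2 19 22 6; }

# Text sort: compares character by character, so "19" < "2" because '1' < '2'.
numbers | LC_ALL=C sort | paste -sd ' ' -      # 10 19 2 22 6

# Numeric sort: compares the values, so 2 < 6 < 10.
numbers | sort -n | paste -sd ' ' -            # 2 6 10 19 22
```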
```
sort -n lengths.txt > sorted_lengths.txt
head -n 1 sorted_lengths.txt
```
:::success
:pencil: **What Does `>>` Mean?**
We have seen the use of `>`, but there is a similar operator `>>` which works slightly differently. We’ll learn about the differences between these two operators by printing some strings. We can use the `echo` command to print strings, e.g.:
```
$ echo The echo command prints text
The echo command prints text
```
Now test the commands below to reveal the difference between the two operators:
```
$ echo hello > testfile01.txt
```
and:
```
$ echo hello >> testfile02.txt
```
Hint: Try executing each command twice in a row and then examining the output files.
:::spoiler :eyes: ***Solution***
In the first example with `>`, the string ‘hello’ is written to `testfile01.txt`, but the file gets overwritten each time we run the command.
We see from the second example that the `>>` operator also writes ‘hello’ to a file (in this case `testfile02.txt`), but appends the string to the file if it already exists (i.e. when we run it for the second time).
:::
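Following the hint, this sketch runs each command twice in a scratch directory (an assumption, so existing `testfile0*.txt` files are not touched) and counts the lines of each file with `wc -l`.

```bash
#!/usr/bin/env bash
# Run each redirection twice and count the resulting lines.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

echo hello > testfile01.txt
echo hello > testfile01.txt     # '>' empties the file before writing
echo hello >> testfile02.txt
echo hello >> testfile02.txt    # '>>' adds to the end of the file

wc -l < testfile01.txt          # 1 line
wc -l < testfile02.txt          # 2 lines
```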
```
sort -n lengths.txt | head -n 1
wc -l *.pdb | sort -n
wc -l *.pdb | sort -n | head -n 1
```

:::success
:pencil: **Pipe Reading Comprehension**
A file called `animals.csv` (in the `shell-lesson-data/exercise-data/animal-counts` folder) contains the following data:
```
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
```
What text passes through each of the pipes and the final redirect in the pipeline below? Note, the `sort -r` command sorts in reverse order.
```
$ cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt
```
Hint: build the pipeline up one command at a time to test your understanding.
:::spoiler :eyes: ***Solution***
The `head` command extracts the first 5 lines from `animals.csv`. Then the `tail` command extracts the last 3 of those 5 lines. `sort -r` sorts these 3 lines in reverse order, and finally the output is redirected to the file `final.txt`. You can check its content with `cat final.txt`. The file should contain the following lines:
```
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-05,raccoon,7
```
:::
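The pipeline can also be built up stage by stage, as the hint suggests. This sketch embeds the eight records in a here-document instead of reading `animals.csv` (an assumption, so it runs from any directory; the function name `animals` is hypothetical) and prints what leaves each stage.

```bash
#!/usr/bin/env bash
# The contents of animals.csv, embedded so the sketch is self-contained.
animals() {
cat <<'EOF'
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
EOF
}

echo "--- after head -n 5 ---"
animals | head -n 5
echo "--- after tail -n 3 ---"
animals | head -n 5 | tail -n 3
echo "--- after sort -r (what final.txt would contain) ---"
animals | head -n 5 | tail -n 3 | sort -r
```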
```
pwd
cd ../..
ls
cd north-pacific-gyre
ls
wc -l *.txt
wc -l *.txt | sort -n | head -n 5
wc -l *.txt | sort -n | tail -n 5
ls *Z.txt
```
### 5. Loops
```
cd exercise-data/
cd creatures/
ls
head -n 5 basilisk.dat minotaur.dat unicorn.dat
for filename in basilisk.dat minotaur.dat unicorn.dat
do
echo $filename
head -n 2 $filename | tail -n 1
done
for filename in *.dat
do
echo $filename
head -n 2 $filename | tail -n 1
done
for x in *.dat
do
echo $x
head -n 2 $x | tail -n 1
done
```
:::success
:pencil: **Write your own loop**
How would you write a loop that echoes all 10 numbers from 0 to 9?
:::spoiler :eyes: ***Solution***
```
$ for loop_variable in 0 1 2 3 4 5 6 7 8 9
> do
> echo $loop_variable
> done
```
```
0
1
2
3
4
5
6
7
8
9
```
:::
```
for filename in *.dat
do
echo $filename
sed -n '1p' $filename
done
for filename in *.dat
do
echo $filename
head -n 100 $filename | tail -n 20
done
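for filename in *.dat
do
$filename    # echo is missing: the shell tries to run each file name as a command
head -n 100 $filename | tail -n 20
done
for filename in "red dragon.dat" "purple unicorn.dat"
do
head -n 100 "$filename" | tail -n 20
done
for filename in "red dragon.dat" "purple unicorn.dat"
do
head -n 100 $filename | tail -n 20    # unquoted: "red dragon.dat" is split into "red" and "dragon.dat"
done
cp *.dat original-*.dat    # fails: the wildcards expand to several files, and the last argument is not a directory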
for filename in *.dat
do
cp $filename original-$filename
done
```
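Why does `cp *.dat original-*.dat` fail when the loop works? The shell expands the wildcards before `cp` runs, so `cp` receives several source files plus a last argument, `original-*.dat`, that is not a directory. A common habit, sketched below with two hypothetical empty files in a scratch directory, is to first put `echo` in front of the command inside the loop, check the commands it prints, and only then run the loop for real.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
touch basilisk.dat unicorn.dat          # hypothetical stand-ins for creatures/*.dat

# Dry run: print the commands the loop would execute.
for filename in *.dat
do
    echo cp "$filename" "original-$filename"
done

# Real run, with the variable quoted in case a name contains spaces.
for filename in *.dat
do
    cp "$filename" "original-$filename"
done
ls
```

The glob `*.dat` is expanded once when the loop starts, so the new `original-*.dat` files are not copied a second time.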

```
ls
cd ../..
cd north-pacific-gyre
for datafile in NENE*A.txt NENE*B.txt
do
echo $datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
echo $datafile stats-$datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
bash goostats.sh $datafile stats-$datafile
done
for datafile in NENE*A.txt NENE*B.txt
do
echo $datafile
bash goostats.sh $datafile stats-$datafile
done
```
### 6. Shell Scripts
```
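pwd
nano middle.sh
# [text typed into middle.sh]
head -n 15 octane.pdb | tail -n 5
bash middle.sh
nano middle.sh
# [middle.sh, version 2: "$1" is the first command-line argument]
head -n 15 "$1" | tail -n 5
bash middle.sh octane.pdb
bash middle.sh cubane.pdb
nano middle.sh
# [middle.sh, version 3: "$2" and "$3" are the second and third arguments]
head -n "$2" "$1" | tail -n "$3"
bash middle.sh octane.pdb 15 5
bash middle.sh octane.pdb 20 5
nano middle.sh
# [middle.sh, final version with its own comment lines]
# select lines from the middle of a file
# usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
nano sorted.sh
# [text typed into sorted.sh: "$@" stands for all command-line arguments]
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
bash sorted.sh *.pdb ../creatures/*.dat
history
!631        # re-run command number 631 from the history (your numbers will differ)
!!          # re-run the previous command
cd ../../north-pacific-gyre/
nano do-stats.sh
# [text typed into do-stats.sh]
# Calculate stats for data files.
for datafile in "$@"
do
echo $datafile
bash goostats.sh $datafile stats-$datafile
done
bash do-stats.sh NENE*A.txt NENE*B.txt
bash do-stats.sh NENE*A.txt NENE*B.txt | wc -l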
```
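The final `middle.sh` above trusts its caller to pass exactly three arguments. A slightly more defensive variant checks `$#`, the number of arguments, and prints the usage line otherwise. The check, the message, and the test file `numbers.txt` (generated with `seq`) are additions for this sketch, not part of the lesson's script.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Write a defensive middle.sh (the argument check is an addition to the lesson's version).
cat > middle.sh <<'EOF'
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
if [ "$#" -ne 3 ]; then
    echo "usage: bash middle.sh filename end_line num_lines" >&2
    exit 1
fi
head -n "$2" "$1" | tail -n "$3"
EOF

seq 1 30 > numbers.txt            # a test file whose lines are 1 to 30
bash middle.sh numbers.txt 15 5   # prints lines 11 to 15
bash middle.sh numbers.txt        # too few arguments: prints the usage line
```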
:::success
:pencil: **List Unique Species**
Leah has several hundred data files, each of which is formatted like this:
```
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
```
An example of this type of file is given in `shell-lesson-data/exercise-data/animal-counts/animals.csv`.
We can use the command `cut -d , -f 2 animals.csv | sort | uniq` to produce the unique species in `animals.csv`. In order to avoid having to type out this series of commands every time, a scientist may choose to write a shell script instead.
Write a shell script called `species.sh` that takes any number of filenames as command-line arguments and uses a variation of the above command to print a list of the unique species appearing in each of those files separately.
:::spoiler :eyes: ***Solution***
```
# Script to find unique species in csv files where species is the second data field
# This script accepts any number of file names as command line arguments
# Loop over all files
for file in "$@"
do
echo "Unique species in $file:"
# Extract species names
cut -d , -f 2 "$file" | sort | uniq
done
```
:::
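The solution sorts before calling `uniq` because `uniq` only collapses duplicate lines that are next to each other. The sketch below shows the difference on Leah's sample records (printed by a hypothetical `data` function instead of read from a file); `sort -u` is an equivalent shorthand for `sort | uniq`.

```bash
#!/usr/bin/env bash
# Leah's sample records, printed by a function instead of read from a file.
data() {
    printf '%s\n' 2013-11-05,deer,5 2013-11-05,rabbit,22 2013-11-05,raccoon,7 \
        2013-11-06,rabbit,19 2013-11-06,deer,2 2013-11-06,fox,1 \
        2013-11-07,rabbit,18 2013-11-07,bear,1
}

data | cut -d , -f 2 | uniq | wc -l          # 8: no two equal names are adjacent
data | cut -d , -f 2 | sort | uniq           # bear deer fox rabbit raccoon
data | cut -d , -f 2 | sort -u               # the same list in a single command
```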
```
cd
cd Desktop/shell-lesson-data/exercise-data/writing
ls
cat haiku.txt
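grep not haiku.txt              # lines containing "not"
grep The haiku.txt              # "The" anywhere, also inside words such as "Thesis"
grep -w The haiku.txt           # -w: whole words only
grep -w "is not" haiku.txt      # quote a pattern that contains spaces
grep -n "it" haiku.txt          # -n: show line numbers
grep -nw "the" haiku.txt
grep -nwi "the" haiku.txt       # -i: ignore case
grep -nwv "the" haiku.txt       # -v: lines that do NOT match
grep -r Yesterday .             # -r: search all files below the current directory
grep --help
man grep
grep -E "^.o" haiku.txt         # extended regex: lines whose second character is "o"
find .
find . -type d                  # directories only
find . -type f                  # files only
cd ..
find . -name *.txt              # unquoted: the shell may expand *.txt before find sees it
find . -type f
find . -name "*.txt"            # quoted: find does the matching itself
wc -l $(find . -name "*.txt")   # $(...) passes find's output to wc as arguments
grep "searching" $(find . -name "*.txt")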
```
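Two details from the `find` commands above are worth a closer look. Without quotes, the shell may expand `*.txt` before `find` ever sees it, which is the pitfall behind `find . -name *.txt`. And `$(find ...)` splits find's output on spaces, so a file named `my notes.txt` (hypothetical) would reach `wc` as two names. One alternative, sketched below in a scratch directory, is to let `find` pass the names itself with `-exec ... {} +`.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
printf 'one\ntwo\n' > "my notes.txt"     # hypothetical file with a space in its name
printf 'three\n' > other.txt

# Word splitting breaks the name apart: wc cannot find "./my" or "notes.txt".
wc -l $(find . -name "*.txt") 2>/dev/null

# -exec passes each path as a single argument; '+' batches them into one wc call.
find . -name "*.txt" -exec wc -l {} +
```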
## Version Control with Git
### :link: Links
* Setup page: https://swcarpentry.github.io/git-novice/setup.html
* Lesson material: https://swcarpentry.github.io/git-novice/
* Reference page: https://swcarpentry.github.io/git-novice/reference.html
* The Turing Way chapter: https://the-turing-way.netlify.app/reproducible-research/vcs.html
* List of Git GUIs: https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs
### 1. Automated Version Control

### 2. Setting Up Git

```
pwd
cd ~
git config --global user.name "First_name Last_name"
git config --global user.email "your_email@email.nl"
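git config --global core.autocrlf input        # macOS and Linux; on Windows use: git config --global core.autocrlf true
git config --global core.editor "nano -w"      # use nano as the editor for commit messages
git config --global init.defaultBranch main    # name the first branch of new repositories main (Git 2.28 or later)
git config --global --edit                     # open the global configuration file in the editor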
git config --list
git config -h
git config --help
git help
```
### 3. Creating a Repository

```
cd ~/Desktop
mkdir planets
cd planets
git init
ls
ls -a
git checkout -b main    # switch to a branch called main (only needed if init.defaultBranch is not set to main)
git status
```
### 4. Tracking Changes

```
pwd
nano mars.txt
cat mars.txt
git status
git add mars.txt
git status
git commit -m "Start notes on Mars as a base"
git status
git log
nano mars.txt
cat mars.txt
git status
git diff
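git commit -m "Add concerns about effects of Mars' moons on Wolfman"    # fails: nothing has been staged with git add yet
git add mars.txt
git commit -m "Add concerns about effects of Mars' moons on Wolfman"
nano mars.txt
cat mars.txt
git diff
git add mars.txt
git diff              # shows nothing: the changes are already staged
git diff --staged     # shows the staged changes
git commit -m "Discuss concerns about Mars' climate for Mummy"
git status
git log
git log -1            # only the most recent commit
git log --oneline
git log --oneline --graph
mkdir spaceships
git status            # the empty directory does not show up
git add spaceships    # adds nothing: Git tracks files, not directories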
git status
touch spaceships/apollo-11 spaceships/sputnik-1
git status
git add spaceships
git status
git commit -m "Add some initial thoughts on spaceships"
```
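The `mkdir spaceships` / `git add spaceships` steps above show that Git tracks files, not directories: an empty directory never appears in `git status`. This sketch replays those steps in a throwaway repository and uses `git status --porcelain`, the short, machine-readable form of `git status`.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
git init -q

mkdir spaceships
git status --porcelain        # prints nothing: the directory is empty
touch spaceships/apollo-11 spaceships/sputnik-1
git status --porcelain        # ?? spaceships/
git add spaceships
git status --porcelain        # A  spaceships/apollo-11 and A  spaceships/sputnik-1
```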
:::success
:pencil: **Choosing a Commit Message**
Which of the following commit messages would be most appropriate for the last commit made to `mars.txt`?
1. “Changes”
2. “Added line ‘But the Mummy will appreciate the lack of humidity’ to mars.txt”
3. “Discuss effects of Mars’ climate on the Mummy”
:::spoiler :eyes: ***Solution***
Answer 1 is not descriptive enough: the purpose of the commit is unclear. Answer 2 is redundant, because `git diff` already shows what changed in this commit. Answer 3 is good: short, descriptive, and imperative.
:::
### 5. Exploring History

```
cat mars.txt
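git diff HEAD mars.txt              # compare the working copy with the latest commit
git diff HEAD~1 mars.txt            # ...with the commit before that
git diff HEAD~3 mars.txt
git show HEAD~3 mars.txt            # the commit itself: message and changes
git diff 9ee432d21f87ea840abf44f3c1e3bea7c19b1bf9 mars.txt   # a commit can also be named by its ID (yours will differ)
git diff 9ee432d mars.txt           # the first few characters of the ID are enough
git status
git checkout HEAD mars.txt          # discard uncommitted changes to mars.txt
cat mars.txt
git checkout 9ee432d21f87ea840abf44f3c1e3bea7c19b1bf9 mars.txt   # restore mars.txt as it was in that commit (the change is staged)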
cat mars.txt
ls
git status
git checkout HEAD mars.txt
cat mars.txt
git log --oneline --graph
git log
```
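To practise `HEAD~N` and restoring old versions without risking your `planets` repository, the sketch below builds a throwaway repository with three commits to `mars.txt`. The helper `g` and its `-c user.name=... -c user.email=...` options are assumptions that only supply an identity for these demo commits.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
git init -q
# g: run git with a throwaway identity, only for these demo commits
g() { git -c user.name=Demo -c user.email=demo@example.org "$@"; }

echo "Cold and dry, but everything is my favorite color" > mars.txt
g add mars.txt && g commit -q -m "Start notes on Mars as a base"
echo "The two moons may be a problem for Wolfman" >> mars.txt
g add mars.txt && g commit -q -m "Add concerns about effects of Mars' moons on Wolfman"
echo "But the Mummy will appreciate the lack of humidity" >> mars.txt
g add mars.txt && g commit -q -m "Discuss concerns about Mars' climate for Mummy"

git log --oneline                # three commits, newest first
git diff HEAD~2 mars.txt         # everything added since the first commit
git checkout HEAD~2 mars.txt     # bring back the first version (also staged)
cat mars.txt                     # only the first line is left
git checkout HEAD mars.txt       # return to the latest committed version
```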
### 6. Ignoring Things
```
mkdir results
touch a.csv b.csv c.csv results/a.out results/b.out
ls
git status
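nano .gitignore         # add two lines: *.csv and results/
cat .gitignore
git status
git add .gitignore
git commit -m "Ignore data files and the results folder"
git status
git add a.csv           # refused: a.csv matches a rule in .gitignore (git add -f would override this)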
```
:::success
:pencil: **Including Specific Files**
How would you ignore all `.csv` files in your root directory except for `final.csv`? Hint: Find out what `!` (the exclamation point operator) does.
:::spoiler :eyes: ***Solution***
You would add the following lines to your `.gitignore`:
```
# ignore all data files
*.csv
# except final.csv
!final.csv
```
The exclamation point operator will include a previously excluded entry. In `.gitignore`, a `#` only starts a comment at the beginning of a line, so the comments go on lines of their own.
Note also that if you have previously committed `.csv` files, they will not be ignored by this new rule; only `.csv` files added from now on will be ignored.
:::
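Whether the rules do what you expect can be checked with `git check-ignore`, which exits with status 0 when a path is ignored and 1 when it is not. The sketch below tries the two rules from the solution in a scratch repository; `a.csv` is a hypothetical example file.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
git init -q

# Comments in .gitignore must be on their own lines.
cat > .gitignore <<'EOF'
# ignore all data files
*.csv
# except final.csv
!final.csv
EOF
touch a.csv final.csv

for f in a.csv final.csv
do
    if git check-ignore -q "$f"; then
        echo "$f: ignored"
    else
        echo "$f: not ignored"
    fi
done
```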
### 7. Remotes in GitHub

```
git remote add origin git@github.com:your_user_name/planets.git
git remote -v
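ls -al ~/.ssh                                   # check for existing SSH keys
ssh-keygen -t ed25519 -C "your_email@email.nl"  # create a new key pair
ls -al ~/.ssh
ssh -T git@github.com                           # fails until GitHub knows your public key
cat ~/.ssh/id_ed25519.pub                       # copy this into GitHub: Settings > SSH and GPG keys
ssh -T git@github.com                           # should now greet you by user name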
git push origin main
git pull origin main
```
### 8. Collaborating

```
git pull origin main
ls
nano mars.txt
git add mars.txt
git commit -m "Add a line in my copy"
git push origin main
git pull origin main
cat mars.txt
```
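If you want to rehearse this pull/commit/push cycle without a GitHub account, a bare repository on your own disk can stand in for GitHub. In this sketch, the directory names `planets.git`, `owner` and `collaborator` and the helper `g` (which supplies a throwaway commit identity) are assumptions; the branch name is read from the repository because it depends on your `init.defaultBranch` setting.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
# g: run git with a throwaway identity, only for these demo commits
g() { git -c user.name=Demo -c user.email=demo@example.org "$@"; }

git init -q --bare planets.git                  # stands in for the GitHub repository
git clone -q planets.git owner 2>/dev/null
git clone -q planets.git collaborator 2>/dev/null

# The owner commits and pushes.
echo "Cold and dry, but everything is my favorite color" > owner/mars.txt
g -C owner add mars.txt
g -C owner commit -q -m "Start notes on Mars as a base"
branch=$(git -C owner symbolic-ref --short HEAD)   # main or master, depending on config
g -C owner push -q origin "$branch"

# The collaborator pulls, adds a line, and pushes back.
g -C collaborator pull -q origin "$branch"
echo "The two moons may be a problem for Wolfman" >> collaborator/mars.txt
g -C collaborator commit -q -am "Add a line in my copy"
g -C collaborator push -q origin "$branch"

# The owner pulls and now sees both lines.
g -C owner pull -q origin "$branch"
cat owner/mars.txt
```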
### 9. Conflicts

```
nano mars.txt
cat mars.txt
git add mars.txt
git status
git commit -m "merge changes from GitHub"
git push origin main
git log
cat mars.txt
nano mars.txt
git add mars.txt
git commit -m "new moon research"
git push origin main
git pull origin main
cat mars.txt
git log
```
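A conflict is easier to study when you create one on purpose. In this sketch two local branches in a throwaway repository stand in for your copy and your collaborator's copy on GitHub (an assumption: in the lesson the second change arrives through `git pull`); the helper `g` only supplies a throwaway commit identity. After the failed merge, `mars.txt` contains the conflict markers that you edit away before running `git add` and `git commit`, as in the commands above.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
git init -q
# g: run git with a throwaway identity, only for these demo commits
g() { git -c user.name=Demo -c user.email=demo@example.org "$@"; }

echo "Cold and dry, but everything is my favorite color" > mars.txt
g add mars.txt && g commit -q -m "Start notes on Mars"
g branch collaborator                        # a second line of development

echo "This line added to our copy" >> mars.txt
g commit -q -am "Add a line in our copy"

g checkout -q collaborator
echo "We added a different line in the other copy" >> mars.txt
g commit -q -am "Add a different line in the collaborator's copy"
g checkout -q -                              # back to our branch

g merge collaborator                         # stops with a conflict in mars.txt
cat mars.txt                                 # shows <<<<<<<, ======= and >>>>>>> markers
```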


