Code Initiative Workshop April 2022 - Day 1

# Code Initiative Workshop April 2022 - Day 1

## Instructors and Helpers:

- *Heather Andrews*, Aerospace Faculty Data Steward
- *Bianca Giovanardi*, Aerospace Structure and Computational Mechanics Assistant Professor
- *Javier Gutierrez*, Aerospace Structure and Computational Mechanics PhD candidate
- *Sai Kubair Kota*, Aerospace Structure and Computational Mechanics PhD candidate
- *Giorgio Tosti*, Aerospace Structure and Computational Mechanics PhD candidate

## Program Day 1

| Time          | Activity                                        |
| ------------- | ----------------------------------------------- |
| 09.30 - 09.45 | Introduction to the workshop (Heather & Bianca) |
| 09.45 - 10.00 | Introduction to setup (Giorgio)                 |
| 10.00 - 10.30 | Setting things up (breakout rooms)              |
| 10.30 - 11.00 | Navigate (Heather)                              |
| 11.00 - 11.20 | Break                                           |
| 11.20 - 11.45 | Piping, Scripting & Finding (Heather)           |
| 11.45 - 12.00 | Exercise (breakout rooms)                       |
| 12.00 - 12.25 | Introduction to Git and Day 2 (Bianca)          |
| 12.25 - 12.30 | Feedback (mentimeter)                           |

## Breaking the ice!

Add +1 next to your response:

### Which Operating System are you using today?

* Windows: 10
* Linux: 7
* MacOS: 5

### From which faculty are you?

* ABE 1
* AE 7
* AS 1
* CEG 4
* EEMCS 4
* IDE
* TPM 2
* 3mE 4
* Other 2

### Which of these represents you the most?

* <img src="https://miro.medium.com/max/1000/0*Ua695vjzFHV6VNOX.png" width="200"/> 7
* <img src="https://miro.medium.com/max/1400/0*zGUf4rK4c4zsLq0U" width="200"/> 5
* <img src="https://i.redd.it/54ss55ix0vwy.jpg" width="200"> 7
* <img src="https://img.devrant.com/devrant/rant/r_1514074_Xa8Jo.jpg" width="200"> 4

## Q&A

### Add your questions here!

1) What's good about VSCode vs. e.g. PyCharm/CLion?

Hopefully you will find out in the workshop.

Thanks :)

But, as a first, VSCode is free (and open source) :)

2) What's the difference between bash, zsh, etc.?

They are different shells. Bash is one of the most popular ones.
But in the end they are all a sort of language to speak to your device/system by using the command line.

3) Shall we talk more about bash rather than python? Or do we have a plan about bash in later courses?

We're gonna cover it today ;)

4) What is the difference between ">" and "|"?

The ">" operator sends the output of the preceding command into a file (or another destination). For instance:

```
$ ls > myfile.txt
```

This command writes a txt file listing all the contents of your present working directory.

The "|" operator (the pipe) takes the output of one command and sends it as input to another command. Example:

```
$ ls | grep '\.txt$'
```

This lists only the entries in your present working directory whose names end in `.txt`.

Thanks!

Welcome :-)

5) Not a question, but a useful package to use LaTeX (e.g. within VSCode):

- download and install **texlive** (in case you haven't yet), e.g. on Debian/Ubuntu: `sudo apt install texlive-full`
- download and install **latexmk**, e.g. on Debian/Ubuntu: `sudo apt install latexmk`
- add the **texlive** path to the PATH. For example: to add something to the PATH variable you modify the `.bashrc` file, which should be in your home directory, and add the following line to the file: `export PATH=path_to_texlive:$PATH` (replacing `path_to_texlive` by the actual path where texlive has been installed. You can check this by looking up one of its programs with the `where pdflatex` or `which pdflatex` command, depending on your operating system)

_________________

count_columns.sh

```
echo "Enter list of files to count number of columns:"
read list_files
echo $list_files
for filename in $list_files
do
    head -1 $filename | tr ',' ' ' | wc -w
done
```

_________________

MENTIMETER

Go to www.menti.com and use the code 6929 6569

Please let us know what you think! :-)

_________________

# Material Day 1

## Navigate

We will start by downloading a dataset and using Bash commands to explore it and get an overview of what the dataset contains.
Download the dataset from the 4TU.ResearchData archive: https://doi.org/10.4121/uuid:ad6de66a-72be-4835-b97d-7d6ab1bd378c

This is a finalized dataset published by Nikolaos Eleftheroglou from the ASM department. As you can see, anyone can view, access and download this dataset. The dataset has a persistent identifier (a DOI), which ensures that 5 or 10 years from now you will not find the "404 Not found" page when trying to access it. This makes it *findable* and *accessible* in the long term.

We can see that the dataset has been published under a CC0 license, which essentially states "use this dataset in whichever way you want to". Another license you can use is CC-BY, which essentially states "use this dataset in whichever way you want to, but cite me on it".

### Exploring the dataset with Bash commands

Let us start exploring the dataset using bash commands via the terminal:

```
pwd
cd Downloads
mkdir ~/Desktop/CodeInitiative_Apr2022
mv data.zip ~/Desktop/CodeInitiative_Apr2022
cd ~/Desktop/CodeInitiative_Apr2022
mkdir Day_1
mkdir Day_1/data
```

We will unzip the data.zip file in the data subdirectory:

```
unzip -h
unzip data.zip -d Day_1/data/
```

______________________________________________

### Parenthesis

You can also untar files or gunzip files using: `tar -xf <filename>` or `gunzip <filename>`.

______________________________________________

Let us explore the unzipped files in the data subdirectory:

```
cd Day_1/
ls
cd data/
ls
```

We see several files that follow a given convention: `AE*.csv` and `DIC*.csv`. We also see there is a README file called `_Readme First.txt`. Recommendation: prefer the name `README.txt`.

Let us explore the README file using `cat`:

```
cat '_Readme First.txt'
```

The data files correspond to monitoring data of fatigue experiments. AE stands for "acoustic emission". DIC stands for "digital image correlation".

We can create one file listing all the AE files and another listing all the DIC files we have in here.
```
ls AE*.csv > list_AE_files.txt
cat list_AE_files.txt
ls DIC*.csv > list_DIC_files.txt
cat list_DIC_files.txt
```

Let us see how many lines are in the lists we have just created:

```
wc list*.txt
```

We see there are 12 AE files and 12 DIC files. The `wc` command shows the number of lines, words and bytes. If we only want to see the number of lines we can do: `wc -l list*.txt`

## Piping

When we have data files, we first explore them in general terms: what format are they in? How many files are there? How many rows does each file have?

```
head -4 AE_Specimen01.csv
tail -4 AE_Specimen01.csv
```

```
head -4 AE_Specimen05.csv
tail -4 AE_Specimen05.csv
```

```
head -1 AE*.csv
```

We see the files do not have headers, and they seem to have different numbers of rows but the same number of columns.

We can check the number of lines the files have and sort that output numerically by piping the output of the `wc` command into a `sort` command:

```
wc -l AE*.csv | sort -n
```

We see `AE_Specimen01.csv` has 30 lines, while `AE_Specimen02.csv` has 261 lines.

```
tail -1 AE_Specimen01.csv
tail -1 AE_Specimen02.csv
```

We can now check the DIC files:

```
wc -l DIC*.csv | sort -n
```

## Scripting

We did `head -1 AE*.csv` to show how many columns each file has. This prints the first line of each file to the terminal. But what if we had many more files? Printing them all to the screen would not be very efficient, so we can think of other ways of visualizing this. We could do this by creating a Bash script that, given a list of files, can:

- read the first line of each file
- replace each comma by a space
- use the `wc` command to count the words

How can we do this?

```
head -1 AE_Specimen01.csv
```

This gets the first line of the file. Now, to replace the ',' by an empty space we can use the `tr` command. `tr` stands for translate. This command does a lot of things, such as transforming characters from uppercase to lowercase, deleting specific characters, basic find and replace, etc.
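
The three uses of `tr` mentioned above can be tried on short strings before applying them to the data files. This is only a small sketch: the sample strings and the variable names `upper`, `no_commas` and `spaced` are made up for illustration and are not part of the workshop dataset.

```shell
# Transform lowercase letters to uppercase.
upper=$(echo "hello" | tr 'a-z' 'A-Z')
echo "$upper"

# Delete a specific character (here the comma) with the -d flag.
no_commas=$(echo "1,2,3" | tr -d ',')
echo "$no_commas"

# Basic find and replace: every comma becomes a space.
spaced=$(echo "1,2,3" | tr ',' ' ')
echo "$spaced"
```

Note that `tr` works character by character: it maps each character of the first set to the character at the same position in the second set, so it is not meant for replacing whole words.
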
For example, replacing every `l` with `&`:

```
echo "hello" | tr 'l' '&'
```

Thus we can replace each ',' by a ' ' using the following pipe:

```
head -1 AE_Specimen01.csv | tr ',' ' '
```

We see we have essentially two strings, two words. So we can pipe this into the `wc` command:

```
head -1 AE_Specimen01.csv | tr ',' ' ' | wc -w
```

So far we have only done piping. Let us write a Bash script called `count_columns.sh` to perform this instruction on a list of files.

```
touch count_columns.sh
```

We could fill in the file using `echo` commands, but to explain the syntax we will rather use the `VSCode` editor. We can open a file using the `code` command:

```
code count_columns.sh
```

In the file we will build, step by step, a for loop that performs the head-tr-wc operation on any list of csv files. We start by asking for the list of files:

```
echo "Enter list of files to count number of columns:"
read list_files
echo $list_files
```

We can run the bash script on the AE*.csv files by doing (in our bash terminal) the following, and typing `AE*.csv` when prompted:

```
bash count_columns.sh
```

With `read` we collect input. The list of files will be stored in the `list_files` variable. To use this variable in the script we use the `$` sign.

Now we can add a for loop that goes through each file in the `$list_files` variable and executes the head-tr-wc command. Adding the for loop to the `count_columns.sh` file:

```
echo "Enter list of files to count number of columns:"
read list_files
for filename in $list_files
do
    head -1 $filename | tr ',' ' ' | wc -w
done
```

We can run the bash script on the AE*.csv files by doing:

```
bash count_columns.sh
```

Notice how when defining the variable we do not use the `$`, but when using the variable we do use it.

We may want to make the output of the script a bit clearer by adding the following:

```
echo "Enter list of files to count number of columns:"
read list_files
for filename in $list_files
do
    num_cols=$(head -1 $filename | tr ',' ' ' | wc -w)
    echo $filename $num_cols
done
```

Notice how we are defining variables, much as we did when adding more directories to our **PATH**.
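
To see the two rules in isolation (no `$` when defining a variable, `$` when using it), here is a minimal sketch that can be pasted into a bash terminal. The file `sample.csv` and the temporary directory are hypothetical and created only for this illustration; they are not part of the workshop dataset.

```shell
# Create a small hypothetical csv file in a temporary directory.
tmp_dir=$(mktemp -d)
printf '1.0,2.0,3.0\n4.0,5.0,6.0\n' > "$tmp_dir/sample.csv"

# Defining a variable: no `$`, and no spaces around the `=` sign.
filename="$tmp_dir/sample.csv"

# $( ... ) runs the pipeline and stores its output in the variable.
num_cols=$(head -1 "$filename" | tr ',' ' ' | wc -w)

# Using a variable: prefix its name with `$`.
echo "Number of columns:" $num_cols

# Clean up the temporary directory.
rm -r "$tmp_dir"
```

Writing `num_cols = $(...)` with spaces around `=` would not work: bash would try to run a command called `num_cols`.
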
We can run the bash script on the AE*.csv files by doing:

```
bash count_columns.sh
```

We see all the AE*.csv files have 2 columns. When running the script on the DIC*.csv files we see they also have 2 columns.

Now we have a bash script that we might want to reuse on any other csv file we have. We can save this script in a directory where we have other scripts and add that directory to the **PATH**. Then we can reuse the script anywhere in our device by just doing `bash count_columns.sh`.

**Note for Mac Users**: the most recent MacOS systems work natively with zsh instead of bash. Some of the commands in these tutorials have slightly different syntaxes in the two shells. Therefore, the suggestion is to follow the tutorials using **bash**. However, it is more natural for Mac users to modify the PATH variable in the zsh configuration file (.zshrc). In order not to have errors when calling the scripts in this tutorial from zsh, you can write the shebang

```
#!/bin/bash
```

at the beginning of every script. This tells the OS which interpreter to use to execute the code written after the shebang when the script is executed directly.

__________________________________

## Exercise

Create an `organize_dir.sh` Bash script to organize the `Day_1/` directory a bit better. The script should be run by doing `bash organize_dir.sh` at `Day_1` level. The script should reorganize the current structure into the following final structure:

- Day_1/code/
- Day_1/data/
- Day_1/data/raw/ : **raw_AE*.csv** and **raw_DIC*.csv** and readme file
- Day_1/data/extra/ : all txt files we have generated so far
- Day_1/data/input/ : **AE*.csv** and **DIC*.csv**
- Day_1/data/output/

In `Day_1/data/raw/` all the **AE*.csv** and **DIC*.csv** files should be copied with names **raw_AE*.csv** and **raw_DIC*.csv**. Hint: use a for loop for this.

________________________________

## Solution

In `Day_1`:

```
touch organize_dir.sh
```

In `organize_dir.sh` (open the file in VSCode by typing `code organize_dir.sh` in the bash terminal):

```
mkdir code data/raw data/extra data/input data/output
cd data
for filename in *.csv
do
    cp $filename raw/raw_$filename
done
cd ..
mv data/_R* data/raw
mv data/*.csv data/input
mv data/*.txt data/extra
```

Then run the script as `sh organize_dir.sh` or `bash organize_dir.sh`.

Did you find other ways of creating the same directory structure? When scripting, there are many roads to get to Rome!

## Find

We can check the structure and everything `Day_1` contains by doing (start at `Day_1` level):

```
find .
```

`find` outputs the names of every file and directory under the current working directory. You can use:

```
find . -type d
```

to list only directories. Replacing the `d` by `f` will show only files.

If you would like a list of only a certain type of files, txt files for example, then you can type (the quotes stop the shell from expanding `*.txt` before `find` sees it):

```
find . -name '*.txt'
```

We can also hand the results of `find` to other commands, here via command substitution `$(...)`, to check for example the number of lines of all AE*.csv files from the parent directory:

```
wc -l $(find data/input -name 'AE*.csv') | sort -n
```

_______________________________

## Grep

Another very useful command when exploring files is the `grep` command. Grep is a contraction of "global/regular expression/print", which refers to a common sequence of operations in early Unix text editors. What `grep` does is search for a pattern in files in a case-sensitive way. To search for any line that contains a word in a file you use the `grep word filename` syntax. For example:

```
pwd
grep author data/raw/_Readme\ First.txt
```

To perform a case-insensitive search for the word in a given file: `grep -i word filename`.

_______________________________

## history

We can also use **grep** to go through the history of commands we have made in the current Shell session.
For this we would use grep in combination with another very useful command: **history**. The `history` command is used to view the commands which have been previously executed. Let us say you want to record the last commands you have made:

```
history | tail -5 > last_commands.txt
cat last_commands.txt
```

When working with git you can also use the `history` command to go through all the pulling you have done, for example: `history | grep 'git pull'`

There is also a default file known as `~/.bash_history` where Bash stores commands that have been previously executed. Whenever you open a Bash Shell, the Shell will read in the content of your `~/.bash_history` file and load it into its session's so-called *history list*. The *history list* is what you see with the `history` command. As you type commands in the terminal, Bash will append those commands to the *history list*. And when you close your terminal, Bash will save the *history list* to disk, in the `~/.bash_history` file.

## Summary of Bash commands

____

`pwd`: print working directory. The *current* directory (where you currently are) will be printed in the terminal.

____

`cd` : `cd` stands for change directory. This command written alone takes you to your **home directory**. This is equivalent to typing `cd ~`.

`cd path_dir` : (replace `path_dir` by the path of any directory within the device) this instruction will take you from your *current directory* (the directory where you are) to the `path_dir` directory.

`cd ..` : when typing `..` bash understands “the directory containing this one”, or in other words, the **parent directory** of the **current directory**.

- Special comment: when typing `.` (only one dot) bash understands “the current directory”. Thus, typing `cd .` will not "move" you from your current directory.


`cd -` : when typing `-` bash understands "the previous directory you were in", which is faster than having to remember the full **absolute path** to it.

- For example: let's say *you are* in `/c/Users/your_user_name/dir1`. Then you go directly to another directory called `dir2` by doing `cd /c/Users/your_user_name/dir2` (that is, by using the **absolute path** to `dir2`). Then doing `cd ..` will take you to `/c/Users/your_user_name`, while doing `cd -` will take you to `/c/Users/your_user_name/dir1`.

____

`ls` : this command (alone) lists the (non-hidden) contents of the *current* directory.

`ls -a` : lists all contents (including hidden ones) of the *current* directory. Hidden files usually start with a dot (`.`). Hidden files are not seen by default when using graphical user interfaces like **File Explorer** in Windows, for example.

`ls -l` : (with a lower case “L”) lists all the (non-hidden) contents of the *current* directory, plus their size, last modified date and time, owner of the file and respective permissions.

`ls -lS` : (with a lower case “L” and an upper case "S") lists the same as `ls -l` but with all files and directories sorted by size.

`ls -lh` : (with a lower case “L” and a lower case "H") lists the same as `ls -l` but with the size in human readable format (KB, MB, etc.).

`ls -R` : lists all the (non-hidden) contents of the *current* directory, and the contents of the sub-directories within that current directory.

See more on the `ls` command by typing `ls --help`. If you are on macOS, use `man ls` instead (`man` as in "manual").

____

`mkdir name_dir` : make (create) a directory called `name_dir`. Replace `name_dir` by the name of the directory to be created within the **current directory**, or by the **absolute path** with the name of the new directory.

- For example: let's say *you are* in `/c/Users/your_user_name/dir1`. Then you can create a sub-directory called `sub_dir1` within `dir1` by doing `mkdir sub_dir1`.
- Another example: let's say *you are* in `/c/Users/your_user_name/dir1`. Then you can create a directory `dir2` in the **parent directory** of `dir1` by doing `mkdir ../dir2`. Do `pwd` to see you have not actually *moved*! Do `cd ..` and then `ls` to see that `dir2` has been created.
- And another example based on the previous one: instead of using `../` you can specify the **absolute path** to create `dir2` by doing `mkdir /c/Users/your_user_name/dir2`.

`mkdir -p name_dir/sub_name_dir/subsub_name_dir` : the `-p` flag allows `mkdir` to create a directory with any number of **nested subdirectories** in a **single operation** (a single command line). Then, from the **current directory**, you can create a whole structure of directories inside it!

____

`touch my_file` : creates an empty file in the *current* directory (replace `my_file` with the name of the file you want to create).

- For example: you want to create a `README.txt` or `script.py` file, then you type `touch README.txt` or `touch script.py` respectively.

`echo string > my_file.txt`: writes the `string` into `my_file.txt`, replacing any content the file already had.

- For example:
    - when doing `echo "First line of file" > my_file.txt`, the file `my_file.txt` will now contain "First line of file" in it.
    - When using `>>` instead of `>`, the string will be appended after the last line of the file.

`cat my_file.txt` : prints the contents of the `my_file.txt` file to the screen of the terminal (replace `my_file.txt` with the name of the file you want to see). You can then immediately see its contents in the terminal, instead of opening the file with a separate application that may take longer to open. Particularly useful for quick checks of files.

____

`mv file1 dir1` : when the first argument is a *file* and the second argument is a *directory*, `mv` moves the *file* to that *directory*. You can provide the name of the file (if the file is in the *current* directory) or the path of the file (if the file is in another folder).
- For example: let's say *you are* in `/c/Users/your_user_name/dir1` and you want to move a PDF file called `file1.pdf` (that is in the *current* directory) to the `/c/Users/your_user_name/dir2` directory. Then you can type: `mv file1.pdf /c/Users/your_user_name/dir2`, or more concisely: `mv file1.pdf ../dir2`.
- Another example: let's say *you are* in `/c/Users/your_user_name/` and you want to move `file1.pdf` (which is in `/c/Users/your_user_name/dir1`) to `/c/Users/your_user_name/dir2`. Then you can type: `mv dir1/file1.pdf dir2`

`mv file1 file2` : when both arguments are *files*, `mv` renames the first file to the second file (here you should replace `file1` by the name of the file you want to rename, and `file2` by the new name you want to give it). Both `file1` and `file2` can also be given as paths. For example: `mv /c/Users/your_user_name/file1 /c/Users/your_user_name/file2`.

- Keep in mind `mv` will **silently overwrite any existing file**. A recommended option (especially when starting with Bash commands) is to use the **interactive** flag: `mv -i` (or `mv --interactive` on Linux). This will ask you for confirmation before overwriting.

____

`cp file1 file2` : copies the file given as the first argument to the file given as the second argument (here you should replace `file1` by the name of the file you want to make a copy of, and `file2` by the name of the file you want to be the copy of `file1`). Both files can also be given as paths to the respective files. For example: `cp /c/Users/your_user_name/file1 /c/Users/your_user_name/file2`.

- For copying a directory and all its contents you can use the recursive flag `-r`, e.g. to back up a directory. For example: `cp -r thesis thesis_backup`.

____

`history`: shows you the commands you have typed during the current Shell session (i.e. while having that bash terminal active).

`grep`: allows you to search for patterns in files in a case-sensitive way.
For example: use `grep word filename` to search for any line that contains `word` in a given file called `filename`.

____

The following uses of the `rm` command deserve special attention, because you have to be careful when deleting files and directories:

`rm file1` : deletes a file (replace `file1` by the file you want to delete). **Be aware this command deletes the file FOR-E-VER. So be careful!**

`rm -i file1` : using the interactive flag `-i`, this command interactively asks if you *really* want to delete the file. Reply with `y` or `n` if you *want* or you *do not want* to delete it respectively.

`rm -r dir1` : deletes a directory and everything inside it (replace `dir1` by the directory you want to delete). As above: **be careful!**

`rm -ri dir1` : by adding the `i` flag the command will interactively ask you if you really want to delete the directory. Reply with `y` or `n` if you *want* or you *do not want* to delete it respectively.
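
Since `rm` cannot be undone, one way to get comfortable with it is to practise in a throwaway directory first. The sketch below is only an illustration: the `sandbox` variable and the names `file1`, `dir1`, `sub_dir1` and `file2` are made up for the example, and the temporary directory is created with `mktemp -d` so that nothing important can be touched.

```shell
# Work inside a throwaway directory so nothing important can be lost.
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a file and a nested directory to practise on.
touch file1
mkdir -p dir1/sub_dir1
touch dir1/sub_dir1/file2

# rm -i asks before deleting. The answer "y" is piped in here so the
# example runs unattended; in a terminal you would type it yourself.
echo "y" | rm -i file1

# Look at what a directory contains before removing it recursively.
ls -R dir1
rm -r dir1

# Nothing should be left in the sandbox.
remaining=$(ls)
echo "Remaining entries: '$remaining'"

# Leave the sandbox and remove it (rmdir only deletes empty directories).
cd - > /dev/null
rmdir "$sandbox"
```

Listing a directory with `ls -R` before running `rm -r` on it is a cheap habit that shows exactly what is about to disappear.
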