# Code Initiative Workshop April 2022 - Day 2
## Instructors and Helpers:
- *Heather Andrews*, Aerospace Faculty Data Steward
- *Bianca Giovanardi*, Aerospace Structures and Computational Mechanics Assistant Professor
- *Javier Gutierrez*, Aerospace Structures and Computational Mechanics PhD candidate
- *Sai Kubair Kota*, Aerospace Structures and Computational Mechanics PhD candidate
- *Giorgio Tosti*, Aerospace Structures and Computational Mechanics PhD candidate
## Program
| Time | Activity |
| ------------- | ------------------------------------------------ |
| 09.30 - 09.40 | Introduction to the session (Heather and Bianca) |
| 09.40 - 10.05 | Cookiecutter |
| 10.05 - 10.25 | Exercise (breakout room) |
| 10.25 - 10.40 | Break |
| 10.40 - 11.00 | Project Structure |
| 11.00 - 11.25 | Version control with Git |
| 11.25 - 12.15 | Makefile |
| 12.15 - 12.25 | Heads up for Day 3 |
| 12.25 - 12.30 | Feedback (Mentimeter) |
<img src="https://miro.medium.com/max/1280/0*HhzqQ5ACowM4J4j9.jpg" width="400" height="300">
## <font color='blue'>Breaking the ice!</font>
Add +1 next to your response:
### <font color='blue'> Which Operating System are you using today?</font>
* Windows 10
* Linux 6
* MacOS 3
### <font color='blue'> From which faculty are you?</font>
* ABE
* AE 8
* AS 1
* CEG 3
* EEMCS 3
* IDE
* TPM 1
* 3mE 2
* Other 1
### <font color='blue'>Which of these situations do you relate the most with?</font>
- I only made harmless changes to my code, but now nothing works anymore and I have no idea why! 2
- My computer is full of files like: `file_v1.py`, `file_v2_final.py`, ..., `file_v100_final_FINAL.py` 6
- I am pretty sure this line of code is correct, as I was convinced of that when I wrote it, but I no longer recall what I had in mind... 8
## <font color='blue'>Notes Day 2</font>
### To activate the Environment created in Day 1
For Windows users:
`source ~/coding_initiative_env/Scripts/activate`
Assuming you created the environment in your home directory (e.g., `/c/Users/your_user_name`)
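For Linux/macOS users the equivalent should be (the virtual environment keeps its activation script under `bin/` instead of `Scripts/`):
`source ~/coding_initiative_env/bin/activate`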
### cookiecutter.json
```
{
"project_name": "My Project",
"repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
"package_name": "my_package",
"source_filename": "my_file",
"full_name": "Your name",
"short_description": "A cool project"
}
```
### Exercise
Move all the contents from the previous training session (e.g., `~/Desktop/CodeInitiative_Apr2022/Day_1`) into the cookiecutter directory structure (e.g., `~/Desktop/CodeInitiative_Apr2022/Day_2/project_fatigue`) using the (Bash) command line. Time: 15 minutes.
### cookiecutter repo
https://github.com/debrevitatevitae/cookiecutter-tud-ascm.git
### Parenthesis: Link to Hackmd of Day 1 (see <font color='blue'>Material Day 1</font> section in it):
https://hackmd.io/iyPN89T1SuyTw1FNPuBaqg
### What `project_fatigue` should look like after the exercise
```
.
├── LICENSE
├── Makefile
├── README.md
├── TODO.md
├── data
│ ├── extra
│ │ ├── list_AE_file.txt
│ │ └── list_DIC_file.txt
│ ├── input
│ │ ├── AE_Specimen01.csv
│ │ ├── AE_Specimen02.csv
│ │ ├── AE_Specimen03.csv
│ │ ├── AE_Specimen04.csv
│ │ ├── AE_Specimen05.csv
│ │ ├── AE_Specimen06.csv
│ │ ├── AE_Specimen07.csv
│ │ ├── AE_Specimen08.csv
│ │ ├── AE_Specimen09.csv
│ │ ├── AE_Specimen10.csv
│ │ ├── AE_Specimen11.csv
│ │ ├── AE_Specimen12.csv
│ │ ├── DIC_Specimen01.csv
│ │ ├── DIC_Specimen02.csv
│ │ ├── DIC_Specimen03.csv
│ │ ├── DIC_Specimen04.csv
│ │ ├── DIC_Specimen05.csv
│ │ ├── DIC_Specimen06.csv
│ │ ├── DIC_Specimen07.csv
│ │ ├── DIC_Specimen08.csv
│ │ ├── DIC_Specimen09.csv
│ │ ├── DIC_Specimen10.csv
│ │ ├── DIC_Specimen11.csv
│ │ └── DIC_Specimen12.csv
│ ├── output
│ └── raw
├── docs
│ └── _Readme First.txt
├── notebooks
│ └── 1-project_fatigue-notebook.ipynb
├── project_fatigue
│ ├── README.md
│ ├── __init__.py
│ ├── count_columns.sh
│ ├── project_fatigue.py
│ └── visualization
│ ├── README.md
│ ├── __init__.py
│ └── visualize.py
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests
└── __init__.py
```
## <font color='black'>TU Delft Research Software Policy</font>
https://doi.org/10.5281/zenodo.4629662
-------------------
## <font color='black'>Other useful Cookiecutter templates</font>
http://cookiecutter-templates.sebastianruml.name/
## <font color='red'>Questions</font>
- **Just out of curiosity, how did you create the list on the right showing the sequence of commands? Do you use something like `tail -f ~/.bash_history`? I tried it but it doesn't seem to work for me**
Yep! To set up a live command history across two terminals, do the following (activating the environment is only needed for this workshop, not for the two-terminal setup):
On the main terminal, activate the coding_initiative_env and then type:
`export PROMPT_COMMAND="history -a; $PROMPT_COMMAND"`
*This appends each command to the history file as it is typed in the main terminal.*
Open a second terminal, activate the coding_initiative_env and then type:
`tail -f ~/.bash_history`
*This follows the history file and shows the commands typed in the main terminal.*
----------------
- **What's the most useful license for a standard research project that you share on GitHub?**
When you create a project on GitHub (and, I think, GitLab as well), it automatically suggests the most popular open-source licenses (MIT, Apache, GPL, BSD-3). They have some slight differences between them, but the main takeaway is that the code is freely reusable and new code can be developed on top of it, also for commercial purposes, provided that the original author is given credit. Here is a website where you can check the differences between the most popular open-source licenses: https://choosealicense.com/
-------------------
- **When I type `find .`, the result is not sorted alphabetically. Any idea why that is and how I can change it?**
You can try a pipeline:
```
find . | sort
```
For help about the `sort` command, check `sort --help`.
In any case, the `find` command is very powerful and can sort, copy, and more by itself. Check it out!
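For instance, the `-exec` option lets `find` run a command on every match (the pattern below is just an illustration):
```
find . -type f -name "*.csv" -exec wc -l {} +
```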
See <font color='blue'>Material Day 1</font> section of the hackmd of Day 1: https://hackmd.io/iyPN89T1SuyTw1FNPuBaqg
## Mentimeter
Please go to www.menti.com and use the code 8596 7575 and provide us feedback on today's session.
# <font color='blue'> Material Day 2</font>
## Cookiecutter
In this tutorial we will do two things. First, we will create a small template together. This will show us the syntax that cookiecutter recognizes and what the `cookiecutter.json` file should look like. In the second part, we will cookiecut from a remote repository on GitHub. It contains a template that will be used for creating the skeleton of the project developed in the rest of the tutorial. You will also be able to use it for your future projects, if you want.
## Creating a simple Cookiecutter template
A cookiecutter template is a directory with two fundamental ingredients:
1. A `cookiecutter.json` file
2. A subdirectory whose name is a templating tag of the form `{{cookiecutter.<variable_name>}}` (in our case `{{cookiecutter.repo_name}}`)
where `<variable_name>` refers to a variable defined in `cookiecutter.json`; cookiecutter replaces the tag with that variable's value when it generates a project.
First things first, let's create a directory for our template. For instance, let's do
```
cd ~/Desktop/CodeInitiative_Apr2022
mkdir -p Day_2/CodeInitiative_Template
cd Day_2/CodeInitiative_Template
```
## `cookiecutter.json`
Now let's create an empty `cookiecutter.json`
```
touch cookiecutter.json
```
Now, we want to populate our `cookiecutter.json` file. First of all, a JSON file has the following format (keys and string values go in double quotes, and JSON does not allow comments):
```
{
    "variable_1": "default_value_1",
    "variable_2": "default_value_2"
}
```
The `cookiecutter.json` file includes all those variables whose values can change from one project to another. This can be, for instance, the project name, the source code directory name, the author name, etc.
When called for creating a project, cookiecutter goes through `cookiecutter.json`. For every 'variable-value' pair, it prompts the user for a value to give to that variable. The values that we see in `cookiecutter.json` are defaults that get assigned to the corresponding variable if the user simply hits `Enter`.
So let's start filling in some entries, beginning with the name of the project
```
"project_name": "My Project"
```
Remember, the value (right hand side) is just a default, not necessarily the final name of the project.
Now, we can use a little bit of `cookiecutter` syntax, such that the name of the repository will be consistent with the name of the project
```
"project_name": "My Project",
"repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}"
```
This line we just added will make sure that the default repository name is the same as the project's name, but all lower-cased and with spaces replaced by underscores.
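The expression inside the tag is plain Python string manipulation. As a quick illustration of what it produces (just a sketch, not part of the template):
```
project_name = "My Project"
repo_name = project_name.lower().replace(' ', '_')
print(repo_name)  # prints: my_project
```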
For the rest of this demonstration, let's assume that we want to create a Python project. As we know from practice, whenever we want to import a Python library or *package*, we do `import <package_name>`. A package name is exactly the next thing we need to define.
> **_NOTE:_** Later on, by setting up a file called `setup.py` (explained below) and running `pip install -e .` in your project directory, you will be able to `import` your package from everywhere on your machine!
```
"project_name": "My Project",
"repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
"package_name": "my_package"
```
Now that we've got the hang of it, let's add another couple of lines. In particular, it is useful to add the name of a first Python script to go in our package, the name of the author of the project and, finally, a short (2 lines max) description of the project.
Let's do this as so:
```
{
"project_name": "My project",
"repo_name": "{{ cookiecutter.project_name.lower().replace(' ', '_') }}",
"package_name": "my_package",
"source_filename": "my_file",
"full_name": "Your name",
"short_description": "A cool project"
}
```
## Project directory
Great! Now, the second ingredient missing is the (sub)directory that will represent our project. We can create this with:
```
mkdir {{cookiecutter.repo_name}}
```
The double curly braces look exotic. These are called *templating tags*. What's special about them is that they recognize *namespaces*. This means exactly what we wanted. In fact, if we were just to write `{{repo_name}}`, the name of our repository would always be `repo_name`. However, the syntax `{{cookiecutter.repo_name}}` tells cookiecutter to look into `cookiecutter.json` for the `repo_name` variable. So the `cookiecutter.json` really is the mold of our project!
## Populate the project directory
Now let's `cd` into our template-project directory:
```
cd {{cookiecutter.repo_name}}
```
and let's include some of the elements that appear in almost every project. For now, let's only create empty files and directories; we will populate them later:
```
mkdir data tests
touch .gitignore LICENSE Makefile README.md
```
We are still missing the directory of our Python package. For this, we will need to use the curly braces again:
```
mkdir {{cookiecutter.package_name}}
```
> **_NOTE:_** This is just a reduced version of a project template. Other necessary elements are `docs/`, setup files, Docker files, etc.
An easy next step is to `cd` into our `data/` directory and give it the same structure that we gave it in the last session. So:
```
cd data/
mkdir raw input output extra
```
Let's be happy with this for `data`, move back up, and go into `{{cookiecutter.package_name}}`. Here we want to create two things: a first source module and an `__init__.py` file.
```
cd ../{{cookiecutter.package_name}}
touch __init__.py {{cookiecutter.source_filename}}.py
```
> **_NOTE:_** We will see the `__init__.py` better later. As a preview, this is generally an empty file that signals the Python interpreter that it is allowed to import from this directory.
Another place where `source_filename` can be useful is in `tests`. As will be explained later, it is a good convention in Python to name the test of a certain module `test_<module_name>`. Let's do it using the cookiecutter namespace:
```
cd ../tests
touch test_{{cookiecutter.source_filename}}.py
```
We still need to take care of the files that we created earlier. Let's go back to the template's root directory (`cd ..`) and keep LICENSE and Makefile simple for now: we will leave LICENSE empty and add some basic functionality to our Makefile (remember that the action line of a Make rule must be indented with a tab):
```
INPUT=data/input
OUTPUT=data/output
LANGUAGE=python

.PHONY: clean
clean:
	@rm -rf $(OUTPUT)/*
```
Finally, we want to write something in our README. Here is another good place where we can use the cookiecutter namespace.
```
# {{cookiecutter.project_name}}
**Author**: {{cookiecutter.full_name}}
Welcome to {{cookiecutter.project_name}}!
{{cookiecutter.short_description}}
```
In this way, the project name, author name and description that we give as input will all be included in the README.
## Using the template for a project
Alright! Now we have a fully working cookiecutter template. Let's use this template to create a project and let's explore that project to make sure that everything worked.
Let's create this project in our Desktop:
```
cd ~/Desktop
cookiecutter ~/Desktop/CodeInitiative_Apr2022/Day_2/CodeInitiative_Template
```
Now cookiecutter prompts us for the input that is specified in `cookiecutter.json`. Let's fill it in like so:
```
project_name [My project]: Awesome Project
repo_name [awesome_project]:
package_name [my_package]: awsm
source_filename [my_file]: tools
full_name [Your Name]:
short_description [A cool project]: lots of awesomeness
```
If we `ls -l` in our Desktop directory, we will see that our `awesome_project` project has just been created! Now we can move around inside the project's directory and check that everything worked as expected.
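For example (assuming the answers given above), we can peek at the generated project and verify that the templating tags were filled in:
```
cd ~/Desktop/awesome_project
ls -a
cat README.md
```
The README should now start with `# Awesome Project` and list the author we accepted at the prompt, i.e. the tags were replaced by our answers.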
## Let's cookiecut from a remote repository
We've just seen how to create your own template for your future repositories. This is very useful, as everybody has their own needs in a template. Moreover, if we want to access our template from multiple machines and/or share it with others, it is also a good idea to publish it on GitHub or another remote repository provider. We will see later in the tutorial how to publish a project on GitHub.
Another way to go is to use someone else's template. There are many cookiecutter templates available online, some of which are really popular and well maintained. The advantage is that these repositories may contain files/directories that we did not think to include, but that are necessary or useful. Sometimes there may be things whose meaning we do not know yet, but that we can look up and possibly find out are useful. Other times there will be elements that we do not need, and then we can always prune our project after it has been created.
For the last part of our demonstration, we will create a project from an external template on GitHub. This will be used throughout the rest of the tutorial. Let's take a look at the GitHub page https://github.com/debrevitatevitae/cookiecutter-tud-ascm.
Now let's use this template:
```
cookiecutter https://github.com/debrevitatevitae/cookiecutter-tud-ascm.git
```
and let's enter the following:
```
project_name [The name of your project]: Project Fatigue
repo_name [project_fatigue]:
full_name [Your full name (or team's name)]: <Your Name>
version [Version of your project]: 0.1.0
short_description [An awesome project]: Library to postprocess the results of fatigue analyses.
```
## A deeper look into the Project Directory Structure
Before we start modifying files and doing further version control on them, let's go through the structure that was created with Cookiecutter.
### .gitignore
By doing `ls -a` we see Cookiecutter created the `.gitignore` file. Based on what we have learned from the Wiki, we know this is the file where we specify what we want Git to ignore when keeping track of files.
```
cat .gitignore
```
The file is quite long, so let's open it with Nano (or with another editor of your choice):
```
nano .gitignore
```
We see a lot of entries in there that mainly refer to files and directories related to the actual building of **Python packages**. Let us make it more applicable to our project and remove some of those entries.
The .gitignore will now look like this:
```
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
# Unit test / coverage reports
.pytest_cache/
pytestdebug.log
# Jupyter Notebook
.ipynb_checkpoints
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# operating system-related files
# file properties cache/storage on macOS
*.DS_Store
# thumbnail cache on Windows
Thumbs.db
# Data directory
data/
# TODO file
TODO.md
```
Now that we have modified the `.gitignore`, let us add it and commit it:
```
git add .gitignore
git commit -m "Modified .gitignore file"
git push origin master
```
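If you want to double-check that a path is really being ignored, and by which rule, `git check-ignore` can help (a small sketch, assuming the `.gitignore` above):
```
git check-ignore -v data/input/AE_Specimen01.csv
git check-ignore -v TODO.md
```
Each line of output shows the `.gitignore` rule that matches the given path.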
### LICENSE
As mentioned in the Wiki, it is always good to create a LICENSE file from the start of the project, because it reminds us that once we publish or share the code with others, we should specify a license for it. A license is a document that tells others how they may reuse the code (responsible code sharing).
```
cat LICENSE
```
Which license? It depends on how you would like to share the code with others. In fact, we now have a [TU Delft Research Software policy](http://doi.org/10.5281/zenodo.4629662). There (on page 6) you can see some of the pre-approved open-source licenses. Those licenses essentially say *"use at your own risk, but keep the copyright notice for whichever chunk of code you reuse from this code"*. One comment: please **do not use a CC0 license** for code. The CC (Creative Commons) licenses are meant for "data" (tabular data, images, recordings, reports, etc.).
### Makefile
**Make** was developed by Stuart Feldman in 1977, and it is still widely used today!
Just like a Shell script, **Make** allows us to execute several commands by using a single `make` instruction. The difference with a Shell script is that **Make** explicitly records the dependencies between files (what files are needed to create other files). This is what allows **Make** to know what to rerun and what not to rerun.
**Make** can be used for any commands that follow the general pattern of processing files to create new files. For example:
- run analysis scripts on raw data files to get data files that summarize the raw data.
- Run visualization scripts on data files to produce plots.
- Compile source code into executable programs or libraries.
___________________________________
### Parenthesis
There are now many build tools available that work similarly to **Make** (e.g., **nmake**, **CMake**). Which to use depends on your requirements, usage, and operating system. However, they all share the same fundamental concepts as Make.
__________________________________
In order to use **Make** you need to create a file called **Makefile** in the top-level directory of your project (the directory from which you will run `make`). A **Makefile** has the following structure:
```
# comment explaining the rule
target: dependencies
action
```
- The `#` refers to a **comment** that will be ignored by **make**.
- The **target** is the file to be created or built. Notice you have to provide the correct path to each file.
- The **dependency** is a file(s) that is needed to build or update the **target**.
- The **action** is the command(s) to run to build or update the target using the dependencies.
All in all, the **target**, the **dependencies** and the **actions** form what is called a **rule**. A **rule** tells Make how to build or update a **target** from its **dependencies** by running the **actions**. When asked to build a **target**, **Make** checks the last modification time of both the **target** and its **dependencies**. If any **dependency** has been updated after the **target**, the **actions** are re-run.
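As a concrete, hypothetical illustration (all three file names below are made up), the rule tells **Make** that `results.txt` is built from `input.csv` and `process.py`. Whenever either dependency is newer than `results.txt`, running `make results.txt` re-executes the action. Note that the action line must start with a tab:
```
# rebuild results.txt whenever input.csv or process.py changes
results.txt: input.csv process.py
	python process.py input.csv > results.txt
```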
### README.md
This README Markdown file already has some content in it. That is because Cookiecutter filled in some basic content based on the inputs you gave for the variables defined in the `cookiecutter.json` file.
```
cat README.md
```
As you have probably noticed, this **top-level** README Markdown file is the one that shows up on the GitLab repository page. This README is the **front-end** of your project. It is the first thing that your future self and others will see and read in order to understand what the project is about and what the repository contains. Thus, it is extremely important that you provide useful and complete information in this file.
Having said that, in every project there should be a top-level README file in a simple text format (in this case a .md). This README file should be continuously updated throughout the project with:
- what the code does
- how the code/repository is structured
- what is needed for the code to run (dependencies and versions of libraries with which the code has been developed)
- instructions on how to compile and run the code
- references and acknowledgements
The README in a repository should be understandable not only by you (and your future self!), but also by others.
**Recommendation:** *do not write the README at the very last minute; keep it alive during the development of the project, and by the end of the project ask a colleague or your faculty Data Steward for feedback on it.*
### TODO.md
This is a Markdown file where you can list your to-do tasks. It is a temporary file.
```
cat TODO.md
```
### Requirements
The `requirements.txt` is a file where you state what is needed to run the code of the project (libraries, versions, etc.). Providing this information is crucial for the reuse of the code and its sustainability.
In our case, the `requirements.txt` has some text in it to exemplify how we should write it:
```
cat requirements.txt
```
What can be done with this file? Others can make sure they have all necessary requirements to properly run the project by installing them with `pip`:
```
pip install -r requirements.txt
```
So far our plotting script uses, for example, the **numpy** and **matplotlib** libraries. Which versions of these libraries are we using? We can check that with `pip`:
```
pip show numpy
pip show matplotlib
```
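As a quick sketch (double-check the resulting file afterwards), the versions reported by `pip show` can be pinned into `requirements.txt` from the command line:
```
# append pinned versions of the libraries we actually use
echo "numpy==$(pip show numpy | awk '/^Version:/ {print $2}')" >> requirements.txt
echo "matplotlib==$(pip show matplotlib | awk '/^Version:/ {print $2}')" >> requirements.txt
cat requirements.txt
```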
During the development of the project, you can keep track of what your project requires. But the idea is that at the end of the project you really make sure the `requirements.txt` file contains everything it should. One way of doing this is to use `pip freeze` at the end of the project to dump the information about all installed Python packages into a `requirements.txt` file (`pip freeze > requirements.txt`); note that this lists everything installed in the environment, so you may want to prune the result to the packages your project actually uses.
Nice reference for more on how to write the `requirements.txt`: https://medium.com/python-pandemonium/better-python-dependency-and-package-management-b5d8ea29dff1
### Setup
We have two setup files: `setup.py` and `setup.cfg`. These are configuration files used to build Python packages using **setuptools**.
What is the difference between the two? In very general terms, in `setup.py` you establish the characteristics and requirements of the Python package you are building, while in `setup.cfg` you specify aliases and configuration values for those characteristics and requirements.
Let us take a look at the `setup.py` script:
```
cat setup.py
```
It looks like this:
```
from setuptools import find_packages, setup

with open('requirements.txt') as f:
    REQUIREMENTS = f.read().splitlines()

setup(
    name='my_project',
    version='Version of your project: ',
    description='A short description of your project',
    author="Your full name (or team's name)",
    license='',
    packages=find_packages(include=['my_project']),
    install_requires=REQUIREMENTS,
    extras_require={
        'interactive': ['jupyter', 'matplotlib']
    },
    setup_requires=['pytest-runner', 'flake8'],
    tests_require=['pytest']
)
```
A few comments on the parameters the **setuptools.setup()** function receives as input:
- `name`: technically this is the name of the Python package that is being built. It does not have to be the same as the name of the folder the package lives in, although it may be confusing if it is not. An example of this situation is the **Scikit-Learn** package: you install it using `pip install scikit-learn`, while you use it by importing from `sklearn`.
- `install_requires`: here it is specified to read the requirements from the `requirements.txt` we talked about before. You may specify requirements without a version (e.g., `numpy`), pin a version (`numpy==1.14.5`), specify a minimum version (`numpy>=1.14.5`) or set a range of versions (`matplotlib>=2.2.0,<3.0.0`). These requirements will automatically be installed by `pip` when the package is installed.
- `extras_require`: this is to specify dependencies that are useful in some situations but not always: for example, interactive tools that developers need while working on the package, but that regular users (non-developers) do not need to install. A plain `pip install` will only install the *default* requirements (in this case taken from `requirements.txt`). If someone would also like to install the extra requirements, they should specify that with `pip install package_name[interactive]` or `pip install -e .[interactive]` (see the sketch after this list).
- `setup_requires` and `tests_require` refer to the testing configuration. We will cover testing later in this session, but essentially in Python you can use `pytest` for setting up tests. Calling this `setup.py` script will then ensure all required dependencies are installed and run `pytest` to make sure the package is properly set up.
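Putting this together, a minimal sketch of how these options are typically used from the project's root directory (assuming the `setup.py` shown above):
```
# install the package in editable/development mode with the default requirements
pip install -e .
# install it together with the optional "interactive" extras defined in setup.py
pip install -e ".[interactive]"
```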
Let us take a look now at our `setup.cfg`:
```
cat setup.cfg
```
This one looks like this:
```
[aliases]
test=pytest
[flake8]
max-line-length=79
```
There we have an alias defined for **pytest**, so that we can run `python setup.py test` to execute `pytest` and, again, make sure the package is properly set up.
We also see that the maximum line length for **flake8** is defined there. **Flake8** is a Python library that checks the formatting of the code, so running `python setup.py flake8` will check the formatting of the project's code.
### .gitkeep
Looking at the empty folders we have (for example the `docs/` folder), we see there is a `.gitkeep` hidden file.
A `.gitkeep` file is an empty file that Git users create so that Git preserves an otherwise empty project directory. As mentioned in the Wiki, Git does not track the creation or removal of directories; it only tracks files. Thus, creating a hidden file inside an empty directory allows us to *see* that directory in Git.
Keep in mind the name `.gitkeep` is **not an official** Git *standard*, but a **convention** within the Git community (sort of the opposite of `.gitignore`, which, by the way, *is* a Git standard).
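As a small, hypothetical example, this is all it takes to keep an otherwise empty directory in the repository:
```
mkdir results                  # hypothetical empty directory
touch results/.gitkeep         # empty placeholder file
git add results/.gitkeep       # now Git "sees" the directory
```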
### \_\_init\_\_.py
In the `tests/`, `my_project/` and `my_project/visualization` directories we see there is an `__init__.py`. The `__init__.py` file is usually an empty file that makes Python treat the directory containing it as a **package**. Thus, if the directory that *contains* `visualization/` (here `~/Desktop/my_project/my_project`) is on Python's module search path (for instance via `PYTHONPATH`, or simply because it is the current working directory), we can import `plot_script.py` as `import visualization.plot_script`, or do `from visualization import plot_script`.
```
export PATH="~/Desktop/my_project/my_project/visualization:$PATH"
echo $PATH
cd ~/Desktop/my_project/my_project/
python
```
In python:
```
import visualization.plot_script as vipls
help(vipls)
from visualization import plot_script
help(plot_script)
```
If you remove the `__init__.py` file, Python no longer treats the directory as a regular package and the import may fail (recent Python 3 versions can still fall back on implicit *namespace packages*, but an explicit `__init__.py` remains the clearest way to mark a package).
_____________________________________
### Docstrings
We saw that our `plot_script.py` has very little documentation. One way of adding documentation that our future self and others can read by calling the `help()` function is to write docstrings.
We can check whether a docstring is OK with, for example, the [**pydocstyle**](https://pypi.org/project/pydocstyle/) tool, which checks for compliance with **PEP 257** (the Python Enhancement Proposal on docstring conventions).
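For example, a minimal (made-up) function with a one-line docstring in the PEP 257 style could look like this:
```
def mean_strain(values):
    """Return the arithmetic mean of a list of strain values."""
    return sum(values) / len(values)

help(mean_strain)  # prints the docstring above
```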
_______________________________________
```
pip install pydocstyle
pydocstyle <my_python_script>
```
______________________________________
For more extensive documentation setup you can use Sphinx which is a tool to create html documentation for the entire project. See the [Code Refinery lesson on how to use Sphinx](https://coderefinery.github.io/sphinx-lesson/).
## Version Control with Git
### How does version control work?
Version control systems start with a **base version** of a file and then record all changes made to it, together with useful **metadata** (e.g., information about those changes). This helps you keep track of and understand the whole **history of changes** made to each tracked file. A version control system not only lets you see those changes, it also lets you retrieve different versions of each file (e.g., in case you need to recover a previous version of a file). In that sense, Git keeps track of all changes made to a file as a sort of "hidden information": instead of a visible (and most likely long) list of files in your working directory (e.g., `file_v1.dat`, `file_v2_final.dat`, ..., `file_v10_final_FINAL.dat`), you have a single hidden directory (the hidden `.git/` directory) where all that information is stored and from which it is easily retrievable.
Aside from keeping track of file changes (each record of these changes is called a **commit**), a version control system also allows you to merge the changes made by different people. In this way you can follow the history of different files in different directories, all at the same time!
The complete history of commits for a particular project and their metadata make up what is referred to as a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people. When working with Git you will have a local repository (in your local device) and a remote repository (e.g. in the TU Delft Gitlab at gitlab.tudelft.nl). In the TU Delft Gitlab a **repository** is referred to as a **project**.
After installing **Git** on your device (work laptop/station), you have to **configure** it accordingly. We will do this via the terminal (Windows users: remember we use the terminal of **Git Bash**).
Open the terminal.
Starting at your **home directory** (remember you can go there by doing `cd` in the terminal; make sure you are there by using the `pwd` command) type the following commands to set up your Git account:
`git config --global user.name "X"` : where **X** is your name on the TU Delft GitLab. For example, if your name is John Smith, then on the prompt of **Git Bash** type: `git config --global user.name "John Smith"`
`git config --global user.email "Y"` : where **Y** is the same email as used on the TU Delft GitLab, e.g. `J.Smith@tudelft.nl`.
`git config --global core.editor "Z"` : where **Z** is the text editor of your preference. For example:
- For **Kate** (Linux): `git config --global core.editor "kate"`
- For **Gedit** (Linux, Windows): `git config --global core.editor "gedit --wait --new-window"`
- For **Vim** (all): `git config --global core.editor "vim"`
- For **VSCode** (all): `git config --global core.editor "code --wait"`
- For **Emacs** (all): `git config --global core.editor "emacs"`
Keep in mind that some editors open **within** the terminal itself (e.g., **Vim** or **nano**) while others open **outside** the terminal as a separate application (e.g., **VSCode** or **gedit**). Recommendation: use an editor that opens **within** the terminal itself; it will also get you used to keeping your **commit** messages short.
__________________________
### Parenthesis
There are actually more files Git looks at when reading your configuration. The `--global` settings live in your `~/.gitconfig` file (or `~/.config/git/config`, if applicable).
Type `git config --list` to see a list of all configuration settings of Git.
Type `git config --help` to see the help page for Git configuration options.
__________________________
## Start working with Git
When working with Git on your *local* files, you need to **initialize** a (*local*) repository (in your work laptop/station) or **clone** a *remote* repository (so that you have a "copy" of the *remote* repository in your *local* device).
### Initialize a *local* repository
Go to the directory where you have the files you want Git to start tracking. Let's say you want to start working on **Project_Z** in `~/Documents/Project_Z`. If you have not created the directory `Project_Z` inside `~/Documents`, then remember you can do that with Bash using the `mkdir` command:
`cd`
`cd ~/Documents`
`mkdir Project_Z`
`cd Project_Z`
Now you can **initialize** a Git repository there by typing: `git init`
`git init` : this initializes a Git repository in the directory where you have the files you want Git to start tracking. Git will create there a **hidden directory** called `.git` (you can see it by doing `ls -a`). The `.git/` directory is where the history of all (tracked) files (including the ones in sub-directories) will be stored.
Thus doing `ls -a` in `Project_Z` will show you the `.git` directory. Git will be able now to track all files in `Project_Z` and in all its sub-directories.
By deleting the `.git` directory (by doing `rm -rf .git`) you will be deleting the respective Git repository. You will not delete the files. You will only delete the recorded history of changes of the files.
Also be aware **Git does not track the creation or removal of directories**. Git **only tracks files**. It will automatically “know” *where* the files are. But it will not track the creation of empty directories.
### The Master Branch
When doing `git init` on your device (work laptop/station), Git *by default* creates what is called the **master** branch. This is the *local* **master** branch.
What is a branch? A **branch** can be thought of as a **separate copy** of a repository where you can make changes, and those changes will stay in that branch, unless you merge that branch with another branch. In that sense, each branch allows you to follow a **different line of development** of the project. You can have several branches for the same project, and you can name each branch differently. For now, just remember that `git init` creates a *local* **master** branch by default.
__________________________
### Important to keep in mind: do not nest Git repos!
**Do not nest Git repositories**. In other words: do not do `git init` in a sub-directory of the directory where Git has already been initialized. **This will just create conflicts!** You only initialize a repository once.
For example: let's say you have the directory `~/Project_Z` where you will start working on a given project. Doing `git init` in `~/Project_Z` will initialize a repository there and create a *local* **master** branch, where the changes of all files within `~/Project_Z` can be tracked. Let's say you are working on another project at the same time (not related to `Project_Z`) and you want to use Git to track those files as well. Then in the directory of that project (say `~/Documents/Project_W`) you do `git init` (only once!). This will initialize a different Git repository there. This repository will also have its own *local* **master** branch, where its own line of development will be followed.
__________________________
### The origin
As we mentioned before, you will be working *locally* on your files in your *local* Git repository (which by default has a *local* **master** branch). You will sync this *local* repository with a *remote* **Project_Y**, for example on the TU Delft GitLab. This *remote* repository is *locally* recognized by Git as the famous **origin**. In that way, Git "knows" to which *remote* **origin** (TU Delft GitLab repository) the *local* changes you make should be synced.
The **origin** will also have its own **master** branch by default. Thus when looking at the *remote* branches of your project via the terminal, you will see it as the *remote* **origin/master** branch.
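In practice, the link between your *local* repository and the *remote* **origin** is created once with `git remote add` (the URL below is a placeholder; use your own project's URL from the TU Delft GitLab):
```
# replace <your_username> and the project name with your own before running
git remote add origin git@gitlab.tudelft.nl:<your_username>/project_y.git
git remote -v    # lists the configured remotes and their URLs
```
(If you `git clone` an existing remote repository instead of starting from `git init`, the **origin** is set up automatically.)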
### Summarizing the initial situation
- You will have a *local* Git repository, created by typing `git init` in a working directory (in your device). This will create the *local* **master** branch by default. Thus you will be working with a single *local* branch (initially).
- You will have a *remote* Git repository in the TU Delft Gitlab instance (the **Project_Y**). This *remote* repository also has a **master** branch (created by default when the project is created in Gitlab). This **master** branch is recognized by your *local* Git repository as the *remote* **origin/master**.
- You will be making changes to your files *locally*. To "synchronize" the *remote* repository with all the changes you make *locally*, you will be *pushing* the changes to the *remote* repository. Likewise, if changes are made directly via the TU Delft GitLab instance (i.e., you make changes to the files *online*), you will be *pulling* those changes into your *local* repository before starting to work *locally* on your files.
This *pushing* and *pulling* workflow becomes extremely important when you work with more branches, either by yourself (for example, you create branches to test parts of the code) or in collaboration with others (each collaborator can have their own branch, for example).
## Start working locally on the files
After **initializing** a repository, you can start working on the files of your *local* **master** branch (which is the only branch you have so far), *pulling*/*pushing* from/to the *remote* repository.
When you work on files inside the directory where Git has been initialized, you have to tell Git *"hey, keep track of this file"* and *"hey, record the changes made to this file and add this metadata to those changes"*. The first is called **adding** files to Git, and the second is making a **commit** of the changes made to a file.
Let’s say you started working on `file1` (e.g., a Python `.py` script) doing all changes in your device (in your *local* Git repository at `~/Documents/Project_Y`). Use:
`git add file1` : this will tell Git to basically "pay attention" to this file ("keep track of it") and put it in the so-called **staging area**.
Let's say you created a function in `file1`. Then you have to **commit** such a change (so that there is a record that a function has been added to the file). To do this type:
`git commit -m "text_commit1"` : this will tell Git to **record such a change** with the **descriptive metadata** `"text_commit1"`. Replace `"text_commit1"` with a short description of the change made to the file, for example: `git commit -m "Added sum function"`. If you only type `git commit`, the editor you set as default (when configuring Git) will open so that you can write the **descriptive metadata**. Always try to keep the commit message as short as possible. See the following [blog](https://chris.beams.io/posts/git-commit/) with principles on how to write Git commit messages.
## Difference between adding and committing
*Adding* and *committing* might be a bit confusing at first. Think of it as if you were preparing the clothes you will take on a trip. You first put the clothes on your bed to have an overview of what you will be taking. In this case, the act of *putting the clothes on the bed* would be `git add`, while the *bed* itself would be the so-called **staging area**.
Once you have *finally decided what to take* on your trip, you *put the clothes in the suitcase*. In this case, *putting the clothes in the suitcase* would be like doing a `git commit`.
While working on a file, and between **adding** and **committing**, you might also find the following commands useful (a short worked example follows this list):
`git add --all` : to add all changes made to all files to the **staging area**. For example:
- you modify `file1` and `file2`, all within the same *(local)* repository.
- Then you do `git add --all` followed by `git commit -m "Change X in file1 and change Y in file2"`
- Then you do `git push origin master`. You will then see the changes in the TU Delft GitLab *(remote)* repository linked to this *(local)* **master** branch.
`git status` : this will show you the *tracked* and *untracked* files. The “untracked” file(s) message means that there is(are) file(s) in the directory that Git is **not keeping track of**. If you want Git to track them, then you should **add** the files using `git add file_name`.
`git diff` : shows the changes you have done *locally* (compared to the last version that was **committed**) but you have **not added** (with `git add` to the *staging area*) **nor committed** (with `git commit`).
`git diff --staged` : shows the changes you have done *locally* (compared to the last version that was **committed**) and that you have **added** (with `git add` to the *staging area*) but you have **not committed** (with `git commit`).
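A short worked example tying these commands together (the file name is made up):
```
echo "print('hello')" > analysis.py   # create a new (untracked) file
git status                            # analysis.py is listed as untracked
git add analysis.py                   # stage it
echo "print('bye')" >> analysis.py    # modify it again after staging
git diff                              # shows the unstaged change (the second line)
git diff --staged                     # shows what is staged (the first line only)
git commit -m "Add analysis script"   # commits only what was staged
```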
## Rule of thumb: when to *add* and when to *commit*?
Every time you make **small changes** to a file, use `git add file1` (where `file1` is the name of the file you have been working on).
Every time you **finalize an important task**, do a **commit**. Remember **commits** have **descriptive metadata** attached to them. Hence, try to commit significant changes and use **descriptive metadata** that will allow your future self (and that of your colleagues) to understand in a few words what change you did in that **commit**.
-------------------
### Git tips
- When using Git on the command line, a very useful feature is the git alias, which lets you give a shorthand to git commands that you use very often. For example, I have an alias which shortens `git status` to `git st`. To add this alias, you can type
`git config --global alias.st status`
- To setup VSCode as the default editor for commit messages you can do:
`git config --global core.editor "code --wait"`
See more on how to setup the VSCode as the default editor for other git commands [here](https://www.roboleary.net/vscode/2020/09/15/vscode-git.html)
## <font color='blue'>Summary of Git commands</font>
`git init` : initializes a Git repository in the current working directory (a hidden `.git/` directory will be created, which you can see by doing `ls -a`). It also creates the so-called *local* **master** branch by default.
`git add file1` : this will tell Git to basically "pay attention" to this file ("keep track of it") and put it in the so-called *staging area*.
`git commit -m "text_commit1"` : this will tell Git to record such a change with the *descriptive metadata* "text_commit1". Replace "text_commit1" with a short description of the change made to the file, for example: `git commit -m "Added sum function"`. If you only type `git commit`, your default editor will open so that you can write the commit message. Always try to keep it as short as possible. See the [blog linked above](https://chris.beams.io/posts/git-commit/) for principles on how to write Git commit messages.
`git status` : this will show you the tracked and untracked files. The “untracked” file(s) message means that there is(are) file(s) in the directory that Git is not keeping track of. If you want Git to track them, then you should add the files using `git add file_name`.
`git diff` : shows the changes you have done locally (compared to the last version that was committed) but you have not added (with git add to the staging area) nor committed (with git commit).
`git diff --staged` : shows the changes you have done locally (compared to the last version that was committed) and that you have added (with git add to the staging area) but you have not committed (with git commit).
`git diff id_commit1 id_commit2` : where `id_commit1` and `id_commit2` are the unique identifiers of two commits (which you can see with `git log`). This will show the differences between those two versions of the repository (i.e., between the states at `id_commit1` and at `id_commit2`).
`git log` : shows the project's history. It will show the unique identifier of each commit in the repository (the long combination of letters and numbers next to "commit"), the author (who made each commit), the date (when each commit was made) and the metadata of the commit (the message describing what the change is about). Press `q` to exit the log view.
`git log --oneline` : shows a summarized version of `git log`.
`git commit --amend` : Update/correct previous commit.
`git show commit_hash:file_name` : this will show the file (replace `file_name` with the file's name/path) as it was at the time of a given commit (referred to via its `commit_hash`). It does not modify the file; it just shows it.
`git grep pattern` : replace `pattern` with a string you want to search for. This command is more powerful than, for example, `history | grep git`, in the sense that `git grep` will go through all the files of the Git repo and look for the pattern.