---
title: Reproducible Research 2 - Git
tags: RR2024 - DS/QF
description: 
slideOptions:
  theme: night
---
<style>
.reveal {
  font-size: 30px;
}
</style>
## Reproducible Research 2 - Git
Wojciech Hardy
<!-- Put the link to this slide here so people can follow -->
link: https://hackmd.io/@WHardy/RR24-git1

---
### So what's Git? A Version Control System.
- VCS tracks the history of changes (e.g. within a folder).
- The history includes: what was done, who did it, when, why, etc.
- Teams can collaborate on a project and recover a previous version if necessary.
- More sophisticated workflows include code review steps, automated testing, etc.
- See: [Git handbook](https://guides.github.com/introduction/git-handbook/)
---
### Principles
- There's a central repository - the one predefined source of truth.
- You start by grabbing the most up-to-date version from the central repo.
- Work is split into increments (called _commits_)
- Git gives you tools to resolve file conflicts, etc.
---
### How does it work? An example
The project is stored in a central repo.

----
Contributor 1 grabs the most recent version.

----
Contributor 1 does some new work locally.

----
Contributor 1 checks whether the central version has changed in the meantime.

----
Contributor 1 puts their changes in the central repo.

----
Central repo now stores the new step on top of the previous one.

----
Contributor 2 joins in and goes through the same steps.

----
This can go on and on.

----
And involve a lot more people.
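----
The cycle above can be tried out locally. This is only a minimal sketch: it assumes a bare repository in a temporary folder stands in for the central repo, and the folder names (`central.git`, `contributor1`, `contributor2`), the file `analysis.txt` and the demo identities are all made up for illustration.

```shell
# A throwaway sandbox; nothing here touches your real projects.
work="$(mktemp -d)"
cd "$work"

# The "central repo": a bare repository (history only, no working files).
git init -q --bare central.git

# Contributor 1 grabs the most recent version.
# (Git warns that the repository is empty; that is expected here.)
git clone -q central.git contributor1
cd contributor1
git config user.name "Contributor One"        # demo identity (assumption)
git config user.email "one@example.com"

# Contributor 1 does some new work locally and records it as a commit.
echo "step 1" > analysis.txt
git add analysis.txt
git commit -q -m "Add first analysis step"

# Contributor 1 puts their changes in the central repo.
git push -q -u origin HEAD

# Contributor 2 joins in and goes through the same steps.
cd "$work"
git clone -q central.git contributor2
cd contributor2
git config user.name "Contributor Two"        # demo identity (assumption)
git config user.email "two@example.com"
echo "step 2" >> analysis.txt
git commit -q -a -m "Add second analysis step"
git push -q origin HEAD

# Contributor 1 checks whether the central version changed and picks it up.
cd "$work/contributor1"
git pull -q --ff-only
git log --oneline
```

In real projects the central repo usually lives on a server (e.g. GitHub) rather than in a local folder, but the steps are the same.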

---
### How is this helpful?
- VCS ensures we don't mess anything up permanently.
- We can use it to collaborate.
- We can use it to test and develop new versions (branching) without interfering with the working one.
- We can use it to store our outputs, share them with the public and allow others to contribute (e.g. update our code).
- We can roll back to a previous version whenever we need.
---
### Git general info
- Open source distributed version control system
    - Unlike the once-popular centralized VCSs, DVCSs like Git don’t need a constant connection to a central repository
- Created in 2005 by Linus Torvalds for the development of the Linux kernel
- Moderately difficult to learn, very difficult to master
- Became so popular that it has largely displaced other tools (SVN, Mercurial, CVS)
---
### Before we start: let's install Git!
(Check if we have it?)
-  `git --version`
(on macOS, this might prompt installation if you don't have Git)
-  `which git` (Linux/macOS/Git Bash)
-  `where git` (Windows Command Prompt)
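----
The same check can be written as a tiny script. A minimal sketch, assuming a bash-like shell (Linux, macOS or Git Bash); the variable name `git_state` is made up for this example.

```shell
# `command -v` asks the shell where a program lives (and fails if it is missing).
if command -v git >/dev/null 2>&1; then
  git_state="installed"
  echo "Git found at $(command -v git): $(git --version)"
else
  git_state="missing"
  echo "Git was not found on your PATH."
fi
```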
---
### If not there: installation
- Linux:
    - `$ sudo apt install git-all`
- Windows:
    - Download and install [Git for Windows](https://gitforwindows.org/)
- macOS:
    - `git --version` might prompt installation in newer OS versions
    - `$ xcode-select --install` if you don't have Xcode
    - `$ brew install git` using Homebrew
    - or get [the binary installer](https://git-scm.com/download/mac)
[Also check here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
---
### Why "Git"? You can actually pick...  (citing Wikipedia:)
<!-- .slide: style="font-size: 24px;" -->
Torvalds (...): [*"I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'."*](https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/GitFaq.html#Why_the_.27Git.27_name.3F)
The *man page* describes Git as [*"the stupid content tracker"*](https://git-scm.com/docs/git.html).
The read-me file of the source code: "git" can mean anything, depending on your mood.
- Random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- Stupid. Contemptible and despicable. Simple. Take your pick from the dictionary of slang.
- "Global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "Goddamn idiotic truckload of sh*t": when it breaks.
The source code for Git refers to the program as, "the information manager from hell." 
---
### Ouch! In Git's defense
- It actually works!
- It is said that Git replaced the older solutions because the latter were even worse...
- ... and its flexibility supports more modern project workflows (Scrum/agile, etc.)
- Git has both CLI and visual plugins in IDEs (integrated development environments) and integration with various apps, services, etc.
- A lot of firms build on some Git implementation.
---
### Popularity of different VCS among developers

*Source*: [Stackoverflow, *Developer Survey Results 2017*](https://insights.stackoverflow.com/survey/2017#technology)
----
### Popularity of collaboration tools among devs

*Source*: [Stackoverflow, *Developer Survey Results 2020*](https://insights.stackoverflow.com/survey/2020#development-environments-and-tools) [44,328 responses, select all that apply]
----
### Extensively used tools over last year + intention to continue

*Source*: [Stackoverflow, *Developer Survey Results 2021*](https://insights.stackoverflow.com/survey/2021#most-popular-technologies-tools-tech-prof)
(30% of those not working with Git say they'd like to)
---
### Let's take a look at a simple project in Git
<!-- .slide: style="font-size: 24px;" -->
- **repository**: the entire collection of a project's files and folders,<br> along with the version history (including alternative versions). Short: "repo"
- **commit**: basic unit of work (here as a circle)
- **branch**: set of units of work (downward line)
- **main (/master)**: usual name for the main repo branch (production branch)
- **develop** here is just a name given to the branch with developed features. You pick the branch names.
- **merge/rebase** are used to combine two branches

---
### CLI vs GUI
- Today we will work with Git through its CLI (e.g. Git Bash on Windows) to become familiar with the basic commands

*Source*: [XKCD](https://imgs.xkcd.com/comics/git.png).
---
### First though, some basic terminal commands
<!-- .slide: style="font-size: 24px;" -->
Terminal:
- `cd pathname` for navigation (`cd ..` to go up one level)
- `mkdir foldername` to create a new folder
- `dir` to list files/folders in the current path (`ls` in macOS/Linux/bash)
- `echo text` to print a message
- `>` is a redirection indicator
- `>>` for redirection with appending
- `echo text > file` to print a text to a file (and overwrite it)
- `echo text >> file` to add new lines to a file. See also `man echo`
- `touch file` to create a new, empty file; Note: if the file already exists, `touch` only updates its timestamp and leaves the content alone
- `cat file` to display the contents of a file
- `diff file1 file2` to show the differences between two files
- `rm` to remove a file and `rmdir` to remove an empty folder
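----
A short session that strings these commands together. It is only a sketch, assuming a bash-like shell; the folder and file names (`demo`, `notes.txt`, `copy.txt`, `empty.txt`) are made up.

```shell
# Work in a throwaway folder so nothing important gets overwritten.
cd "$(mktemp -d)"
mkdir demo
cd demo

echo "first line" > notes.txt      # > creates the file (or overwrites it)
echo "second line" >> notes.txt    # >> appends a new line
cat notes.txt                      # prints both lines

echo "replaced" > copy.txt         # a second file to compare against
echo "second line" >> copy.txt
diff notes.txt copy.txt || true    # diff exits with status 1 when the files differ

touch empty.txt                    # creates an empty file
rm empty.txt copy.txt              # removes both files
ls                                 # only notes.txt is left
```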
---
### Exercise 0: terminal (go down for more)
1. Open up the terminal
2. Pick a space for your reproducible research materials and navigate there
E.g. you can use `cd <pathname>` for navigation or do "Git Bash here" in Windows.
----
3. Create a new folder named RR_git1
`mkdir RR_git1`
----
4. Navigate to your new folder
`cd RR_git1`
----
5. Create a classes.txt file with a line with today's date.
`echo "3/6/2024" > classes.txt`
---
### Great! Now that it's covered let's look at Git diagnostics
([also see standard command line options](https://tldp.org/LDP/abs/html/standard-options.html))
- `git`, `git --help`   to display Git's built-in help summary
- `git [cmd] --help` to display the full manual page for cmd (Git for Windows opens it in the browser)
- `git --version`   to display diagnostic info (version)
- `git status` to display local repository status
- `git log` to display history of commits
These commands do not alter anything. Feel free to use them frequently to verify and understand the results of your actions.
---
### Basic git commands - setting up the repository
<!-- .slide: style="font-size: 24px;" -->
`git init [repo_name]` to initialize an empty repository in the current [or specified] directory
`git clone <repo_url_or_path> [clone_name]` to create a linked copy of an existing repository
`git config -l` to view all configuration options
Config structure: `git config [-l] [--scope] [option_name] [value]`
There are three levels of configuration (i.e. scope):
`--system` - pertains to repositories of all system users
`--global` - pertains to all repositories of the current user, overrides system settings
`--local` (default when setting a value) - pertains to the current repository, overrides global settings
Note\: `global` configuration will be visible only if you've used Git before (and added some options)
Note 2: `local` configuration will be visible only if we're in a Git repository
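----
How the scopes override each other can be checked directly. A minimal sketch: to avoid touching your real settings it points Git at a fake home folder inside a subshell, which is an assumption of this example and not something you need in normal use. The names `sandbox`, `demo` and `effective.txt` are made up.

```shell
sandbox="$(mktemp -d)"
(
  # Fake home folder: the "global" config below lands here, not in your real one.
  export HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config"
  git init -q "$sandbox/demo"
  cd "$sandbox/demo"

  git config --global user.name "Name Surname"   # all repositories of this user
  git config --local user.name "AB"              # this repository only

  git config --global --get user.name            # the value stored globally
  git config --local --get user.name             # the value stored locally
  # Without a scope, Git reports the value it will actually use here.
  git config --get user.name > "$sandbox/effective.txt"
)
cat "$sandbox/effective.txt"                     # the local value wins
```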
---
### Exercise 1: creating a repository (go down for more)
1. In your RR_git1 directory, initiate a git repository named EX1 and enter it
(hint: you can either initiate it with that name, or create a folder named EX1, enter it, and initiate the repository inside)
`git init EX1` 
`cd EX1`
or
`mkdir EX1`
`cd EX1`
`git init`
----
2. List all available configuration options.
`git config -l`
----
3. List all global options
`git config -l --global`
(Note: this will only work if you've ever changed any global options)
----
4. List all local options
`git config -l --local`
----
5. Set global option 'user.name' to your name 
`git config --global user.name "Name Surname"`
----
6. Set global option 'user.email' to your e-mail
`git config --global user.email "your.email@smth.smth"`
----
7. List all global options, check the difference
`git config -l --global`
----
8. Set local option 'user.name' to your initials
`git config --local user.name "AB"`
----
9. List all local options
`git config -l --local`
---
### The three git states
Unlike most other VCSs, Git has something called the "staging area" or "index". This is an intermediate area where changes can be collected and reviewed before they are committed.

*Source for this and following slides*: https://git-scm.com/
---
### The three trees
[See here for a detailed description](https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified)
And think of a tree as an ordered collection of files.
| [Tree](https://www.google.com/search?q=tree+directory) |Role | 
| -------- | -------- |
| HEAD     | Last commit snapshot |
| Index     | Proposed next commit snapshot     |
| Working directory    | Sandbox     |
---
### HEAD
HEAD points to the last commit on the current branch; think of it as a snapshot of that commit.
If you want to see what that snapshot looks like, simply type:
**`$ git cat-file -p HEAD`**
If you recall the branch graph, this is the latest commit on the branch.
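----
A few ways to look at HEAD in a fresh repository. This is a sketch under assumptions: the repository name `head-demo` and the demo identity are made up, and `git ls-tree` is used here as one convenient way to list the files in the snapshot.

```shell
cd "$(mktemp -d)"
git init -q head-demo
cd head-demo
git config user.name "Demo User"            # demo identity (assumption)
git config user.email "demo@example.com"

echo "one line" > README.md
git add README.md
git commit -q -m "Add README.md"

git rev-parse HEAD                  # the ID of the commit HEAD resolves to
git cat-file -t HEAD                # the object type: commit
git cat-file -p HEAD                # the commit itself: tree, author, message
git ls-tree -r --name-only HEAD     # the files stored in that snapshot
```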
---
### Staging area aka Index
The index is your proposed next commit.
Command that shows you what your index currently holds (add options such as `-o` or `-m` to also look at the working directory):
**`git ls-files`**
It's a box where you put the files you'd like to include in your next commit (sort of a work-in-progress, not-yet-committed commit)
---
### Working directory
Think of the working directory as a sandbox, where you can try changes out before sending them to your staging area (index) and then to history.
---
### Ok, so let's examine this process
#0 At this point, only the working directory tree has any content.
 

---
#1 We use **`git add`** to take content in the working directory and copy it to the index.

(Note that we keep all boxes. Two currently store the same information)
---
#2 We use **`git commit`**, which takes the contents of the index and saves it as a permanent snapshot, creates a commit object which points to that snapshot, and updates master to point to that commit.

(In general terms, there's now a new commit on the main branch, and there's an indicator pointing to it saying "hey, this is where we're at".)
---
#3 If we run **`git status`**, we’ll see no changes, because all three trees are the same.
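----
Steps #0 to #3 can be replayed in a scratch repository. A minimal sketch, assuming a bash-like shell; the names `trees-demo` and `file.txt` and the demo identity are made up.

```shell
cd "$(mktemp -d)"
git init -q trees-demo
cd trees-demo
git config user.name "Demo User"            # demo identity (assumption)
git config user.email "demo@example.com"

echo "v1" > file.txt               # 0: content exists only in the working directory
git ls-files                       # the index is still empty, so nothing is listed
git status --short                 # ?? file.txt  (untracked)

git add file.txt                   # 1: copy the content into the index
git ls-files                       # now lists file.txt
git status --short                 # A  file.txt  (staged)

git commit -q -m "Add file.txt"    # 2: save the index as a permanent snapshot
git ls-tree -r --name-only HEAD    # HEAD now holds file.txt as well

git status --short                 # 3: prints nothing, all three trees match
```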

---
### Git commands: the workflow
**`git add [filename(s)]`** to add files to the staging area
**`git add .`** to add **all** new/modified files to the staging area
**`git commit -m "<commit description>"`** to create a new commit with what's in the staging area
At any point you can:
**`git status`** to verify where you are and how the three trees differ
**`git diff`** to compare the working directory with the staging area (**`git diff HEAD`** compares it with the last commit)
**`git log`** to view the commit history
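----
Which two trees `git diff` compares depends on how you call it. A minimal sketch in a scratch repository; the name `diff-demo` and the demo identity are made up.

```shell
cd "$(mktemp -d)"
git init -q diff-demo
cd diff-demo
git config user.name "Demo User"            # demo identity (assumption)
git config user.email "demo@example.com"
echo "one line" > README.md
git add README.md
git commit -q -m "Add README.md"

echo "a second line" >> README.md
git diff              # working directory vs index: shows the new line
git diff --staged     # index vs last commit: nothing is staged yet

git add README.md
git diff              # nothing: working directory and index now match
git diff --staged     # shows the new line, ready to be committed
git diff HEAD         # working directory vs last commit: shows the new line
```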
---
### Exercise 2: adding commits (go down for more)
<!-- .slide: style="font-size: 24px;" -->
1. In your RR_git1 directory, initiate a git repository named EX2 
`cd ..`
`git init EX2`
(**`git status`** and **`git diff`** will only work once you are inside the repository, i.e. after the next step)
----
2. Go inside the new repository.
`cd EX2`
----
3. Create a file named README.md, add a single line of text inside, save the file [hint: you can use _echo_ or create it manually with e.g. Notepad]
`touch README.md`
`echo "one line" >> README.md`
(check **`git status`** and **`git diff`** to get a better feel of this)
----
4. Stage the new file. 
`git add README.md`
(check **`git status`** and **`git diff`** to get a better feel of this)
----
5. Commit the file (remember to include a helpful commit description!)
`git commit -m "Added README.md with one line of text"`
(check **`git status`** and **`git diff`** and **`git log`** to get a better feel of this)
---
### Exercise 3: adding commits (go down for more)
1. Add another line of text to the file you created.
`echo "a second line" >> README.md`
----
2. Create a new file named "readme.txt".
`touch readme.txt`
----
3. Create an empty folder named "data"
`mkdir data`
----
4. Run the repository diagnostics as above.
`git status`
----
5. Stage and commit the modified file and the new file.
`git add .`
`git commit -m "Modified README.md and added readme.txt"`
----
6. Check **`git log`**, etc. again.
`git log`
`git status`
---
### Exercise 4: using .gitignore
1. Create data/data1.csv file and fill it with a random data line (can be just comma-separated text, it doesn't matter), check status and diff
`printf "var1,var2\n1,2\n" > data/data1.csv`
`git status`
----
2. Create a .gitignore file (yes, starting with a dot), put the word 'data' inside (it's the name of our directory), check status and diff
`touch .gitignore`
`echo "data" >> .gitignore`
`git status`
*.gitignore* is a file that tells Git which files and folders to ignore. Should we commit it? It depends on the workflow and, e.g., who we're working with (we might not want to share it with collaborators).
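----
The effect of *.gitignore* can be verified in a scratch repository. A minimal sketch; the repository name `ignore-demo` is made up, and `git check-ignore` is used here as one way to ask Git which rule matched.

```shell
cd "$(mktemp -d)"
git init -q ignore-demo
cd ignore-demo
mkdir data
printf "var1,var2\n1,2\n" > data/data1.csv

git status --short                    # ?? data/
echo "data" >> .gitignore
git status --short                    # ?? .gitignore  (data/ is no longer listed)
git check-ignore -v data/data1.csv    # shows the .gitignore rule that matched
git ls-files -o --exclude-standard    # untracked files that are not ignored
```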
---
## Assignment
While in your EX2 repository, run these four commands:
`git status`
`git log`
`git ls-files`
`git ls-files -o`
Copy the contents of Bash/CLI starting from `git status` and send them to me in a plain-text (.txt) file (wojciechhardy@uw.edu.pl).
Try to store your files in a safe place so we can pick up where we left off next time.
Hint: if you simply copy the folder to a pendrive or similar, the repository will continue working (everything you need is already inside!)
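----
The hint can be checked with a plain folder copy. A minimal sketch, assuming a bash-like shell; the scratch `EX2` created here, the folder name `EX2_backup` (standing in for the pendrive) and the demo identity are made up.

```shell
work="$(mktemp -d)"
cd "$work"
git init -q EX2
cd EX2
git config user.name "Demo User"            # demo identity (assumption)
git config user.email "demo@example.com"
echo "one line" > README.md
git add README.md
git commit -q -m "Added README.md with one line of text"

cd "$work"
cp -r EX2 EX2_backup                  # a plain copy, hidden .git folder included
git -C EX2_backup log --oneline       # the history travelled with the folder
git -C EX2_backup status --short      # prints nothing: the copy is clean
```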
---
## Stuck in VIM?
If you forgot to add a message to your commit, you might have ended up in Vim. It's a free text editor that sometimes feels like a trap.
*Tl;dr:* hit [ESC], then type **`:q`** and press **`Enter`** (use **`:q!`** if Vim refuses because you typed something).
Repeat your commit with a helpful description.
You can also write the message in Vim and then exit with **`:wq`**, which should complete the commit with that message.
See more in this [helpful Stackoverflow answer](https://stackoverflow.com/a/11828573).
---
## Useful links
[Read more on the three trees in the **`git reset`** guide](https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified)
[Cheat sheet 1 (Atlassian)](https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet)
[Git-scm in general](https://git-scm.com)
[Atlassian in general](https://www.atlassian.com/git)
If you need more, just Google tutorials/blog posts/YouTube videos until you find one that makes it clear :) Lots to choose from!
---