<style>
.reveal {
  font-size: 30px;
}
</style>

## Reproducible Research 2 - Git

Wojciech Hardy

<!-- Put the link to this slide here so people can follow -->
link: https://hackmd.io/@WHardy/RR-git1

![](https://i.imgur.com/ERha6aj.jpg)

---

### So what's Git?

A Version Control System.

- A VCS tracks the history of changes (e.g. within a folder).
- The history includes: what was done, who did it, when, why, etc.
- Teams can collaborate on a project and recover a previous version if necessary.
- See: [Git handbook](https://guides.github.com/introduction/git-handbook/)

---

### Principles

- There's a central repository - the one predefined source of truth.
- You start by grabbing the most up-to-date version from the central repo.
- Work is split into increments.
- Git gives you tools to resolve file conflicts, etc.

---

### How does it work? An example

![](https://i.imgur.com/kAhvtxk.png)

----

![](https://i.imgur.com/iVMN1GJ.png)

----

![](https://i.imgur.com/pS0xho2.png)

----

![](https://i.imgur.com/kKNRl5G.png)

----

![](https://i.imgur.com/mxfFc2u.png)

----

![](https://i.imgur.com/5d8IaOC.png)

----

![](https://i.imgur.com/ZGIbDg3.png)

----

![](https://i.imgur.com/rwBYE2S.png)

----

![](https://i.imgur.com/nF9oH4d.png)

----

![](https://i.imgur.com/M9uu5Rb.png)

----

![](https://i.imgur.com/XXiNeOh.png)

----

![](https://i.imgur.com/8F0idee.png)

----

![](https://i.imgur.com/kj699fl.png)

---

### How is this helpful?

- A VCS ensures we don't mess anything up permanently.
- We can use it to collaborate.
- We can use it to test and develop new versions (branching) without interfering with the working one.
- We can use it to store our outputs, share them with the public and allow others to contribute (e.g. update our code).
- We can roll back to a previous version whenever we need.

---

### Git general info

- Open-source distributed version control system (DVCS)
- Unlike the once-popular centralized VCSs, DVCSs like Git don't need a constant connection to a central repository
- Created in 2005 by Linus Torvalds for the development of the Linux kernel
- Moderately difficult to learn, very difficult to master
- Became so popular that it largely displaced earlier and competing tools (SVN, CVS, Mercurial)

---

### Before we start: let's install Git!

(Check if we have it?)

- `git --version`
- `which git` (macOS/Linux/Git Bash)
- `where git` (Windows Command Prompt)

---

### If not there: installation

- Linux (Debian/Ubuntu):
    - `$ sudo apt install git-all`
- Windows:
    - Download and install [Git for Windows](https://gitforwindows.org/)
- macOS:
    - `$ xcode-select --install` if you don't have Xcode
    - `$ brew install git` using Homebrew

[Also check here](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)

---

### Why "Git"? You can actually pick... (citing Wikipedia:)

<!-- .slide: style="font-size: 24px;" -->

Torvalds (...): [*"I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'."*](https://archive.kernel.org/oldwiki/git.wiki.kernel.org/index.php/GitFaq.html#Why_the_.27Git.27_name.3F)

The *man page* describes Git as [*"the stupid content tracker"*](https://git-scm.com/docs/git.html).

The read-me file of the source code:

"git" can mean anything, depending on your mood.

- Random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- Stupid. Contemptible and despicable. Simple. Take your pick from the dictionary of slang.
- "Global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "Goddamn idiotic truckload of sh*t": when it breaks.

The source code for Git refers to the program as "the information manager from hell."

---

### Ouch! In Git's defense

- It actually works!
- It is said that Git replaced the older solutions because the latter were even worse...
- ... and its flexibility supports more modern project workflows (Scrum/agile, etc.)
- Git has a CLI as well as visual plugins for IDEs (integrated development environments) and integrations with various apps, services, etc.

---

### Popularity of different VCS among developers

![](https://i.imgur.com/QIahk7i.png =600x350)

*Source*: [Stackoverflow, *Developer Survey Results 2017*](https://insights.stackoverflow.com/survey/2017#technology)

----

### Popularity of collaboration tools among devs

![](https://i.imgur.com/8niYUUc.png =600x350)

*Source*: [Stackoverflow, *Developer Survey Results 2020*](https://insights.stackoverflow.com/survey/2020#development-environments-and-tools)
[44,328 responses, select all that apply]

----

### Extensively used tools over last year + intention to continue

![](https://i.imgur.com/4mIEodq.png)

*Source*: [Stackoverflow, *Developer Survey Results 2021*](https://insights.stackoverflow.com/survey/2021#most-popular-technologies-tools-tech-prof)
(30% of those not working with Git say they'd like to)

---

### Let's take a look at a simple project in Git

<!-- .slide: style="font-size: 24px;" -->

- **repository**: the entire collection of the project's files and folders,<br> along with the version history (including alternative versions). Short: "repo"
- **commit**: basic unit of work (circle)
- **branch**: set of units of work (downward line)
- **main (/master)**: usual name for the main repo branch (production branch)
- **develop** here is just a name given to the branch with developed features. You pick the names.
- **merge/rebase** are used to combine two branches

![](https://i.imgur.com/OExHiqg.png =200x)

---

### CLI vs GUI

- Today we will work with Git using its CLI (Git Bash) to become familiar with the basic commands

![](https://i.imgur.com/x14KPPC.png =250x)

*Source*: [XKCD](https://imgs.xkcd.com/comics/git.png).

---

### First though, some basic terminal commands

<!-- .slide: style="font-size: 24px;" -->

Terminal:

- `cd pathname` for navigation (`cd ..` to go up one level)
- `mkdir foldername` to create a new folder
- `dir` to list files/folders in the current path (`ls` in macOS/Linux/bash)
- `echo text` to print a message
- `>` is a redirection indicator
- `>>` for redirection with appending
- `echo text > file` to print text to a file (and overwrite it)
- `echo text >> file` to add new lines to a file. See also `man echo`
- `touch file` to just create a new (empty) file. Note: if the file already exists, `touch` only updates its timestamp
- `cat file` to display the contents of a file
- `diff file1 file2` to show the differences between two files

---

### Exercise 0: terminal (go down for more)

1. Open up the terminal
2. Pick a space for your reproducible research materials and navigate there

E.g. you can use `cd <pathname>` for navigation or do "Git Bash here" in Windows.

----

3. Create a new folder named RR_git1

`mkdir RR_git1`

----

4. Navigate to your new folder

`cd RR_git1`

----

5. Create a classes.txt file with a line with today's date.

`echo "3/6/2023" > classes.txt`

*Note: at the end of the class you'll be asked to take a screenshot of your workspace and send it via e-mail to wojciechhardy@uw.edu.pl (you can crop it to show just the folder contents)*

---

### Great! Now that it's covered let's look at Git diagnostics

([also see standard command line options](https://tldp.org/LDP/abs/html/standard-options.html))

- `git`, `git --help` to display git inline help
- `git [cmd] --help` to display the full manual for `cmd` (a web page in Git for Windows, a man page on Linux/macOS)
- `git --version` to display diagnostic info (version)
- `git status` to display local repository status
- `git log` to display history of commits

These commands do not alter anything. Feel free to use them frequently to verify and understand the results of your actions.
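
As a minimal sketch of how these read-only commands behave, the snippet below runs them in a throwaway repository. The temporary directory and the repository name `demo` are assumptions made purely for illustration, and `git init` is used only so that there is something to inspect.

```shell
# Sketch only: a throwaway repository in a temporary directory.
# "demo" is a hypothetical name, not part of the exercises.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

git --version    # prints the installed Git version
git status       # reports the current branch and that there are no commits yet

# git log exits with an error while the repository has no commits,
# so it is guarded here instead of being called directly.
if git log --oneline 2>/dev/null; then
    echo "history shown above"
else
    echo "no commits yet"
fi
```

None of these calls changes the repository, so they can be repeated as often as needed while working through the exercises.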

---

### Basic git commands - setting up the repository

<!-- .slide: style="font-size: 24px;" -->

`git init [repo_name]` to initialize an empty repository in the current [or specified] directory
`git clone [repo_name] [clone_name]` to create a linked copy of a repository
`git config -l` to view all configuration options

Config structure:
`git config [-l] [--scope] [option_name] [value]`

There are three levels of configuration (i.e. scope):
`--system` - pertains to the repositories of all system users
`--global` - pertains to all of the user's repositories, overrides system settings
`--local` (default when setting a value) - pertains to the current repository, overrides global settings

Note 1: `global` configuration will be visible only if you've used Git before (and added some options)
Note 2: `local` configuration will be visible only if we're in a Git repository

---

### Exercise 1: creating a repository (go down for more)

1. In your RR_git1 directory, initialize a git repository named EX1 (hint: you can either initialize it with that name, or create a folder named EX1, enter it, and initialize the repository inside)

`git init EX1`
`cd EX1`

----

2. List all available configuration options.

`git config -l`

----

3. List all global options

`git config -l --global`

(Note: this will only show something if you've set any global options before)

----

4. List all local options

`git config -l --local`

----

5. Set global option 'user.name' to your name

`git config --global user.name "Name"`

----

6. Set global option 'user.email' to your e-mail

`git config --global user.email "address@smth.smth"`

----

7. List all global options, check the difference

`git config -l --global`

----

8. Set local option 'user.name' to your initials

`git config --local user.name "AB"`

----

9. List all local options

`git config -l --local`

---

### The three git states

Unlike most other VCSs, Git has something called the "staging area" or "index". This is an intermediate area where changes can be assembled and reviewed before they are committed.
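
A minimal sketch of what the staging area makes possible, assuming a throwaway repository and two hypothetical files (`a.txt` and `b.txt`): only the file that was explicitly staged would become part of the next commit.

```shell
# Sketch only: throwaway repository, hypothetical file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first" > a.txt
echo "second" > b.txt

git add a.txt                    # stage a.txt only; b.txt stays out of the index

git status --short               # a.txt is marked as staged (A), b.txt as untracked (??)
git diff --staged --name-only    # lists what the proposed next commit would contain
```

Reviewing the staged content like this before committing is the step that the staging area adds to the workflow.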

![](https://i.imgur.com/Jk9X5WC.png =400x)

*Source for this and following slides*: https://git-scm.com/

---

### The three trees

[See here for a detailed description](https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified)

And think of a tree as an ordered collection of files.

| [Tree](https://www.google.com/search?q=tree+directory) | Role |
| -------- | -------- |
| HEAD | Last commit snapshot |
| Index | Proposed next commit snapshot |
| Working directory | Sandbox |

---

### HEAD

HEAD is a snapshot of your last commit on a given branch.

If you want to see what that snapshot looks like, simply type:

**`$ git cat-file -p HEAD`**

If you recall the branch graph, this is the latest commit on the branch.

---

### Staging area aka Index

The index is your proposed next commit.

Command that shows you what your index currently looks like:

**`git ls-files`**

It's a box where you put the files you'd like to include in your next commit (sort of a work-in-progress, not-yet-committed commit)

---

### Working directory

Think of the working directory as a sandbox, where you can try changes out before sending them to your staging area (index) and then to history.

---

### Ok, so let's examine this process

#0 At this point, only the working directory tree has any content.

![](https://i.imgur.com/h3ZPQGp.png)

---

#1 We use **`git add`** to take content in the working directory and copy it to the index.

![](https://i.imgur.com/h7ZESGS.png)

(Note that we keep all boxes. Two currently store the same information)

---

#2 We use **`git commit`**, which takes the contents of the index and saves it as a permanent snapshot, creates a commit object which points to that snapshot, and updates the current branch (master here) to point to that commit.

![](https://i.imgur.com/JptS5e9.png)

(In general terms, there's now a new commit on the main branch, and there's an indicator pointing to it saying "hey, this is where we're at".)

---

#3 If we run **`git status`**, we’ll see no changes, because all three trees are the same.
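
The steps #0 to #3 can be sketched in a throwaway repository as follows. This is only an illustration: the file name `file.txt`, the commit message and the locally configured identity are hypothetical.

```shell
# Sketch only: throwaway repository with a hypothetical local identity.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# 0: only the working directory has content
echo "v1" > file.txt
git ls-files                      # prints nothing, the index is still empty

# 1: copy the content from the working directory to the index
git add file.txt
git ls-files                      # now lists file.txt

# 2: save the index as a snapshot and move the branch to the new commit
git commit -q -m "Add file.txt"
git cat-file -p HEAD              # shows the commit object HEAD points to

# 3: all three trees match, so there is nothing to report
git status --short                # prints nothing
```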

![](https://i.imgur.com/YBWWbYl.png)

---

### Git commands: the workflow

**`git add [filename(s)]`** to add files to the staging area

**`git add .`** to add **all** new/modified files to the staging area

**`git commit -m "<commit description>"`** to create a new commit with what's in the staging area

At any point you can:

**`git status`** to verify where you are, and what the differences between the three trees are

**`git diff`** to compare the staging area with what's in the working directory (use **`git diff HEAD`** to compare against the last commit)

**`git log`** to view the commit history

---

### Exercise 2: adding commits (go down for more)

<!-- .slide: style="font-size: 24px;" -->

1. In your RR_git1 directory, initialize a git repository named EX2
(check **`git status`** and **`git diff`** to get a better feel of this)

`cd ..`
`git init EX2`

----

2. Go inside the new repository.

`cd EX2`

----

3. Create a file named README.md, add a single line of text inside, save the file [hint: you can use _echo_ or create it manually with e.g. Notepad]
(check **`git status`** and **`git diff`** to get a better feel of this)

`git status`
`echo "a line of text" > README.md`
`git status`

----

4. Stage the new file.
(check **`git status`** and **`git diff`** to get a better feel of this)

`git add README.md`
`git status`

----

5. Commit the file (remember to include a helpful commit description!)
(check **`git status`** and **`git diff`** and **`git log`** to get a better feel of this)

`git commit -m "Added a README.md file"`
`git status`

---

### Exercise 3: adding commits (go down for more)

1. Add another line of text to the file you created.

`echo "add a second line of text" >> README.md`

----

2. Create a new file named "readme.txt".

`touch readme.txt`

----

3. Create an empty folder named "data"

`mkdir data`

----

4. Run the repository diagnostics as above.

`git diff`
`git status`

----

5. Stage and commit the modified file and the new file.

`git add .`
`git commit -m "Added a new readme.txt file and modified README.md"`

----

6.
Check **`git log`**, etc. again.

`git log`
`git status`
`git diff`

---

### Exercise 4: using .gitignore

1. Create a data/data1.csv file and fill it with a random data line (can be just comma-separated text, it doesn't matter), check status and diff

`echo -e "Var1, Var2\n5, 7" > data/data1.csv`
`git status`
`git diff`

----

2. Create a .gitignore file (yes, starting with a dot), put the word 'data' inside (it's the name of our directory), check status and diff

`echo "data" > .gitignore`
`git status`
`git diff`

*.gitignore* is a file that tells git to ignore certain elements.

Should we commit it? That depends on the workflow and, e.g., who we're working with (we might not want to share it with collaborators)

---

## Class summary

Please take a screenshot of the contents of your RR_git1 folder and send it to wojciechhardy@uw.edu.pl

Try to store your files in a safe place so we can pick up where we left off next time.

Note: if you simply copy the folder to a pendrive or similar, the repository will continue working (everything you need is already inside!)

---

## Stuck in VIM?

If you forgot to add a message to your commit, you might have ended up in VIM. It's a free text-editing program that sometimes feels like a trap.

*Tl;dr:* hit [ESC], then type **`:q`** and press **`Enter`**. Repeat your commit with a helpful description.

You can also try adding the comment in VIM instead, and then exit with **`:wq`**, which should complete the commit with that comment.

See more in this [helpful Stackoverflow answer](https://stackoverflow.com/a/11828573).

---

## Useful links

[Read more on the three trees with the **`git reset`** guideline](https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified)

[Cheat sheet 1 (Atlassian)](https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet)

[Git-scm in general](https://git-scm.com)

[Atlassian in general](https://www.atlassian.com/git)

If you need more, just Google tutorials/blog posts/YouTube videos until you find one that makes it clear :) Lots to choose from!

---