Hilmar Lapp
---
tags: course notes
---

# UPGG Informatics Orientation Bootcamp 2023

:::warning
The [notes for Wednesday and Thursday](https://hackmd.io/@8dEsA7nbQsexUv4D0NazQA/SkwLQNBan/edit) have been moved to a separate document.
:::

## Friday Morning

### Version Control with Git

#### Instructor: Raven

#### Summary and Setup

Today we'll follow the swcarpentry notes: https://swcarpentry.github.io/git-novice/

Create and navigate to a new working directory:

````
mkdir bootcamp
cd bootcamp
````

Version control keeps track of both what you ADD to and what you REMOVE from a document, so it works like an unlimited "undo". However, a conflict can arise if two users change the same section of the document.

#### Setup Git

You only need to configure `git` once, at the start!

````bash
git config --global user.name "Vlad Dracula"
git config --global user.email "vlad@tran.sylvan.ia"
````

Your computer encodes each press of `enter` as a line-ending character ("newline"), and operating systems differ in how they encode it. You can change the way Git recognizes and encodes line endings.

On macOS and Linux:

```bash
git config --global core.autocrlf input
```

And on Windows:

```bash
git config --global core.autocrlf false
```

Set up a default text editor (optional):

```bash
git config --global core.editor "nano -w"
```

Configure the default branch name to be `main`:

```bash
git config --global init.defaultBranch main
```

You can view all the configurations you have for `git`:

```bash
git config --list
```

##### Creating a Repository

Make sure you are in your working directory for this lecture. Create a new directory and initialize a new repository:

```bash
mkdir planets
cd planets
git init
```

Print your git version:

```bash
git version
```

Create the `main` branch and make sure you are on it:

```bash
git checkout -b main
```

You can get a report on the repository:

```bash
git status
```

It's recommended **NOT** to initialize a new git repository within a pre-existing one. For example,
making a directory called `moons` and calling `git init` inside the `moons` directory. If you accidentally make nested git repositories, just remove the inner `.git` directory:

```bash
rm -rf moons/.git
```

Version control keeps track of files only (not directories).

##### Tracking Changes

Make a new file and type some notes:

```bash
nano mars.txt
```

We can see that git detected the new file, but it's not being tracked yet:

```bash
git status
```

To start tracking changes:

```bash
git add mars.txt
```

To **save** the changes, run the `commit` command (basically **save** in the `git` lexicon):

```bash
git commit -m "Start notes on Mars as a base"
```

A commit records the changes you staged since the last `commit`. The text after the `-m` flag is a note to your future self or a collaborator about what you changed in this commit. It should be relatively brief.

Take a look at the commit history:

```bash
git log
```

If you make further changes, you can run `git diff` to show the differences between the current state of the file and the most recently saved version:

```bash
git diff
```

`git add` your new changes before committing:

```bash
git add mars.txt
git commit -m "Add concerns about effects of Mars' moons on Wolfman"
```

By default, `git diff` compares your working directory with the staging area. If you run `git diff` after staging your new changes, it won't find a difference:

```bash
nano mars.txt
git add mars.txt
git diff
```

However, you can compare the staged version with the last committed version:

```bash
git diff --staged
```

Change the visualization from line-level (default) to word-level changes:

```bash
git diff --color-words
```

If `git log` is very long:

```bash
git log -1         ## limits output to the last commit; change 1 to the number of commits you want
git log --oneline  ## reduces each commit to a single line
```

You can `git add` a directory even though directories themselves are not tracked (Git tracks their contents):
```bash
mkdir spaceships
touch spaceships/apollo-11 spaceships/sputnik-1
git status
git add spaceships
git status
```

Notice the difference in `git status` before and after adding `spaceships`. Commit the changes:

```bash
git commit -m "Add some initial thoughts on spaceships"
```

It's best practice to be descriptive with your commit messages.

###### Exercises:

###### Which command(s) below would save the changes of myfile.txt to my local Git repository?

Answer number 3 is correct:

```bash
git add myfile.txt
git commit -m "my recent changes"
```

The staging area can hold changes from any number of files that you want to commit as a single snapshot.

###### Adding multiple files

- Add some text to mars.txt noting your decision to consider Venus as a base

```bash
nano mars.txt
```

- Create a new file venus.txt with your initial thoughts about Venus as a base for you and your friends

```bash
nano venus.txt
```

- Add changes from both files to the staging area, and commit those changes.

```bash
git add mars.txt venus.txt
git commit -m "Started considering Venus as a base"
```

###### `bio` Repository

- Create a new Git repository on your computer called bio.
- Write a three-line biography for yourself in a file called me.txt, and commit your changes.
- Modify one line and add a fourth line.
- Display the differences between its updated state and its original state.

```bash
cd ..            # leave planets, which already has its own .git
mkdir bio
cd bio
git init         # initialize the new repository
nano me.txt      # write the biography in the file
git add me.txt
git commit -m "Add biography file"
nano me.txt      # modify one line and add the fourth line
git diff me.txt  # show the differences
```

##### Exploring History

Make some more changes in `mars.txt`:

```bash
nano mars.txt
cat mars.txt
```

See what changed. This compares the file with the most recent commit:

```bash
git diff HEAD mars.txt
```

Use `HEAD~1` to refer to the commit just before the most recent one.
`HEAD` refers to the most recent commit, and every earlier commit can be referred to relative to `HEAD`.

```bash
git diff HEAD~1 mars.txt
```

`git show` displays the commit message together with the changes made in that commit (rather than the differences between a commit and our working directory, which is what `git diff` shows).

```bash
git show HEAD~2 mars.txt
```

`HEAD` is convenient for finding a recent commit because it is relative to the most recent one. If you want to point at a commit with an absolute identifier, you can get its ID from `git log`. Each commit has its own unique ID.

```bash
git diff f22b25e3233b4645dabd0d81e651fe074bd8e73b mars.txt
```

You don't have to use the full 40-character string. As long as the first few characters you type are unique among commit IDs, git will know which commit you mean (similar logic to `tab completion`).

```bash
git diff f22b25e mars.txt
```

`git checkout` checks out (i.e., restores) an old version of a file. In this case, we are telling Git we want to recover the version of the file recorded in `HEAD`:

```bash
git checkout HEAD mars.txt
```

##### Don't Lose Your Head

If you want to restore a version of a file but forget to add the filename after `git checkout <commit ID>`, you will enter a special state called `detached HEAD`. You have basically entered an area for experimenting with the `<commit ID>` version of your repository. In this state you can change files, and even commit those changes. HOWEVER, once you "reattach" the `HEAD`, the changes and commits you made in the `detached HEAD` state won't come with you.

There is a way to retain the commits you made in the `detached HEAD` state: make a new `branch` to hold them, creating an `alternate reality` version of your repository. Once you've done that, you can "reattach" `HEAD` to return to the original repository on the `main` branch. Essentially, you now have 2 versions of your repository on 2 different branches.
The commands below demonstrate this:

```bash
git checkout f22b25e   # enter the 'detached HEAD' state; the repository looks as it did at commit f22b25e
nano mars.txt          # in this state, you can make changes...
git add mars.txt
git commit -m "made changes in mars.txt"   # ...or even commit

### Option 1: return to main WITHOUT retaining those commits
git checkout main

### Option 2: to retain those commits, make a new branch first
git checkout -b <new-branch-name>   # this branch keeps all commits made in the 'detached HEAD' state
git checkout main                   # returns you to the state before you ran 'git checkout f22b25e'
```

###### Exercise: RECOVERING OLDER VERSIONS OF A FILE

Which commands below will recover the last committed version of a Python script called data_cruncher.py?

Answers 2 and 4 are correct:

```bash
git checkout HEAD data_cruncher.py
```

or

```bash
git checkout <unique ID of last commit> data_cruncher.py
```

##### **Important!!!** `Checkout` vs `Restore`

`git checkout` does two different things:

1. Restoring files from a previous commit
2. Navigating to a branch

This is confusing, so the developers have split the two functions into two separate commands (`git restore` for restoring files and `git switch` for changing branches). Now, to restore a file from a commit, run:

```bash
git restore --source <commit ID> <file>
```

The restore role of `git checkout` is gradually being replaced by `git restore`. Let's make it a habit to use `restore` for restoring files!

##### Ignoring things

You don't always want `git` to keep track of every file in the repository (e.g., raw files, large files, etc.). You can tell `git` to ignore these files.

Create a new directory and files:

```bash
mkdir results
touch a.dat b.dat c.dat results/a.out results/b.out
git status
```

`git status` will show the new files. To ignore them, create a `.gitignore` file:
```bash
nano .gitignore
```

Type:

````
*.dat
results/
````

```bash
git status
```

You also need to keep track of your `.gitignore`:

```bash
git add .gitignore
git commit -m "Ignore data files and the results folder"
git status
```

Take a look at your ignored files:

```bash
git status --ignored
```

You can override the ignore settings: use `-f` to force-add a file to the staging area.

```bash
git add -f a.dat
```

###### Exercise: IGNORING NESTED FILES

If you have a directory structure like this:

````
results/data
results/plots
````

you can ignore only one of the subdirectories by listing it in `.gitignore`:

````
results/plots/
````

##### Remotes in GitHub

###### Create a remote repository

Log into GitHub and click to create a new repository. Click the create repository button.

###### Connect local to remote repository

Copy the SSH URL of your remote repository. Go to your local directory:

```bash
git remote add origin git@github.com:vlad/planets.git
```

Check that it worked:

```bash
git remote -v
```

###### SSH Background and Setup

SSH authentication uses a pair of private and public keys stored on your computer. First check whether you already have a key pair:

```bash=
ls -al ~/.ssh
```

Create an SSH key pair if you don't already have one. Choose whatever passphrase you like, write it down, and make sure to remember it.

```bash
ssh-keygen -t ed25519 -C "vlad@tran.sylvan.ia"
```

If you type

```bash
ls -al ~/.ssh
```

the new key pair will show.

Copy the public key to GitHub:

```bash
cat ~/.ssh/id_ed25519.pub
```

Copy the output and go to GitHub. Click your profile > Settings > SSH and GPG keys, add a new SSH key, and paste the public key.

Check that the key is set up on GitHub:

```bash
ssh -T git@github.com
```

###### Push local changes to a remote

Check that you are still in your planets directory. After authenticating your SSH key pair, let's push your changes from the local to the remote repository.
```bash
git push origin main
```

If you go to your GitHub page, you should be able to see all the files that you committed and the version history of your repository, including the local changes.

###### Collaborating

Practice adding a collaborator:

- Go to the settings of your planets repository.
- Go to Collaborators.
- Click Add people.
- Search by GitHub username or email.

You should receive an email from the person inviting you to collaborate.

Create a new directory called collaboration:

```bash=
cd ..   # get out of your own planets directory
mkdir collaboration
cd collaboration
```

Clone your collaborator's repository from GitHub:

```bash=
git clone git@github.com:vlad/planets.git
```

Go to your copy of your collaborator's repository and create a new file:

```bash=
cd planets
nano pluto.txt
```

Commit your changes:

```bash=
git add pluto.txt
git commit -m "Add notes about Pluto"
```

And push to your collaborator's GitHub repository:

```bash=
git push origin main
```

If you go to the repository's GitHub page, you can see your new file and your commit in the commit history. However, your collaborator won't have the new file locally unless they pull from the remote repository.

```bash=
git pull origin main
```

To download the remote updates without merging them, so you can inspect them before `git pull`, use:

```bash=
git fetch origin main
```

You can also `git diff` your local branch against the remote one:

```bash=
git diff main origin/main
```

##### Conflicts

Modify the mars.txt file in your copy of your collaborator's repository:

```bash
nano mars.txt
```

Push the change to GitHub:

```bash=
git add mars.txt
git commit -m "Add a line in our home copy"
git push origin main
```

Suppose the owner also makes parallel changes in their own repository:

```bash=
nano mars.txt
git add mars.txt
git commit -m "Add a line in my copy"
git push origin main
```

This push will be rejected because the remote has changes to mars.txt that conflict with the owner's local copy. You'll need to pull from the remote and resolve the conflict:

```bash
git pull origin main
```

Edit the file to resolve the conflicts, then commit the changes.
The conflict is marked within the file by the lines `<<<<<<< HEAD`, `=======`, and `>>>>>>>`.

```bash
nano mars.txt
```

```bash
git add mars.txt
git commit -m "Merge changes from GitHub"
```

**Remember to always git pull before you git push.**

##### Branches

You can branch off the main version of the repository into your own space, where you can commit and push freely. Your collaborator can do the same on their own branch. Then, when you want to merge those branches into main (aka making the changes permanent), you can review the merge, check whether it has any conflicts, and approve it if it works.

### Data and Project Organization

#### Instructor: Emiliano

Data organization is an ongoing process!

Setup: open a terminal, a text editor of your choice, and the "gapminderDataFiveYear_superDirty data.xlsx" file, and create a new directory to work in.

#### Introduction

You just started a new job or a new rotation, and someone hands you this data file to analyze.

```bash=
# Let's look at the data (the file name contains a space, so it must be quoted)
head "gapminderDataFiveYear_superDirty data.xlsx"
```

We can't tell:

- What is it?
- Where did it come from?
- When was it collected?
- Has anything been changed? If so, why was it changed?

We'll look into how to store data for yourself and others.

#### Project Structure

Here are some characteristics of files you should pay attention to:

- File History
- File Function
- File Format
- File Origin

A basic, intuitive project directory:

- Code Directory (keep scripts here, separate from data)
- README.txt (add info here about the project and its organization: project name, date, contact info, where the data came from, other info about the project)
- Data Directory keeps all your data
  - Raw data dir: keep raw data separate from everything else
  - Modified data dir: keep modified data here, i.e. data that has been processed by scripts or represents a stopping point
- Output Directory for files generated from other files
  - Like figures, stats, paper etc.
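
The layout listed above can be created in one go from the shell. This is only a minimal sketch: the project name `my_project` and the subdirectory names (`code`, `data/raw`, `data/modified`, `output`) are illustrative assumptions, not names prescribed by the lesson.

```bash
# Sketch: create the project skeleton described above.
# "my_project" and all directory names are hypothetical; adapt them to your project.
project="my_project"

# -p creates parent directories as needed and does not fail if they already exist
mkdir -p "$project/code" \
         "$project/data/raw" \
         "$project/data/modified" \
         "$project/output"

# Start a README stub with the fields suggested above (fill them in by hand)
cat > "$project/README.txt" <<EOF
Project name: $project
Date: $(date +%F)
Contact:
Data source:
EOF

# List everything that was created
find "$project" | sort
```

Creating the skeleton with a script rather than by hand makes it easy to reuse the same structure for every new project.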
Key points:

- Organize files so that they are intuitive
- Have README files within folders to describe the project and give context for the analyses
- Make a copy of the raw file, so you don't have to modify the raw version
- Keep a clear record of the modifications that have been made by making sure your scripts are reproducible from the raw data
- Compartmentalize your directory

##### Helpful Naming conventions

- Keep track of steps by numbers (e.g., `01_file.txt`, `02_file.txt`)
- Use dates and versions of files (e.g., `2023-08-24_file.txt`, `2023-08-25_file.txt`)

##### Let's organize the directory that we just created.

```bash=
# Inside the working directory that we just created
# Create a data directory with original and raw subdirectories
mkdir data
mkdir data/original
mkdir data/raw
mv gapminder* data/original   # move the data files to data/original
cd data/original
ls
# gapminderDataFiveYear_superDirty.xlsx
# gapminderDataFiveYear_superDirty.txt
chmod 444 gapminder*   # make the files read-only for everyone
ls -l                  # check file permissions, e.g. -r--r--r--@
cd ..                  # go back up one level (from data/original to data)
nano README.txt
# Inside our README.txt:
# Project Name: UPGG Bootcamp Data Organization
# Date: 2023-08-25
# Email/Contact info: Emiliano Sotelo-jemilianos@gmail.com
# Data downloaded from: https://reproducible-science-curriculum.github.io/organization-RR-Jupyter/setup/#:~:text=gapminderDataFiveYear_superDirty%20data.xlsx
# Goal is to learn how to organize our data projects.
```

You might want to list your personal email as a contact instead of your Duke email: your README might outlast your access to your Duke email, and you will most likely have access to your personal email much longer.

##### Modifying Data

To avoid modifying the original data, we should make a copy and work from there:

```bash=
mkdir cleaned                          # in the same dir as original
cp original/gapminder*.xlsx cleaned/.
cd cleaned
chmod 777 gapminder*.xlsx              # gives read, write, and execute permission to everyone
open gapminder*xlsx                    # this will open the file in Excel
nano README.txt
# Inside README.txt:
# - Data cleaning steps for gapminder:
# - Removed 5th row
# Better to write a script to modify our data than to manually clean/edit our data
```

After we've cleaned the data in the copied file, let's make some more directories to sort our files:

```bash=
mkdir code
mkdir output
mkdir doc
```

Here's an example of a template for an analysis that you can adapt: https://github.com/jemilianosf/template_analysis.

### Sharing Jupyter Notebooks

#### Instructor: Hilmar Lapp

You already have all the materials needed for this last lesson, but it will apply to any other work as well.

##### Sharing Jupyter Notebooks using GitHub

How do we share work with other people?

1. Static - sharing a snapshot of your work, which cannot be changed
2. Dynamic - others can interact with, change, or run the code without having to ask you to change anything

##### Binder

Running code is harder than displaying code. To run code you need:

- Hardware
- Code, including dependencies (like R or Python, and packages)

Binder provides both.

Example binder link: https://mybinder.org/v2/gh/Reproducible-Science-Curriculum/data-exploration-RR-Jupyter/gh-pages?filepath=notebooks%2FData_exploration_run.ipynb

You can execute each chunk of code on a "live" notebook at the link above.

Go to your GitHub repository, copy its HTTPS link, and paste it at https://mybinder.org/ under "GitHub repository name or URL". Paste the path to your notebook under "Path to a notebook file (optional)".

Mybinder creates a container, a very lightweight virtual machine running on host hardware. After it has created your notebook execution environment, you get a URL that you can share with others.

###### Adding dependencies

If your notebook uses only standard Python code, it should run without extra steps.
But if you have a more complex project, you need to tell Binder what your dependencies are (which packages you import). Go to your GitHub repository and create a new file called `requirements.txt`. There you can list dependencies like `pandas`, `numpy`, `matplotlib`, `seaborn`. Binder will recognize your `requirements.txt` and install those packages in your notebook's container.

###### Adding data

You won't be able to host data on Binder. If your data is large, you won't be able to store it on GitHub either. You can use an external data repository and get links to your data that you can refer to inside your notebook.

Hugging Face is a service for storing and documenting large datasets. Here's an example: https://huggingface.co/imageomics/BGNN-trait-segmentation

Note: Python gets updated frequently, which is something to be aware of. Be explicit about the versions of Python and packages you use.

FYI, an initiative to build reproducible containers: https://codeocean.com/
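
To make the advice about explicit versions concrete, here is a minimal sketch of a `requirements.txt` that pins each of the packages mentioned under "Adding dependencies" to an exact version. The version numbers are placeholders chosen for illustration, not versions recommended by the lesson. Replace them with the versions you actually use.

```bash
# Sketch: write a requirements.txt that pins every package to an exact version.
# The version numbers are illustrative assumptions only.
cat > requirements.txt <<'EOF'
pandas==2.0.3
numpy==1.25.2
matplotlib==3.7.2
seaborn==0.12.2
EOF

# Show the result
cat requirements.txt
```

If the packages are already installed in the environment where your notebook works, `pip freeze` prints the installed versions in this same `name==version` format, which can be a convenient starting point.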
