# Introduction to GitHub
- Intro: richard
- what is RSH
- what is this workshop
- how to use HackMD
- In this workshop, we present the basics of GitHub for someone who already knows a bit about Git
- Git is a version control system.
## Progression of use cases and topics
GitHub is used for:
- things for yourself (central location, backup, sharing, organizing, easy moving to later levels)
- things for small groups (organizations, shared repos with or without PRs, etc)
- things for community (outside collaborators, discussion, PRs, etc)
Where are you? Where do you want to be?
## Quick survey
I have used git before:
- yes:
- no:
In the past I have already opened an issue on GitHub:
- yes:
- no:
In the past I have already created a pull request on GitHub (or merge request on GitLab):
- yes:
- no:
---
## What is git?
- **Version control**
- Track changes
- Inspect history
- Work together
- Basically a requirement for any kind of serious programming work
- But also useful for others.
- *This talk is not about git: we assume you use it already*
Demo:
- See a repository on command line
- Make a commit
- show commit in history
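
The demo steps above could look roughly like this on the command line. This is only a minimal sketch: the temporary directory, the file name `README.md`, the commit message, and the user name and email are placeholders chosen for illustration.

```shell
# Create a scratch repository in a temporary directory.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity, stored only in this repository's configuration.
git config user.name "Example User"
git config user.email "user@example.org"

# Make a change and record it as a commit.
echo "# My analysis" > README.md
git add README.md
git commit -q -m "Add README"

# Inspect the working tree and the history.
git status --short
git log --oneline
```
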
---
## What is GitHub?
* Commercial company
* ... hosting a **web-based repository service, GitHub**
* Very good free services for open-source projects.
* vs GitLab
    * GitLab is an open-source equivalent
    * Many universities run their own instance (e.g. version.aalto.fi, source.coderefinery.org)
    * For the most part, usage is the same.
* Interfaces: take your pick
    * Git command line
    * GitHub Desktop and apps
    * GitHub web interface
* Cost
* GitHub ownership
Demo:
- Aalto University GitLab
- GitHub
- Live example: https://github.com/ResearchSoftwareHour/researchsoftwarehour.github.io/
- Major project: https://github.com/numpy/numpy
## Basic features: pushing and pulling
Let's demo the basics of GitHub.
- You connect to other servers via **git remotes**, controlled with `git remote`.
- Once a remote is set, you `git push`, `git fetch`, and `git pull`
- It isn't our goal to go into detail about the technical implementation; you can read about that later. We'll talk about the why instead.
Quick demo:
- New repository
- Create repository on GitHub
- Add remote
- Push to GitHub
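
The quick demo can be rehearsed offline with something like the sketch below. It assumes a local bare repository as a stand-in for the repository created on GitHub, so it runs without an account or network access. With GitHub, the remote URL would instead have the form `https://github.com/USER/REPO.git`, where `USER` and `REPO` are placeholders. The file name, commit message, and identity are likewise made up for illustration.

```shell
work="$(mktemp -d)"

# A local bare repository stands in for the repository created on GitHub.
git init -q --bare "$work/remote.git"

# New local repository with one commit on a branch called main.
git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git branch -M main

# Add the remote under the conventional name "origin" and list remotes.
git remote add origin "$work/remote.git"
git remote -v

# Publish the branch; -u records origin/main as its upstream.
git push -q -u origin main

# Later: download new commits (fetch), or download and merge them (pull).
git fetch -q origin
git pull -q
```
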
What are basic properties of repositories:
- User or organization namespaces
- Public or private
- Single source of truth
- You always know where to look for things
---
## GitHub for small groups
- The next level is using GitHub within a research group, or something similarly small.
- To be a group, you need to work together. To work together, you need a place to do so.
- **Organizations**
- **User management**: share among a group
- **Issues**: Track things to do
- **Pull requests**: Code review
---
## User management
- Add multiple users to an organization
- Owners and members (and more)
- Bus/lottery factor: more than one person should have the possibility to add contributors and set permissions
- Somebody who wants to contribute changes does not have to be added as a collaborator: they can fork and send a PR without having push permissions
- Add a license to encourage external collaborators to contribute
- Consider using https://allcontributors.org/
- An organization with a bunch of user projects that are all push-to-default-branch is still a very useful setup
Questions
- ...
---
## Issues
- Issue properties
- Title, description
- comments
- Labels, notifications
-
- Boards
---
## Pull requests
- Pull requests (PRs) are requests to merge changes from one branch into another branch
- It can be useful to think of them as change proposals
- Review is an important part.
There are two options:
- Pull requests can merge within one repository (source and destination branches are in the same repository)
- Pull requests can merge across repositories (source branch is in a fork, destination branch is typically upstream default branch)
Demo (RB)/exercise: https://github.com/bast/pr-exercise
Good practices:
- For each new PR create a new branch (do not put unrelated changes on one branch, do not put unrelated changes into one PR)
- If the PR solves a known and documented issue, [autoclose the issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)
- Before working on massive changes, propose them first in an issue and discuss and agree before coding
- Write-protect the target branch (typically `main` or `master`) against pushes; only change the "main" branch through PRs
- All changes are reviewed by somebody else; nobody is "more equal" (motivation: learning and knowledge transfer)
- Draft PRs can be a good way to get feedback on half-finished work before investing a lot of time into creating a finished PR
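
The local half of the "one branch per PR" practice might look like the following sketch. The issue number `1234`, the branch name, the file, and the identity are hypothetical. The `Closes #1234` line uses one of the keywords GitHub recognizes for automatically closing an issue once the change reaches the default branch.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "v1" > analysis.txt
git add analysis.txt
git commit -q -m "Initial commit"
git branch -M main

# One branch per change proposal; 1234 is a hypothetical issue number.
git checkout -q -b 1234-fix-typo
echo "v2" > analysis.txt
git commit -q -a -m "Fix typo in analysis" -m "Closes #1234"

# Commits on the branch that are not on main: this is what the PR would propose.
git log --oneline main..1234-fix-typo
```

With a GitHub remote configured, the next steps would be `git push -u origin 1234-fix-typo` and opening the pull request in the web interface.
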
Questions
- ...
## Community projects: the next level
- So you have your own group working well. What comes next? The community
- There is little cost to accepting contributions from others.
- Information overload
-
## Pull requests for the community
- The clever part of Git and pull requests is that they take work off the maintainer (and shift it to the community), which makes it practical for people to contribute.
## Licensing
- Importance of a license file
- Don't accept contributions until you get this clarified
## Long term: archiving and Zenodo
- GitHub is the de facto source for open research software (for better or for worse)
- Can you cite it in a paper?
- Can you expect GitHub to be around in 20 years?
- More likely than your university's GitLab or personal webpage. (But will your user account still exist?)
- For permanent storage, connect to Zenodo and publish releases.
- Zenodo is an EU-funded repository of science.
https://www.youtube.com/watch?v=Atp-GmhS7gY
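
A GitHub release is attached to a Git tag, so publishing a release usually starts with tagging a commit. A minimal sketch, assuming the hypothetical version number `v1.0.0` and a scratch repository with placeholder content:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "results" > README.md
git add README.md
git commit -q -m "Prepare first release"

# An annotated tag records who tagged the commit, when, and why.
git tag -a v1.0.0 -m "First public release"

# List tags and show what the tag points to (without the diff).
git tag --list
git show -s v1.0.0
```

With a GitHub remote configured, `git push origin v1.0.0` would publish the tag, and the release can then be created from it in the web interface.
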
## Actions
- Automatic testing on changes: great for making sure things work!
- Automatic deployment.
Demo: ???
## What's next?
- CodeRefinery courses
- RSH
# END
## Folder structure
```console
README.md # often the first thing people see
LICENSE # it's not open source if there is no license
.github/ISSUE_TEMPLATE # issue template
.github/workflows # workflows/actions are defined here
CONTRIBUTING # contributing guidelines
some-folder/
another-folder/
src/
test/
.gitignore # lists paths/patterns to ignore
CITATION.cff # instructions on how to cite you
```
- A README file should include ([more examples](https://coderefinery.github.io/documentation/writing-readme-files/)):
    - Markdown or RST format
    - A descriptive project title
    - Motivation (why the project exists)
    - How to set up/install
    - Copy-pastable quick start code example
    - Recommended citation
- Folder structure will evolve and it is OK to change it over time.
- What files not to add:
    - **generated files** (use `.gitignore` to protect against adding them accidentally)
    - **huge files** (repository history should not exceed 100 MB)
        - But there are Git LFS and git-annex for tracking big files with Git (the files themselves are stored outside of the Git repo)
    - **sensitive files and secrets** (there are ways to store them on Git/GitHub)
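
How `.gitignore` protects against adding generated files by accident can be tried out with a sketch like this one. The patterns `build/` and `*.pyc` and the file names are assumptions for illustration; a real project would list its own generated paths.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical patterns: a build directory and compiled Python files.
printf 'build/\n*.pyc\n' > .gitignore

# One generated file and one source file.
mkdir build
echo "generated" > build/output.txt
echo "print('hi')" > script.py

# check-ignore -v reports which pattern causes a path to be ignored.
git check-ignore -v build/output.txt

# "git add ." stages everything except the ignored paths.
git add .
git status --short
```
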
Questions
- ...
---
## Branches
- Independent development line
- Branches allow us to work on several ideas at the same time without waiting for each other
- You want something to be evaluated/improved before making it "live"
- You need to make a modification for one project but you don't want it to affect others
- Often we call the main development line `main` or `master`, sometimes `develop`
- https://coderefinery.github.io/git-intro/06-branches/
- https://coderefinery.github.io/github-without-command-line/basics/
- For each new idea/topic/change create a new branch
- Branches are "cheap" in terms of storage (they are no more than labels/pointers to commits) so do not hesitate to create new branches
- But single-branch repositories are not necessarily bad practice either, especially if the code is maintained by a single person
Naming of branches:
- `radovan-someidea`: everybody knows who this belongs to, I can find my branches by grepping for my name
- `1234-someproblem`: here 1234 can refer to issue number 1234
Main development line:
- From Git's point of view, all branches are technically equivalent. There is nothing special about `master` or `main`.
- But each project agrees to consider one branch as the "production"/"main" branch
- It can be useful to protect this branch against accidental deletion or accidental changes (only pull request changes are allowed)
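
That branches are just pointers to commits can be seen in a small experiment. This is a sketch with made-up file contents and a placeholder identity; the branch name follows the `radovan-someidea` naming example above.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.org"
echo "first draft" > paper.md
git add paper.md
git commit -q -m "First draft"
git branch -M main

# Creating a branch only adds a new pointer to the current commit.
git branch radovan-someidea
git rev-parse main radovan-someidea   # both names resolve to the same commit

# Work on the idea without touching main.
git checkout -q radovan-someidea
echo "an experiment" >> paper.md
git commit -q -a -m "Try an idea"

# main still points to the first commit; the new branch is one commit ahead.
git log --oneline --graph --all
```
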
Questions
- ...
---
## Making a community
- Discussions can be on GitHub *Issues*
- But GitHub also has *Discussions*
- It can be useful to reserve *Issues* for problems and proposals and *Discussions* for questions about understanding which do not immediately require code changes
- Many projects use a mailing list or Teams/Slack/chat/... in addition
Questions
- ...
---
## Templates
- Generating a repository from a template is not the same thing as forking a repository
- Templates are like a cookie cutter
- Generating from a template "flattens" the Git history
- Templates are good if you want to be able to create a new repository from a *template* with a pre-defined folder/file structure to start with
- Templates are not useful if you plan to update the new repository with upstream changes or if you want to be able to eventually contribute changes back/upstream
- Comparison with forks: https://coderefinery.github.io/git-collaborative/01-remotes/
- How to create a template repository: demo
Questions
- ...
---
## There is more
- GitHub Projects: overview board to sort issues into columns across repositories (kanban board)
- GitHub Actions/workflows: scripts to automate actions (build, spell checking, documentation generation, testing)
## See also
- Other resources:
- [Introduction to version control](https://coderefinery.github.io/git-intro/)
- [Collaborating and sharing using GitHub without command line](https://coderefinery.github.io/github-without-command-line/)
- [Collaborative distributed version control](https://coderefinery.github.io/git-collaborative/)
Questions
- ...