# Introduction to GitHub

- Intro: Richard
  - what is RSH
  - what is this workshop
  - how to do HackMD
- In this workshop, we present the basics of GitHub, for someone who knows a bit about git
- Git is a version control system.

## Progression of use cases and topics

GitHub is used for:
- things for yourself (central location, backup, sharing, organizing, easy moving to later levels)
- things for small groups (organizations, shared repos with or without PRs, etc)
- things for community (outside collaborators, discussion, PRs, etc)

Where are you? Where do you want to be?

## Quick survey

I have used git before:
- yes:
- no:

In the past I have already opened an issue on GitHub:
- yes:
- no:

In the past I have already created a pull request on GitHub (or merge request on GitLab):
- yes:
- no:

---

## What is git?

- **Version control**
  - Track changes
  - Inspect history
  - Work together
- Basically a requirement for any kind of serious programming work
  - But also useful for others.
- *This talk is not about git: we assume you use it already*

Demo:
- See a repository on command line
- Make a commit
- Show commit in history

---

## What is GitHub?

* Commercial company
* ... hosting a **web repository, GitHub**
* Very good free services for open-source projects.
* vs GitLab
  * GitLab is an open-source equivalent
  * Many universities have their own (e.g. version.aalto.fi, source.coderefinery.org)
  * For the most part, usage is the same.
* Interfaces: take your pick
  * Git command line
  * GitHub Desktop and apps
  * GitHub web interface
* Cost
* GitHub ownership

Demo:
- Aalto University GitLab
- GitHub
  - Live example: https://github.com/ResearchSoftwareHour/researchsoftwarehour.github.io/
  - Major project: https://github.com/numpy/numpy

## Basic features: pushing and pulling

Let's demo the basics of GitHub.

- You connect to other servers via **git remotes**, controlled with `git remote`.
- Once a remote is set, you use `git push`, `git fetch`, and `git pull`
- It isn't our goal to go into detail about the technical implementation: you can read this later. (we'll talk about why instead)

Quick demo:
- New repository
- Create repository on GitHub
- Add remote
- Push to GitHub

What are basic properties of repositories:
- User or organization namespaces
- Public or private
- Single source of truth
  - You always know where to look for things

---

## Git for small groups

- The next level is using GitHub among a research group, or something similarly small.
- To be a group, you need to work together. To work together, you need a place to do so.
- **Organizations**
- **User management**: share among a group
- **Issues**: Track things to do
- **Pull requests**: Code review

---

## User management

- Add multiple users to an organization
  - Owners and members (and more)
- Bus/lottery factor: more than one person should be able to add contributors and set permissions
- Somebody who wants to contribute changes does not have to be added as a collaborator: they can fork and send a PR without having push permissions
- Add a license to encourage external collaborators to contribute
- Consider using https://allcontributors.org/
- An organization with many users' projects, all of which push directly to the default branch, is still a very useful setup

Questions
- ...

---

## Issues

- Issues properties
  - Title, description
  - Comments
  - Labels, notifications
- Boards

---

## Pull requests

- Pull requests (PRs) are requests to merge one branch into another branch
- It can be useful to think of them as change proposals
- Review is an important part.
There are two options:
- Pull requests can merge within one repository (source and destination branches are in the same repository)
- Pull requests can merge across repositories (source branch is in a fork, destination branch is typically the upstream default branch)

Demo (RB)/exercise: https://github.com/bast/pr-exercise

Good practices:
- For each new PR create a new branch (do not put unrelated changes on one branch, do not put unrelated changes into one PR)
- If the PR solves a known and documented issue, [autoclose the issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)
- Before working on massive changes, propose them first in an issue and discuss and agree before coding
- Write-protect the target branch (typically `main` or `master`) against pushes; only change the "main" branch through PRs
- All changes are reviewed by somebody else; nobody is "more equal" (motivation: learning and knowledge transfer)
- Draft PRs can be a good way to get feedback on half-finished work before investing a lot of time into creating a finished PR

Questions
- ...

## Community projects: the next level

- So you have your own group working well. What comes next? The community
- There is little cost to accepting contributions from others.
- Information overload

## Pull requests for the community

- The amazing cleverness of Git and pull requests is that they take work off the maintainer (and push it to the community), so that people can contribute.

## Licensing

- Importance of a license file
- Don't accept contributions until you get this clarified

## Long term: archiving and Zenodo

- GitHub is the de-facto source for open research software (for better or for worse)
- Can you cite it in a paper?
- Can you expect GitHub to be around in 20 years?
  - More likely than your university's GitLab or personal webpage. (But does your user still exist?)
- For permanent storage, connect to Zenodo and publish releases.
- Zenodo is an EU-funded repository of science. https://www.youtube.com/watch?v=Atp-GmhS7gY

## Actions

- Automatic testing on changes: great for making sure things work!
- Automatic deployment.

Demo: ???

## What's next?

- CodeRefinery courses
- RSH

# END

## Folder structure

```console
README.md                 # often the first thing people see
LICENSE                   # it's not open source if there is no license
.github/ISSUE_TEMPLATE    # issue template
.github/workflows         # workflows/actions are defined here
CONTRIBUTING              # contributing guidelines
some-folder/
another-folder/
src/
test/
.gitignore                # lists paths/patterns to ignore
CITATION.cff              # instructions on how to cite you
```

- A README file should include ([more examples](https://coderefinery.github.io/documentation/writing-readme-files/)):
  - Markdown or RST format
  - A descriptive project title
  - Motivation (why the project exists)
  - How to set up/install
  - Copy-pastable quick start code example
  - Recommended citation
- Folder structure will evolve and it is OK to change it over time.
- What files not to add:
  - **generated files** (use `.gitignore` to protect against adding them accidentally)
  - **huge files** (repository history should not exceed 100 MB)
    - But there is LFS and git-annex for tracking big files using Git (which are stored outside of the Git repo)
  - **sensitive files and secrets** (there are ways to store them on Git/GitHub)

Questions
- ...
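
As a minimal sketch of the `.gitignore` advice above, the following shell session builds a throwaway repository and asks Git which files the ignore rules cover. The file names (`main.c`, `main.o`, `build/output.bin`) and the two ignore patterns are hypothetical examples; a real project would pick patterns that match its own generated files.

```shell
#!/bin/sh
# Sketch: check which files a .gitignore keeps out of a repository.
# All file names and patterns below are hypothetical examples.
set -eu

# Work in a throwaway repository so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q .

# Ignore rules for generated files: object files and a build directory.
printf '%s\n' '*.o' 'build/' > .gitignore

# Create one source file and two generated files.
mkdir build
touch main.c main.o build/output.bin

# `git check-ignore` prints those of the given paths that match an ignore rule.
ignored=$(git check-ignore main.c main.o build/output.bin)
echo "ignored by .gitignore:"
echo "$ignored"

# `git status --porcelain` lists the files Git would still offer to add.
untracked=$(git status --porcelain --untracked-files=all)
echo "still visible to git:"
echo "$untracked"
```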
---

## Branches

- Independent development line
- Branches allow us to work on several ideas at the same time without waiting for each other
- You want something to be evaluated/improved before making it "live"
- You need to make a modification for one project but you don't want it to affect others
- Often we call the main development line `main` or `master`, sometimes `develop`
- https://coderefinery.github.io/git-intro/06-branches/
- https://coderefinery.github.io/github-without-command-line/basics/
- For each new idea/topic/change create a new branch
- Branches are "cheap" in terms of storage (they are no more than labels/pointers to commits) so do not hesitate to create new branches
- But also single-branch repositories are not necessarily bad practice, especially if the code is maintained by a single person

Naming of branches:
- `radovan-someidea`: everybody knows who this belongs to, I can find my branches by grepping for my name
- `1234-someproblem`: here 1234 can refer to issue number 1234

Main development line:
- From Git's point of view, all branches are technically equivalent. There is nothing more special about `master` or `main`.
- But each project agrees to consider one branch as the "production"/"main" branch
- It can be useful to protect this branch against accidental deletion or accidental changes (only pull request changes are allowed)

Questions
- ...

---

## Making a community

- Discussions can be on GitHub *Issues*
- But GitHub also has *Discussions*
- It can be useful to reserve *Issues* for problems and proposals and *Discussions* for questions about understanding which do not immediately require code changes
- Many projects use a mailing list or Teams/Slack/chat/... in addition

Questions
- ...
---

## Templates

- Generating a repository from a template is not the same thing as forking a repository
- Templates are like a cookie cutter
- Generating from a template "flattens" the Git history
- Templates are good if you want to be able to create a new repository from a *template* with a pre-defined folder/file structure to start with
- Templates are not useful if you plan to update the new repository with upstream changes or if you want to be able to eventually contribute changes back/upstream
- Comparison with forks: https://coderefinery.github.io/git-collaborative/01-remotes/
- How to create a template repository: demo

Questions
- ...

---

## There is more

- GitHub Projects: overview board to sort issues into columns across repositories (kanban board)
- GitHub Actions/workflows: scripts to automate actions (build, spell checking, documentation generation, testing)

## See also

- Other resources:
  - [Introduction to version control](https://coderefinery.github.io/git-intro/)
  - [Collaborating and sharing using GitHub without command line](https://coderefinery.github.io/github-without-command-line/)
  - [Collaborative distributed version control](https://coderefinery.github.io/git-collaborative/)

Questions
- ...