---
tags: teaching
---
# 2022-03-30 <br> DiMeN Reproducible Research Day 2: Python and GitHub
Welcome to today's hackpad! We'll use this editable space to include links for all resources for today and as a place for you to ask questions.
Alex Coleman, <a.coleman1@leeds.ac.uk>
### Visit this url - https://bit.ly/dimen-py2022
## Contents
- [Links for today](#Links-for-today)
- [Agenda](#Agenda)
- [Setup](#Setup)
- [Questions](#Questions)
- [Further reading](#Further-reading)
## Links for today
- [Hackpad](https://hackmd.io/@research-computing-leeds/2022-03-dimen-python)
- [Training material](https://arctraining.github.io/quant-python-03-2022/index.html)
- [Register for GitHub](https://github.com/join)
- [Repl.it](https://replit.com/)
- [Introduction to Git training material](https://arctraining.github.io/swd2_git/)
## Agenda
| Time | Agenda |
| -------- | ------------------------------------------ |
| 0900 | Arrival |
| 0915 | Getting everyone set up, intro to python |
| 0955 | Break ☕💨 |
| 1000 | Crash course python, starting with data |
| 1030 | Break ☕ |
| 1100 | Manipulating data |
| 1140 | Quick comfort break (depending on how in the flow we all are) |
| 1145 | Wrap up |
| 1200 | Lunch 🥪 |
| 1300 | Wrapping up python |
| 1350 | Break ☕ |
| 1410 | GitHub and tying everything together |
| 1450 | Questions and close |
| 1500 | End |
## Setup
### Pre course prep
To get ready for today's session you'll need to do the following steps:
1. [Sign up for a GitHub account](https://github.com/join) using your academic email address. Everything we're doing today requires a GitHub account, even the non-GitHub stuff, so sign up now
2. Navigate to https://github.com/ARCTraining/quant-python-03-2022-replit
3. Click the [`Fork`](https://github.com/ARCTraining/quant-python-03-2022-replit/fork) button

4. In your forked version of the repository click the `run on repl.it` button 
5. Click `Continue with GitHub`

6. Log in with your new GitHub account and give repl.it permission to access your GitHub repositories
7. This will then clone the repository on repl.it, and you're ready to go!

### Configuring git on repl.it
When we set up the version control tool git for the very first time on a device (laptop, computer, virtual machine), we need to do some one-off configuration.
Because repl.it runs small instances on virtual machines in the cloud, we need to do these steps here too.
In the right-hand pane, select the `Shell` tab. This gives you a [bash shell](https://en.wikipedia.org/wiki/Bash_(Unix_shell)); you can tell because the last character before the cursor is a dollar sign (`$`).
Here are the global config settings I would set in my repl.it.
Run these commands yourself, changing the entries in double quotation marks to your name and the email address you used to sign up for GitHub, and replacing `<YOUR_GITHUB_USERNAME>` with your GitHub username. The leading `$` is the shell prompt, so don't type it.
```bash=
$ git config --global user.name "Alex Coleman"
$ git config --global user.email "a.coleman1@leeds.ac.uk"
$ git remote rm origin
$ git remote add origin https://github.com/<YOUR_GITHUB_USERNAME>/quant-python-03-2022-replit
```
## Questions
Click the edit button (pencil symbol) on the top right and use the dark background edit mode to write your own question below.
How long is this repl.it instance thing up for? :)
## Further reading
### Copies and references
**Summary**: _Be careful when making changes to subsets of data. To avoid making changes to the original data, make a copy of it._
In the workshop, we looked at the following example:
```python=
>>> bar = 10
>>> bar
10
>>> barbar = bar
>>> bar = 15
>>> barbar
10
>>> bar
15
```
Here, `barbar = bar` does not make a _copy_: both names become _references_ to the same object (the integer `10`). A reference is just a label that points to the data. Reassigning `bar = 15` points the label `bar` at a new object and leaves `barbar` pointing at `10`.
We can check whether two variables refer to the same object using the `is` keyword. If they do, it returns `True`.
```python=
>>> bar = 10
>>> barbar = bar
>>> bar is barbar # check if references
True
>>> bar = 15
>>> bar is barbar
False
```
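Note that this is different from the `==` operator, which tests whether the variables have equal values, regardless of whether they are the same object.
_The result of `is` can depend on the type of the data_, which may have caused confusion. CPython (the standard Python interpreter) reuses a single object for each small integer, so two separately assigned small integers are the same object, whereas two separately created lists never are. For example: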
```python=
>>> # comparing integers
>>> x = 2
>>> y = 2
>>> x == y # two integer variables have *equal values*
True
>>> x is y # two integer variables are the *same object*
True
>>> # comparing lists
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a == b # two list variables have equal values
True
>>> a is b # two lists are *not* the same object
False
```
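The distinction matters most for mutable objects such as lists, where a change made through one name is visible through every other name that refers to the same object. A minimal sketch of this (the variable names `original`, `alias` and `independent` are made up for illustration):

```python
# Two names bound to the same list object
original = [1, 2, 3]
alias = original                # no copy is made; both names refer to one list
independent = original.copy()   # a new list with the same contents

alias.append(4)                 # mutate the list through the second name

print(original)                 # [1, 2, 3, 4] -> changed via alias
print(alias is original)        # True
print(independent)              # [1, 2, 3] -> the copy is unaffected
print(independent is original)  # False
```

If you want to change data without touching the original, make a copy first and work on that.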
Similarly, in Pandas:
```python=
>>> # we have our initial dataframe
>>> surveys_df.head()
record_id month day year plot_id species_id sex hindfoot_length weight
0 1 7 16 1977 2 NL M 32.0 NaN
1 2 7 16 1977 3 NL M 33.0 NaN
2 3 7 16 1977 2 DM F 37.0 NaN
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
>>> # we create a new dataframe using our existing dataframe
>>> ref_surveys_df = surveys_df
>>> # we change the data in our ref_surveys_df
>>> ref_surveys_df[0:3] = 0
>>> # we have a quick look at this dataframe
>>> ref_surveys_df.head()
record_id month day year plot_id species_id sex hindfoot_length weight
0 0 0 0 0 0 0 0 0.0 0.0
1 0 0 0 0 0 0 0 0.0 0.0
2 0 0 0 0 0 0 0 0.0 0.0
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
>>> # we look back at our original dataframe and find it's changed too! arghhh
>>> surveys_df.head()
record_id month day year plot_id species_id sex hindfoot_length weight
0 0 0 0 0 0 0 0 0.0 0.0
1 0 0 0 0 0 0 0 0.0 0.0
2 0 0 0 0 0 0 0 0.0 0.0
3 4 7 16 1977 7 DM M 36.0 NaN
4 5 7 16 1977 3 DM M 35.0 NaN
>>> # we can check that they were actually references of each other
>>> ref_surveys_df is surveys_df
True
>>> # to create a true copy with pandas we have to use the .copy() method
>>> copy_surveys_df = surveys_df.copy()
>>> copy_surveys_df is surveys_df
False
>>> # now we can change copy_surveys_df without affecting our original data (surveys_df)
```
For more information, see when [Pandas returns a view or a copy](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-view-versus-copy), the [Python docs](https://docs.python.org/3/library/copy.html), and [Real Python](https://realpython.com/copying-python-objects/).
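The Python docs linked above describe the standard library `copy` module, which distinguishes between shallow and deep copies. A small sketch of the difference, using a nested list invented for this example rather than the workshop data:

```python
import copy

# A nested list: the outer list holds references to the inner lists
nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)     # new outer list, but the same inner lists
deep = copy.deepcopy(nested)    # new outer list and new inner lists

nested[0].append(99)            # mutate one of the inner lists

print(shallow[0])   # [1, 2, 99] -> the shallow copy shares the inner list
print(deep[0])      # [1, 2]     -> the deep copy is fully independent
```

A shallow copy is usually enough for a flat list of numbers or strings. Nested data is where a deep copy starts to matter.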
### Virtual environments and conda
There is a nice introduction to the concepts around virtual environments in this [Carpentries Incubator workshop](https://carpentries-incubator.github.io/python-intermediate-development/12-virtual-environments/index.html). It covers what they are, how you use them, and how you share them.
Conda is another (very popular) tool for doing the same thing as virtual environments, and again there is a nice guide from the [Carpentries](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/).
I would recommend reading both, as it's worth knowing about both, but if you just want to crack on I'd start with the conda one, as conda is probably the most commonly used in your domain.
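Once you start using environments, it can be handy to check which one your code is actually running in. The sketch below is one rough way to do that from Python. The helper names `in_venv` and `conda_env_name` are made up for this example, and it assumes that conda sets the `CONDA_DEFAULT_ENV` environment variable when an environment is activated, so treat the conda part as a hint rather than a guarantee.

```python
import os
import sys


def in_venv() -> bool:
    """Return True when running inside an environment created with `venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def conda_env_name():
    """Return the active conda environment name, or None if not set.

    Assumption: conda exports CONDA_DEFAULT_ENV on activation.
    """
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Interpreter:", sys.executable)
print("Inside a venv:", in_venv())
print("Conda environment:", conda_env_name())
```

The path printed for the interpreter is often the quickest clue: it normally sits inside the environment's own folder.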
### Objects in python
For the very brave, you can read more about the concept of objects in Python and its associated programming paradigm, object-oriented programming (OOP), in this [Real Python tutorial](https://realpython.com/python3-object-oriented-programming/). This answer on [Stack Overflow](https://stackoverflow.com/questions/56310092/what-is-an-object-in-python) is also good.
Crucially, this isn't something you should worry about too much as a beginner, but it is worth being aware of as you expand your python experience.