**Workshop Details**
Dates: September 6th - 13th, 2022
Time: 9am - 12pm
**Workshop Agenda:**
https://ucsdlib.github.io/2022-09-06-carpentries-uc/
**Workshop Lesson:**
http://swcarpentry.github.io/python-novice-gapminder/
## Day 1 - 3: Introduction to Python
**Software Installation:**
Anaconda
https://www.anaconda.com/download/
* download latest version - 64-bit installer for Windows 10
* This application is used to install and run Jupyter Notebooks
* Google Colab: https://colab.research.google.com (for use if there are problems during the workshop)
**Lesson Data (download)**
* <a href="https://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip">gapminder data</a>
* <a href="https://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip">inflammation data</a>
## NOTES:
A copy of the instructor live session notes will be made available to participants upon request at the end of the workshop.
Jupyterlab will be used for the lessons
[m] Markdown cell = notes
[#] also works in a code cell for notes
[b] = add cell below, [a] = add cell above
[r] Raw cell = plain text that is neither run as code nor rendered as markdown
(for Python lessons)
https://www.markdownguide.org/getting-started/
https://www.markdownguide.org/basic-syntax/
## Workshop Day 1
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
| (example) Jane Doe | UCSD | IT | jdoe1@ucsd.edu |
| Kat Koziar (Helper) | UCR | Library | katherine.koziar@ucr.edu |
| Jacob sola | UCR | Chemistry/Biomedical| jsola032@ucr.edu |
| Douglas Zhang|UCSD | Chemistry/Biochemistry |doz023@ucsd.edu |
|Jacqueline Giacoman |UC Merced|Political Science| jgiacoman@ucmerced.edu|
|Jose Hernandez |UCB |Library |jose1991@berkeley.edu |
|John Thompson | UC Merced | Molecular & Cellular Biology | jthompson44@ucmerced.edu |
|Derek Devnich|UC Merced | | |
|Sam Erickson |UC Merced |Physics |serickson3@ucmerced.edu |
| Dilawer Ali | UC Merced | Mechanical Engineering | dali4@ucmerced.edu |
| Igor Aprelev |UCSD | Mathematics and Economics | iaprelev@ucsd.edu |
|Benjamin Nauman |UCLA |Geography |bnauman@ucla.edu |
|Mohit Saraswat| UC Merced | Chemistry | msaraswat@ucmerced.edu |
|Jacob Ross |UCSD |Anesthesiology |jaross@ucsd.edu |
|Jay Colond |UCM |Sociology |jcolond@ucmerced.edu |
|Zhaoning (Johnny) Wang |UCSD|CMM |zhw063@health.ucsd.edu|
| Lillie Pennington | UC Merced | Life and Environmental Sciences | lpennington@ucmerced.edu |
|Christian Henry|UC Berkeley|Integrative Biology|chrishenry@berkeley.edu|
|Belina Chong|UCLA |Ecology and Evolutionary Biology|moonmoon394@ucla.edu |
|Josiah Piceno|UCM|MBSE|jpiceno3@ucmerced.edu|
|Jun Tan |UCSD |Economics |j4tn@ucsd.edu |
|Jon Dean |UCSD | Anesthesiology |j1dean@health.ucsd.edu|
|Tahirah Williams |UCM |QSB |twilliams76@ucmerced.edu |
|Liam de Villa Bourke| UCLA | Institute of the Environment and Sustainability | liamdevilla@g.ucla.edu |
|Rukmini Ravi |UCSD |San Diego Supercomputer Center |ruravi@ucsd.edu |
|Amber Heidbrink |UCSD |Cell and Developmental Biology | aheidbrink@ucsd.edu |
|Haley Potts | UCSD |Math & Economics |hpotts@ucsd.edu |
| isabella schaedle | UCSD | MMMMMMMM | |
| | | | |
|Apisit Kaewsanit | UCSF | Epidemiology and Biostatistics | apisit.kaewsanit@ucsf.edu |
| Ivan Felix Rios | UCSD | Mathematics & Economics | ifelixrios@ucsd.edu |
| Christian Corrales | UCLA | Neurology | ccorrales@mednet.ucla.edu |
| Michael Woller | UCLA | Psychology | michaelwoller@g.ucla.edu |
|Stella Yuan |UCLA |Ecology and Evolutionary Biology |scy8@g.ucla.edu |
|Jonathan Le | UCR | Mathematics | jle173@ucr.edu |
| Laika Aguinaldo | UCSD | Psychiatry | laaguinaldo@ucsd.edu |
| Chris Gray | UCR | Data Science | cgray024@ucr.edu |
|Ana Carolina Dantas Machado | UCSD | Medicine | adantasmachado@ucsd.edu |
|Jason Ngo |UC Merced |Bioengineering | jngo42@ucmerced.edu |
|Yibing Zhang |UC Merced |Bioengineering |yzhang291@ucmerced.edu |
|Ashwin Thomas | UC Merced | Environmental Systems | athomas59@ucmerced.edu|
|Eric Hyde| UCSD |Epidemiology|ehyde@health.ucsd.edu |
| Bineh Ndefru |UCLA|Materials Science| bndefru@ucla.edu|
| | | | |
| Bruce Hamilton | UCSD |School of Medicine | bah@ucsd.edu |
|Kazuma Nagatsuka| UCSD| Robotics(Mechanical Engineering) | kngatsuka@ucsd.edu |
| Caitlin Tribelhorn | UCSD | Pediatrics | ctribelh@ucsd.edu |
| Vikram Jambulapati | UCSD | Economics | vjambula@ucsd.edu |
| Simran Kanal |UCSF |Biostatistics and Epidemiology | simran.kanal@ucsf.edu |
| Daryl Han | UC Irvine | Student Center and Event Services | ddhan@uci.edu |
| Charles Faulhaber | UC Berkeley | Bancroft Library / Dept. of Spanish | |
|Mario Cuaya | UCR |Computer Science|mcuay001@ucr.edu |
|Waleed Rajabally | UC Merced | Sociology |wrajabally@ucmerced.edu |
| Junxiao Gao |UCSF |Biostatistics and Epidemiology |Junxiao.Gao@ucsf.edu |
| Jay Chi | UCSB | ETS | jaychi@ucsb.edu |
| | | | |
|Vishakha Malhotra |UCSF |Biostatistics and Epidemiology | vishakha.malhotra@ucsf.edu |
## Day 1 Questions:
Please enter any questions not answered during live session here:
1.
## Day 1 Live Class Notes:
Download link: https://www.anaconda.com/products/distribution
Working in Anaconda JupyterLab
GUI (middle-man, colloquially pronounced as "gooey") vs command-line
**Today's workshop is strictly in JupyterLab GUI**
**Computer programming languages** - there are a lot of them. What they do is similar, and the syntax is often similar between languages (although each has its own specifics), so you can learn the basics in one and apply them to other languages.
Your favorite search engine is a good resource when you're looking for answers to your programming questions (kat's note: I <3 Stack Exchange)
**working directory** - in JupyterLab, the working directory is shown in the left sidebar. The left sidebar also shows tabs, such as the file browser (where you can select your working directory and create new files/folders), a list of running terminals, etc. The left sidebar can be collapsed or expanded. Anaconda's JupyterLab runs locally on your computer, so when you're using a public computer, any files are saved on that public computer.
**new file** - Day1_Python_LiveNotes.ipynb (to rename, right click on file to bring up submenu)
**Interface** - menu bar at top contains more options than the tabs in the left sidebar quicklinks
**Command and Edit modes** - in command mode, pressing <kbd>b</kbd> creates a new cell below the current cell
- code cell will allow you to enter code
- a markdown cell doesn't run code; it holds only notes (formatted in markdown). In command mode, press <kbd>m</kbd> to change a cell into a markdown cell and <kbd>y</kbd> to change it back into a code cell.
- print('Hello') will show 'Hello' right below the cell if it's executed in a code cell
- a-key creates a cell above
- ctrl-enter will run the cell, either execute the command in a code cell or render the markdown in a markdown cell
- menu -> Kernel -> restart and clear all output will clear all output and saved variables, but keep the text *in* the cells.
- Markdown cells are stylized text
- # Hello There
- *Bye*
- code cells hold plain text that can be executed as code; the `#` character (octothorpe, pound sign, number sign, hash) is used for comments in code
- comments are used to explain why/what your code is doing - comments are a love note to your future self
- to create a list in markdown, bullets are created using a - or * followed by a space. Different levels are created using indentation.
    - example of level 2
        - level 3
**Numbered lists**
1. level1
2. level 2, also requires tabs
A tool like HackMD lets you practice markdown.
**Bold and italics**
- bold is surrounded by two **asterisks**
- italics is surrounded by a *single asterisk* or _underscores_
In JupyterLab markdown cells, you can combine some html elements, such as \<br>
backslash \\ before the less-than-symbol will escape the character so it isn't read as html \\\<br>
**Mixed list**
1. level 1
* Level 2
1. level 1
* Level 2
1. level 1
* Level 2
**Headings** use # to create different sizes
# largest
## one smaller
### smaller
#### etc
##### even smaller
[Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet)
# Save and save often
- Always shut down your kernel (menu -> Kernel -> Shut Down Kernel) when you're finished
- this makes sure your file/project doesn't keep using resources when you don't intend it to, which is especially useful when you're working in an HPCC (high-performance computing cluster) environment.
[Challenge #1](http://swcarpentry.github.io/python-novice-gapminder/01-run-quit/index.html#creating-lists-in-markdown)
### Lesson 2: Variables and Assignments
`age = 42`
`first_name = 'Ahmed'`
- variable_name = value
- computer only recognizes the values assigned to the variable after the code cell is executed
- variable name rules
- can only contain letters, digits, underscores (a dash is a minus sign in code!)
- use underscore or camelCase to help human readability
- `thisisaverylongnamethatishardforahumantoread = "Jimmy"`
- `this_is_more_readable = "Jimmy"`
- `thisIsCamelCase = "jimmy"`
- variable names cannot start with a number
- use self-describing short variable names (`x` is not self-describing, `age` or `weight` are self-describing)
- variable names are CaseSensitive
- variables that start with an underscore have a special meaning (`_dont_use_until_you_understand_what_it_means`)
- will get syntax error if the variable name doesn't follow the rules, such as `3age` (starts with a number) or `read@one` (uses any symbol other than the underscore _)
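The naming rules above can be checked from Python itself. This is a minimal sketch that was not part of the live session: it assumes the built-in string method `str.isidentifier()` is a fair stand-in for the rules listed here, and `my-name` is a made-up example. Note that `isidentifier()` does not flag reserved words such as `class`.

```python
# Each check asks Python whether the text would be a legal variable name.
ok_plain = 'age'.isidentifier()              # letters only
ok_underscore = 'first_name'.isidentifier()  # letters and an underscore
bad_digit_first = '3age'.isidentifier()      # starts with a number
bad_symbol = 'read@one'.isidentifier()       # contains a symbol other than _
bad_dash = 'my-name'.isidentifier()          # a dash is a minus sign in code

print(ok_plain, ok_underscore)                # True True
print(bad_digit_first, bad_symbol, bad_dash)  # False False False
```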
**Built in functions**
- `print()` prints things as text
- `print(first_name, 'is', age, 'years old')` will print `Ahmed is 42 years old`
- built-in functions are native to Python and are commonly used by programmers
- `print()` automatically puts a single space between its arguments.
- `print(argument1, argument2, argument3, argument4)`
- functions are self-contained - will take in arguments and provide output.
- functions allow you to easily reuse code
- not all functions require arguments. some functions require a certain number of arguments.
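As a small illustration of how the number of arguments varies, here is a sketch using two other built-in functions, `len()` and `round()`. Neither was used at this point in the session, and the values are chosen only for illustration.

```python
# len() requires exactly one argument.
n_chars = len('Ahmed')

# round() accepts one argument (round to a whole number)...
whole = round(3.14159)

# ...or two arguments (the second is the number of decimal places).
two_places = round(3.14159, 2)

# print() can even be called with no arguments: it prints an empty line.
print()
print(n_chars, whole, two_places)  # 5 3 3.14
```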
**Variables**
- must be created before they are used.
- `print(myval)` will give an error if `myval` isn't already created with a value
This will throw an error because `last_name` does not have an assigned value
```python
print(last_name)
last_name = "Smith"
```
This will not throw an error
```python
last_name = "Smith"
print(last_name)
```
**Challenge #2**
Assign the variable named **color1** to the value **red** and the variable named **color2** to the value **blue**. Then print `red is not blue` using the variable names as input (or arguments)
```
color1 = 'red'
color2 = 'blue'
print(color1, 'is not', color2)
print(color1, 'is', 'not', color2)
```
**Blocks of text**
- you can surround a block of text with triple quotes, like so: """ My very long block of text """
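A short sketch of a triple-quoted block, with made-up text. The line breaks typed inside the quotes are kept as part of the string.

```python
# Triple quotes let one string span several lines.
long_text = """My very long
block of text
spans three lines"""

print(long_text)

# Each line break is stored as the newline character '\n'.
newline_count = long_text.count('\n')
print(newline_count)  # 2
```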
**variables used in calculations**
- need to be a numeric datatype for calculations: integer or float
- `age = age + 3`
- `3 + 5 * 4` is calculated according to math rules (order of operations), not read left to right
- parentheses/brackets, exponents/radicals, multiplication/division, addition/subtraction
- `3 + 5 * 4` = `23`
- `(3 + 5) * 4` = `32`
**Challenge #3**
Write the code for the following: number1 is 22, number2 is 5, and number3 is 100. Multiply number1 by number3, then divide by number2. The result of the calculation should be stored in number4. Finally, output 'The answer is number4', with the value displayed rather than the variable name.
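One possible solution to Challenge #3, sketched from the challenge text (the notes do not record the solution used in the session):

```python
number1 = 22
number2 = 5
number3 = 100

# Multiply first, then divide; division with / always gives a float.
number4 = number1 * number3 / number2

print('The answer is', number4)  # The answer is 440.0
```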
**Indexing**
- indexing gives you a single character from a string
- in python, indices start with 0 (zero)
```
atom_name = 'helium'
print(atom_name[0])
```
output is `h`
- indexing uses the variable name, then square brackets around the number of the index you want to obtain
Strings are text surrounded by single or double quotes (pair single quotes with single quotes and double quotes with double quotes; don't mix them 'like this")
```
id_number = 2587464
print(id_number[2])
```
will result in error because `id_number` is an integer, and not a string
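If you do need one digit of the number, one way around the error is to convert the integer to a string first. A minimal sketch (`id_text` and `third_digit` are names made up for this example):

```python
id_number = 2587464

# Convert the integer to a string so it can be indexed.
id_text = str(id_number)

# Index 2 is the third character, because indices start at 0.
third_digit = id_text[2]
print(third_digit)  # 8
```

Note that `third_digit` is the string `'8'`, not the number 8.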
**list**
```
my_list = ['apple', 'pear', 'peach']
print(my_list[1])
```
output is `pear`
**slices**
- slice is a substring or subset
- slice is `variable[start position: stop position(not including)]`
```
# string example
atom_name = 'sodium'
print(atom_name[0:3])
```
output is `sod`
```
# list example
many_atoms = ['oxygen', 'carbon', 'nitrogen', 'neon', 'iron', 'zinc']
print(many_atoms[1:4])
```
output will be `['carbon', 'nitrogen', 'neon']` *(notice how it outputs in a list format!)*
**how long are things?**
- function is `len()`
- finds the length of a string or list
- lets you know how long a string is, or how many elements are in a list
```
#string example
print(len('helium'))
```
output is `6` *(counts number of characters)*
```python
# list example
my_list2 = ['a', '1', '43', 'dream', 'please']
print(len(my_list2))
```
output is `5` *(counts number of elements in list)*
**Challenge #4**
1. what does `thing[:]` (just a colon) do?
2. What does `thing[number:some-negative-number]` do?
3. What does the following program print?
```python
atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])
```
**Solution #4**
1. returns everything
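2. returns a slice from `number` up to (but not including) the position counted back from the end of the variable
```python
#example
atom_name = 'carbon'
print(atom_name[1:-4])
```
output is `a` *(index `-4` is the same position as index `2`, so this is `atom_name[1:2]`)*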
3. output is `atom_name[1:3] is: ar`
* *(remember, the number that is the stop position in the slice isn't included.)*
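A few more slices with negative numbers, as a sketch that extends the solution above (these exact lines were not part of the session):

```python
atom_name = 'carbon'

# A negative index counts back from the end of the string.
last_letter = atom_name[-1]

# Leaving out the start means "from the beginning".
all_but_last_two = atom_name[:-2]

# Leaving out the stop means "through the end".
last_three = atom_name[-3:]

print(last_letter)       # n
print(all_but_last_two)  # carb
print(last_three)        # bon
```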
## Data types & type conversion
- all data that Python reads is associated with a data type. The types we've covered so far are string, integer, and float, which are three commonly used data types.
- Type conversion means you're converting data from one type to another
- integers : whole numbers
- type conversion use `int()`
- floats : also called floating points, they are decimal (real) numbers
- type conversion use `float()`
- strings : sequence of characters, written inside quotes
- type conversion use `str()`
- to identify the type of data, use `type()`
`type(52)` will output `int`
`print(type(52))` will output `<class 'int'>`
```python
fitness = 'average'
print(type(fitness))
```
output is `<class 'str'>`
`print(type(hair))` will throw an error, because Python is reading `hair` as a variable name, which isn't defined.
```python
print(type(3.4))
```
output is `<class 'float'>`
```python
print (5-2)
```
will output `3`
```python
print ('hello'-'h')
```
will throw an error because you can't subtract strings
You can use `+` and `*` on integers, floats, and strings, but they operate differently on strings: `+` joins strings together and `*` repeats a string.
```python
print (4+5)
```
output is `9`
```python
print ("Ahmed"+"Walch")
```
output is `AhmedWalch`
```python
print ('Ahmed'*10)
```
output is `AhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmed`
- Cannot mix strings with integers/floats for mathematical purposes
```python
print (1 + '2')
```
will throw an error.
however,
```python
print (1 + int('2'))
```
will output `3` because `'2'` is type cast as an integer, allowing math operations.
```python
print (str(1) + '2')
```
will output `12` *(which is actually a string, not a number!)*
```python
print ('Gene'+str(23455685))
```
will output `Gene23455685`, which allows easy labels!
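The three conversion functions side by side, as a recap sketch with illustrative values. One detail not covered above: `int()` on a float drops the decimal part rather than rounding.

```python
whole = int('42')      # string -> integer
decimal = float('3.5') # string -> float
label = str(42)        # integer -> string
truncated = int(3.9)   # float -> integer: the .9 is dropped, not rounded

print(whole, decimal, label, truncated)  # 42 3.5 42 3
print(type(whole), type(decimal), type(label))
```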
## Variables only change values once the value is (re-)assigned
if you need to keep an original value of a variable, create a new variable name, otherwise you're overwriting the original value.
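A minimal sketch of both points, with made-up variable names and values: a variable computed from another one does not update on its own, and copying a value to a new name preserves the original.

```python
first = 1
second = 5 * first   # second is calculated once, using first = 1
first = 2            # re-assigning first does not touch second
print(first, second) # 2 5

initial_weight = 60.0
weight = initial_weight          # keep the original under its own name
weight = weight + 5.5            # only weight is overwritten
print(initial_weight, weight)    # 60.0 65.5
```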
LIVE LESSON NOTES: [https://drive.google.com/file/d/1TSm1bA55RwQu5-iqdnBNRU47U3os9x86/view?usp=sharing](https://drive.google.com/file/d/1TSm1bA55RwQu5-iqdnBNRU47U3os9x86/view?usp=sharing)
### End Day 1
## Workshop Day 2
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | ----------------- |
| Geno Sanchez (helper) | UCLA |Library|genosanchez@library.ucla.edu|
|Amber Heidbrink |UCSD |Cell and Developmental Biology |aheidbrink@ucsd.edu |
| Kat Koziar|UCR|Library |katherine.koziar@ucr.edu |
|Yibing Zhang |UCM |Bioengineering |yzhang291@ucmerced.edu |
|Douglas Zhang|UCSD|Chemistry and Biochemistry |doz023@ucsd.edu |
| Kazuma Nagatsuka | UCSD | Robotics(Mechanical Engineering) | knagatsuka@ucsd.edu |
|Jay Colond |UCM |Sociology | jcolond@ucmerced.edu |
|Belina Chong |UCLA |Ecology and Evolutionary Biology |moonmoon394@ucla.edu |
|Jonathan Le |UCR |Mathematics |jle173@ucr.edu |
|Caitlin Tribelhorn |UCSD | Pediatrics | ctribelh@ucsd.edu |
|Igor Aprelev |UCSD |Mathematics and Economics|iaprelev@ucsd.edu |
| Sam Erickson | UC Merced |Physics|serickson3@ucmerced.edu|
| Jay Chi | UCSB | ETS | jaychi@ucsb.edu |
| | | | |
| Apisit Kaewsanit | UCSF | Epidemiology and Biostatistics | apisit.kaewsanit@ucsf.edu |
| Benjamin Nauman | UCLA | Geography | bnauman@ucla.edu |
| Suzanne Paulson |UCLA |AOS | paulson@atmos.ucla.edu |
| Liam de Villa Bourke | UCLA | IOES | liamdevilla@g.ucla.edu |
|Mario Cuaya | UCR |Computer Science |mcuay001@ucr.edu |
|Josiah Piceno |UCM |MBSE|jpiceno3@ucmerced.edu|
| John Thompson | UC Merced | Cell & Molecular Biology | jthompson44@ucmerced.edu |
|Bineh Ndefru | UCLA |Material Science | bndefru@ucla.edu |
|Zhiyuan Yao |UCLA |Data Science Center|zyao@ucla.edu |
| Tahirah Williams | UCM | QSB | twilliams76@ucmerced.edu |
| Haley Potts | UCSD | Math & Econ | hpotts@ucsd.edu |
|Zhaoning (Johnny) Wang | UCSD |CMM|zhw063@health.ucsd.edu|
| Daryl Han | UC Irvine | Student Center and Event Services | ddhan@uci.edu |
|Simran Kanal | UCSF | Epidemiology and Biostatistics | simran.kanal@ucsf.edu |
| Jon Dean | UCSD | Anesthesiology | j1dean@health.ucsd.edu |
| Junxiao Gao | UCSF | Biostatistics and Epidemiology |Junxiao.Gao@ucsf.edu |
|Stella Yuan |UCLA |Ecology and Evolutionary Biology |scy8@g.ucla.edu |
| Waleed Rajabally | UC Merced | Sociology |wrajabally@ucmerced.edu |
|Jun Tan |UCSD |Economics |j4tan@ucsd.edu |
|Christian Henry|UC Berkeley|Integrative Biology|chrishenry@berkeley.edu|
|Jacob Ross |UCSD |Anesthesiology |jaross@ucsd.edu |
| Christopher Gray | UCR | Computer Science | cgray024@ucr.edu |
## Day 2 Questions:
Please enter any questions not answered during live session here:
1.
## Day 2 Live Class Notes:
**Gapminder data download:** http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
## Lesson 5 Libraries
### Most of the power of a programming language is in its libraries.
A **library** is a collection of files (called modules) that contains functions for use by other programs.
* May also contain data values
* Pandas - widely used library often used in the science world
* Many are open source
* The Python standard library is an extensive suite of modules that comes with Python itself.
* https://docs.python.org/3/library/
### A program must import a library module before using it.
Use `import` to load a library module into a program’s memory.
```python
import math
print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))
```
`pi is 3.141592653589793`
`cos(pi) is -1.0`
### Have to refer to each item with the module’s name.
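A short sketch of what this means in practice. The `radius` value is made up, and `try`/`except` (not covered in this workshop so far) is used only so the example keeps running after the expected error.

```python
import math

radius = 2.0

# Works: pi is reached through the module name.
area = math.pi * radius ** 2
print(area)

# Fails: plain `pi` was never defined in this program.
try:
    area_again = pi * radius ** 2
    bare_name_worked = True
except NameError:
    bare_name_worked = False

print('bare name worked:', bare_name_worked)  # bare name worked: False
```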
## Use help to learn about the contents of a library module.
```python
help(math)
Help on module math:
NAME
math
MODULE REFERENCE
http://docs.python.org/3/library/math
...
```
## Import specific items from a library module to shorten programs.
```python
from math import cos, pi
print('cos(pi) is', cos(pi))
```
`cos(pi) is -1.0`
## Create an alias for a library module when importing it to shorten programs.
```python
import math as m
print('cos(pi) is', m.cos(m.pi))
```
`cos(pi) is -1.0`
* Use import ... as ... to give a library a short alias while importing it.
* Then refer to items in the library using that shortened name
```python
import matplotlib as mpl
```
### Challenge
1. Fill in the blanks so that the program below prints 90.0.
2. Rewrite the program so that it uses `import` without `as`.
3. Which form do you find easier to read?
```python
import math as m
angle = ____.degrees(____.pi / 2)
print(____)
```
Solution:
```python=
import math as m
#1
angle = m.degrees(m.pi / 2)
print(angle)
#2
import math
angle = math.degrees(math.pi / 2)
print(angle)
```
```python
90.0
```
## Lesson 6: Writing Functions
### Define a function using def with a name, parameters, and a block of code.
```python
# declare a new function with the keyword 'def',
# followed by the function's name and parentheses.
def say_hello():
    print("hello!")
```
* Begin the definition of a new function with `def`
* Followed by the name of the function.
* Must obey the same rules as variable names
* Names can contain letters, digits, and underscores, but cannot start with a digit.
* Then parameters in parentheses
* Empty parentheses if the function doesn't take any input
* Then a colon
* The next line of code (the body) is indented
* Some functions require arguments to be passed in order to execute; others do not.
```python
# After defining a function, you must 'call' a function to execute it.
say_hello()
```
`hello!`
```python
# Let's make a function that prints a date, as an example of a function that takes arguments.
def print_date(year, month, day):  # three arguments are required: year, month, day
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(2022, 1, 2)
```
`2022/1/2`
```python
print_date(month = 1, year = 2019, day = 23)
```
`2019/1/23`
## Defining a function using the `return` statement.
```python
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

avg = average([1,3,4])
print(avg)
emptyAvg = average([])  # call the function `average`, not the float stored in `avg`
print(emptyAvg)
```
`2.6666666666666665`
`None`
```python
# print_date() prints the date but has no return statement, so `result` is None
result = print_date(1871, 3, 19)
print('result of print_date', result)
```
`1871/3/19`
`result of print_date None`
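For contrast, here is a sketch of the same date logic written to return its text instead of printing it. The name `format_date` is made up for this example and is not from the lesson.

```python
def format_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    return joined  # hand the text back to the caller instead of printing it

result = format_date(1871, 3, 19)
print('result of format_date:', result)  # result of format_date: 1871/3/19
```

Because the value is returned, it can be stored in a variable and reused later, which a function that only prints does not allow.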
## Challenge
### What is wrong with this example?
```python
#Example
result = print_time(11,37,59)
def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute)+ ':' + str(second)
    print(time_string)
```
```python
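# After fix: the function must be defined before it is called
def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute) + ':' + str(second)
    print(time_string)
result = print_time(11, 37, 59)
print('result of call is:', result)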
```
`11:37:59`
`result of call is: None`
# Reading tabular data into data frames
```python
import os
#Get our current working directory
print(os.getcwd())
#List the contents of this directory
print(os.listdir())
```
```python
import pandas as pd
data = pd.read_csv("gapminder_gdp_oceania.csv")
```
```python
#Reading data from a subfolder
#data = pd.read_csv("subfolder/gapminder_gdp_oceania.csv")
```
```python
print(data)
```
```python
country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 \
0 Australia 10039.59564 10949.64959 12217.22686
1 New Zealand 10556.57566 12247.39532 13175.67800
gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 \
0 14526.12465 16788.62948 18334.19751 19477.00928
1 14463.91893 16046.03728 16233.71770 17632.41040
gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 \
0 21888.88903 23424.76683 26997.93657 30687.75473
1 19007.19129 18363.32494 21050.41377 23189.80135
gdpPercap_2007
0 34435.36744
1 25185.00911
```
```python
data
country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
0 Australia 10039.59564 10949.64959 12217.22686 14526.12465 16788.62948 18334.19751 19477.00928 21888.88903 23424.76683 26997.93657 30687.75473 34435.36744
1 New Zealand 10556.57566 12247.39532 13175.67800 14463.91893 16046.03728 16233.71770 17632.41040 19007.19129 18363.32494 21050.41377 23189.80135 25185.00911
```
```python
# let's identify our rows by country, not index number
data = pd.read_csv("gapminder_gdp_oceania.csv", index_col = "country")
```
```python
gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
country
Australia 10039.59564 10949.64959 12217.22686 14526.12465 16788.62948 18334.19751 19477.00928 21888.88903 23424.76683 26997.93657 30687.75473 34435.36744
New Zealand 10556.57566 12247.39532 13175.67800 14463.91893 16046.03728 16233.71770 17632.41040 19007.19129 18363.32494 21050.41377 23189.80135 25185.00911
```
```python
data.info()
```
```python
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Australia to New Zealand
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gdpPercap_1952 2 non-null float64
1 gdpPercap_1957 2 non-null float64
2 gdpPercap_1962 2 non-null float64
3 gdpPercap_1967 2 non-null float64
4 gdpPercap_1972 2 non-null float64
5 gdpPercap_1977 2 non-null float64
6 gdpPercap_1982 2 non-null float64
7 gdpPercap_1987 2 non-null float64
8 gdpPercap_1992 2 non-null float64
9 gdpPercap_1997 2 non-null float64
10 gdpPercap_2002 2 non-null float64
11 gdpPercap_2007 2 non-null float64
dtypes: float64(12)
memory usage: 208.0+ bytes
```
### Summary statistics of your data
```python
data.describe()
```
```python
gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
count 2.000000 2.000000 2.000000 2.000000 2.00000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000
mean 10298.085650 11598.522455 12696.452430 14495.021790 16417.33338 17283.957605 18554.709840 20448.040160 20894.045885 24024.175170 26938.778040 29810.188275
std 365.560078 917.644806 677.727301 43.986086 525.09198 1485.263517 1304.328377 2037.668013 3578.979883 4205.533703 5301.853680 6540.991104
min 10039.595640 10949.649590 12217.226860 14463.918930 16046.03728 16233.717700 17632.410400 19007.191290 18363.324940 21050.413770 23189.801350 25185.009110
25% 10168.840645 11274.086022 12456.839645 14479.470360 16231.68533 16758.837652 18093.560120 19727.615725 19628.685412 22537.294470 25064.289695 27497.598692
50% 10298.085650 11598.522455 12696.452430 14495.021790 16417.33338 17283.957605 18554.709840 20448.040160 20894.045885 24024.175170 26938.778040 29810.188275
75% 10427.330655 11922.958888 12936.065215 14510.573220 16602.98143 17809.077558 19015.859560 21168.464595 22159.406358 25511.055870 28813.266385 32122.777858
max 10556.575660 12247.395320 13175.678000 14526.124650 16788.62948 18334.197510 19477.009280 21888.889030 23424.766830 26997.936570 30687.754730 34435.367440
```
### Print column names
```python
data.columns
# or
print(data.columns)
```
```python
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
dtype='object')
```
### Dataframes
A DataFrame is a collection of columns. Each column has a single data type (e.g. float, int, str), but different columns can have different types.
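To see the one-type-per-column idea, here is a sketch with a tiny hand-made DataFrame. The values, column names, and row labels are invented for illustration and are not taken from the gapminder files.

```python
import pandas as pd

# Made-up values for illustration only.
example = pd.DataFrame(
    {'year': [1952, 1957], 'gdp': [100.5, 200.25]},
    index=['A', 'B'],
)

# .dtypes lists the data type of every column.
print(example.dtypes)

# Helper checks from pandas confirm the type of each column.
year_is_integer = pd.api.types.is_integer_dtype(example['year'])
gdp_is_float = pd.api.types.is_float_dtype(example['gdp'])
print(year_is_integer, gdp_is_float)  # True True
```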
## Challenge
1. Read the data in `gapminder_gdp_americas.csv` into a variable called `americas` and display its summary statistics.
2. After reading the data for the Americas, use `help(americas.head)` and `help(americas.tail)` to find out what `DataFrame.head` and `DataFrame.tail` do.
3. How can you display the first three rows of this data?
solution:
```python
americas = pd.read_csv("data/gapminder_gdp_americas.csv", index_col = "country")
print(americas.head(3))
print(americas.describe())
```
```python
continent gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 \
country
Argentina Americas 5911.315053 6856.856212 7133.166023
Bolivia Americas 2677.326347 2127.686326 2180.972546
Brazil Americas 2108.944355 2487.365989 3336.585802
gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 \
country
Argentina 8052.953021 9443.038526 10079.026740 8997.897412
Bolivia 2586.886053 2980.331339 3548.097832 3156.510452
Brazil 3429.864357 4985.711467 6660.118654 7030.835878
gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 \
country
Argentina 9139.671389 9308.418710 10967.281950 8797.640716
Bolivia 2753.691490 2961.699694 3326.143191 3413.262690
Brazil 7807.095818 6950.283021 7957.980824 8131.212843
gdpPercap_2007
country
Argentina 12779.379640
Bolivia 3822.137084
Brazil 9065.800825
gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 \
count 25.000000 25.000000 25.000000 25.000000
mean 4079.062552 4616.043733 4901.541870 5668.253496
std 3001.727522 3312.381083 3421.740569 4160.885560
min 1397.717137 1544.402995 1662.137359 1452.057666
25% 2428.237769 2487.365989 2750.364446 3242.531147
50% 3048.302900 3780.546651 4086.114078 4643.393534
75% 3939.978789 4756.525781 5180.755910 5788.093330
max 13990.482080 14847.127120 16173.145860 19530.365570
gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 \
count 25.000000 25.000000 25.000000 25.000000
mean 6491.334139 7352.007126 7506.737088 7793.400261
std 4754.404329 5355.602518 5530.490471 6665.039509
min 1654.456946 1874.298931 2011.159549 1823.015995
25% 4031.408271 4756.763836 4258.503604 4140.442097
50% 5305.445256 6281.290855 6434.501797 6360.943444
75% 6809.406690 7674.929108 8997.897412 7807.095818
max 21806.035940 24072.632130 25009.559140 29884.350410
gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
count 25.000000 25.000000 25.000000 25.000000
mean 8044.934406 8889.300863 9287.677107 11003.031625
std 7047.089191 7874.225145 8895.817785 9713.209302
min 1456.309517 1341.726931 1270.364932 1201.637154
25% 4439.450840 4684.313807 4858.347495 5728.353514
50% 6618.743050 7113.692252 6994.774861 8948.102923
75% 8137.004775 9767.297530 8797.640716 11977.574960
max 32003.932240 35767.433030 39097.099550 42951.653090
```
# Getting data out of your data frame
```python
# get a column
data = pd.read_csv("gapminder_gdp_europe.csv", index_col = "country")
data.columns
```
```python
Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
dtype='object')
```
```python
col1 = data["gdpPercap_1957"] # getting data by column label
print(col1)
country
Albania 1942.284244
Austria 8842.598030
Belgium 9714.960623
Bosnia and Herzegovina 1353.989176
Bulgaria 3008.670727
Croatia 4338.231617
Czech Republic 8256.343918
Denmark 11099.659350
Finland 7545.415386
France 8662.834898
Germany 10187.826650
Greece 4916.299889
Hungary 6040.180011
Iceland 9244.001412
Ireland 5599.077872
Italy 6248.656232
Montenegro 3682.259903
Netherlands 11276.193440
Norway 11653.973040
Poland 4734.253019
Portugal 3774.571743
Romania 3943.370225
Serbia 4981.090891
Slovak Republic 6093.262980
Slovenia 5862.276629
Spain 4564.802410
Sweden 9911.878226
Switzerland 17909.489730
Turkey 2218.754257
United Kingdom 11283.177950
Name: gdpPercap_1957, dtype: float64
```
```python
# Pandas introduces new data types
print(type(data))
<class 'pandas.core.frame.DataFrame'>
```
```python
print(type(col1))
<class 'pandas.core.series.Series'>
```
### Get data subsets by position
```python
subset1 = data.iloc[0, 0]
print(subset1)
1601.056136
```
### Get data subsets by label
```python
subset2 = data.loc["Albania", "gdpPercap_1952"]
print(subset2)
1601.056136
```
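The two lookups above return the same value because position `[0, 0]` and the labels `"Albania"`, `"gdpPercap_1952"` point at the same cell. A self-contained sketch of the same idea, using a small invented DataFrame (`toy`, its labels, and its numbers are all made up) so it runs without the data files:

```python
import pandas as pd

# Made-up numbers and labels for illustration; not the gapminder data.
toy = pd.DataFrame(
    {'col_a': [1.0, 2.0, 3.0], 'col_b': [10.0, 20.0, 30.0]},
    index=['row_x', 'row_y', 'row_z'],
)

by_position = toy.iloc[0, 0]           # first row, first column
by_label = toy.loc['row_x', 'col_a']   # same cell, named by its labels

print(by_position, by_label)  # 1.0 1.0
```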
### Get row by label
```python
data.loc["Albania",:]
gdpPercap_1952 1601.056136
gdpPercap_1957 1942.284244
gdpPercap_1962 2312.888958
gdpPercap_1967 2760.196931
gdpPercap_1972 3313.422188
gdpPercap_1977 3533.003910
gdpPercap_1982 3630.880722
gdpPercap_1987 3738.932735
gdpPercap_1992 2497.437901
gdpPercap_1997 3193.054604
gdpPercap_2002 4604.211737
gdpPercap_2007 5937.029526
Name: Albania, dtype: float64
```
```python
country_subset = data.loc["Italy":"Poland", "gdpPercap_1962":"gdpPercap_1972"]
```
```python
country_subset
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy 8243.582340 10022.401310 12269.273780
Montenegro 4649.593785 5907.850937 7778.414017
Netherlands 12790.849560 15363.251360 18794.745670
Norway 13450.401510 16361.876470 18965.055510
Poland 5338.752143 6557.152776 8006.506993
```
```python
print(type(country_subset))
print(country_subset.describe())
<class 'pandas.core.frame.DataFrame'>
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
count 5.000000 5.000000 5.000000
mean 8894.635868 10842.506571 13162.799194
std 4093.410673 4855.106424 5517.298708
min 4649.593785 5907.850937 7778.414017
25% 5338.752143 6557.152776 8006.506993
50% 8243.582340 10022.401310 12269.273780
75% 12790.849560 15363.251360 18794.745670
max 13450.401510 16361.876470 18965.055510
```
```python
# Passing a list of labels returns a DataFrame with just those two countries
data.loc[["Italy","Poland"], :]
gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
country
Italy 4931.404155 6248.656232 8243.582340 10022.401310 12269.273780 14255.984750 16537.483500 19207.234820 22013.644860 24675.02446 27968.09817 28569.71970
Poland 4029.329699 4734.253019 5338.752143 6557.152776 8006.506993 9508.141454 8451.531004 9082.351172 7738.881247 10159.58368 12002.23908 15389.92468
```
Alternative solution:
```python
italy = data.loc["Italy", "gdpPercap_1952":"gdpPercap_1962"]
poland = data.loc["Poland", "gdpPercap_1952":"gdpPercap_1962"]
pd.concat([italy, poland])
gdpPercap_1952 4931.404155
gdpPercap_1957 6248.656232
gdpPercap_1962 8243.582340
gdpPercap_1952 4029.329699
gdpPercap_1957 4734.253019
gdpPercap_1962 5338.752143
dtype: float64
```
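The concatenated Series above repeats the column labels and no longer shows which country each value belongs to. One possible variation, sketched on a hand-built table with values rounded from the output above (the name `toy` is an assumption), passes `axis=1` to `pd.concat` and transposes the result so each country is a row again:

```python
import pandas as pd

# Hypothetical stand-in for the lesson data; values rounded from the output above
toy = pd.DataFrame(
    {
        "gdpPercap_1952": [4931.40, 4029.33],
        "gdpPercap_1957": [6248.66, 4734.25],
        "gdpPercap_1962": [8243.58, 5338.75],
    },
    index=pd.Index(["Italy", "Poland"], name="country"),
)

italy = toy.loc["Italy", "gdpPercap_1952":"gdpPercap_1962"]
poland = toy.loc["Poland", "gdpPercap_1952":"gdpPercap_1962"]

# axis=1 places each Series in its own column (named after the country);
# .T turns those columns back into rows
combined = pd.concat([italy, poland], axis=1).T
print(combined)
```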
```python
data.iloc[0:2, 0:2]
gdpPercap_1952 gdpPercap_1957
country
Albania 1601.056136 1942.284244
Austria 6137.076492 8842.598030
```
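Note that the two slicing styles treat the end point differently: `loc` includes the end label (Italy through Poland above), while `iloc` follows ordinary Python slicing and stops before the end position. A small sketch with made-up numbers (the table `toy` and its values are purely illustrative):

```python
import pandas as pd

# Hypothetical table with four labelled rows
toy = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

by_label = toy.loc["a":"c", :]    # label slice: "c" is included
by_position = toy.iloc[0:2, :]    # position slice: row 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
```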
## Filter data
```python
#Filtering data by a criterion
country_subset
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy 8243.582340 10022.401310 12269.273780
Montenegro 4649.593785 5907.850937 7778.414017
Netherlands 12790.849560 15363.251360 18794.745670
Norway 13450.401510 16361.876470 18965.055510
Poland 5338.752143 6557.152776 8006.506993
```
```python
country_subset > 10000
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy False True True
Montenegro False False False
Netherlands True True True
Norway True True True
Poland False False False
```
```python
country_subset[country_subset > 10000]
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy NaN 10022.40131 12269.27378
Montenegro NaN NaN NaN
Netherlands 12790.84956 15363.25136 18794.74567
Norway 13450.40151 16361.87647 18965.05551
Poland NaN NaN NaN
```
```python
# Using the where() method for filtering
country_subset.where(country_subset > 10000)
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy NaN 10022.40131 12269.27378
Montenegro NaN NaN NaN
Netherlands 12790.84956 15363.25136 18794.74567
Norway 13450.40151 16361.87647 18965.05551
Poland NaN NaN NaN
```
```python
# Method chaining
country_subset.where(country_subset > 10000).describe()
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
count 2.000000 3.000000 3.000000
mean 13120.625535 13915.843047 16676.358320
std 466.373656 3408.589070 3817.597015
min 12790.849560 10022.401310 12269.273780
25% 12955.737548 12692.826335 15532.009725
50% 13120.625535 15363.251360 18794.745670
75% 13285.513522 15862.563915 18879.900590
max 13450.401510 16361.876470 18965.055510
```
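The `count` row above is smaller than 5 because `where()` replaces values that fail the condition with `NaN`, and summary methods such as `describe()` and `mean()` skip `NaN` by default. A minimal sketch using a hand-built copy of the 1962 column (values rounded from the output above; the name `gdp_1962` is an assumption):

```python
import pandas as pd

# Hypothetical Series; values rounded from the 1962 column shown above
gdp_1962 = pd.Series(
    [8243.58, 4649.59, 12790.85, 13450.40, 5338.75],
    index=["Italy", "Montenegro", "Netherlands", "Norway", "Poland"],
)

masked = gdp_1962.where(gdp_1962 > 10000)   # values <= 10000 become NaN

print(masked.count())   # number of non-NaN values
print(masked.mean())    # mean of the remaining values only
```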
```python
country_subset.rank()
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy 3.0 3.0 3.0
Montenegro 1.0 1.0 1.0
Netherlands 4.0 4.0 4.0
Norway 5.0 5.0 5.0
Poland 2.0 2.0 2.0
```
```python
# An elaborate chaining example
country_subset.rank().corr("kendall")
```
```python
country_subset.to_csv("country_subset.csv")
country_subset
gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
country
Italy 8243.582340 10022.401310 12269.273780
Montenegro 4649.593785 5907.850937 7778.414017
Netherlands 12790.849560 15363.251360 18794.745670
Norway 13450.401510 16361.876470 18965.055510
Poland 5338.752143 6557.152776 8006.506993
```
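`to_csv` writes the index (the country names) as the first column, so the file can be read back with the same `index_col` argument used at the start of the lesson. A hedged sketch that writes to a temporary directory rather than the working folder (the table `toy`, its rounded values, and the temporary path are assumptions for illustration):

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row table; values rounded from the output above
toy = pd.DataFrame(
    {"gdpPercap_1962": [8243.58, 5338.75], "gdpPercap_1967": [10022.40, 6557.15]},
    index=pd.Index(["Italy", "Poland"], name="country"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "country_subset.csv")
    toy.to_csv(path)                                    # index saved as the "country" column
    restored = pd.read_csv(path, index_col="country")   # read it back as the index

print(restored)
```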
LIVE LESSON NOTES:
<a href='https://drive.google.com/file/d/1DBtLmrjjcgmi3NBXBIuwS7U18ZJ8sWmH/view?usp=sharing'>Day 2 live notes A</a>
<a href='https://drive.google.com/file/d/1C-UeEGd1j6tvbGl78ytxqwhE09LaawyA/view?usp=sharing'>Day 2 live notes B</a>
### End Day 2
## Workshop Day 3
### First name and Last Name/Organization/Dept./Email
| Name (first & last) | Organization | Dept. | Email |
| ------------------------- | ------------ | ----- | --------------- |
| Zhiyuan Yao |UCLA | Data Science Center | zyao@ucla.edu |
| Mario Cuaya |UCR |Computer Science| mcuay001@ucr.edu |
|Amber Heidbrink |UCSD |Cell and Developmental Biology |aheidbrink@ucsd.edu |
| Douglas Zhang | UCSD | Chemistry and Biochemistry | doz023@ucsd.edu |
|Stella Yuan |UCLA |Ecology and Evolutionary Biology |scy8@ucla.edu |
|Benjamin Nauman | UCLA | Geography | bnauman@ucla.edu |
|Belina Chong |UCLA |Ecology and Evolutionary Biology |moonmoon394@ucla.edu |
| Haley Potts | UCSD | Math & Economics | hpotts@ucsd.edu |
| Igor Aprelev | UCSD |Mathematics and Economics | iaprelev@ucsd.edu |
| Jun Tan |UCSD |Economics|j4tan@ucsd.edu |
| Jonathan Le | UCR |Mathematics| jle173@ucr.edu |
| Bineh Ndefru | UCLA | Materials Science| bndefru@ucla.edu |
| Jay Chi | UCSB | ETS | jaychi@ucsb.edu |
| Kazuma Nagatsuka | UCSD | Robotics(Mechanical Engineering) | knagatsuka@ucsd.edu |
|Josiah Piceno|UCM |MBSE |jpiceno3@ucmerced.edu |
|Yibing Zhang |UCM |Bioengineering |yzhang291@ucmerced.edu |
| Simran Kanal | UCSF | Epidemiology and Biostatistics |simran.kanal@ucsf.edu |
| Dilawer Ali | UC Merced | Mechanical Engineering | dali4@ucmerced.edu |
| Tahirah Williams | UCM | QSB |twilliams76@gmail.com |
|Christian Henry|UC Berkeley|UC Berkeley|chrishenry@berkeley.edu |
|Zhaoning (Johnny) Wang |UCSD|CMM |zhw063@health.ucsd.edu |
| Daryl Han |UC Irvine | Student Center and Event Services | ddhan@uci.edu |
| Jacob Ross |UCSD |Anesthesiology|jaross@ucsd.edu |
| Jay Colond |UCM | Sociology | jcolond@ucmerced.edu |
|John Thompson | UC Merced | Molecular & Cellular Biology | jthompson44@ucmerced.edu |
|Apisit Kaewsanit | UCSF | Epidemiology and Biostatistics | apisit.kaewsanit@ucsf.edu |
|Caitlin Tribelhorn | UCSD | Pediatrics | ctribelh@ucsd.edu |
|Waleed Rajabally | UCM |Sociology |wrajabally@ucmerced.edu |
| Junxiao Gao | UCSF | Epidemiology and Biostatistics | Junxiao.Gao@ucsf.edu |
| Sam Erickson | UC Merced | Physics | serickson3@ucmerced.edu |
| Christopher Gray | UCR | Computer Science | cgray024@ucr.edu |
## Day 3 Questions:
Please enter any questions not answered during live session here:
1.
## Day 3 Live Class Notes:
```python=
# Day 3 Lists
# written with brackets []
# can hold different data types
# lists are mutable; character strings are not
# you can extend/append a list to make it longer
pressure = [0.6, 0.7, 0.8, 0.9]
print(pressure)
#output
[0.6, 0.7, 0.8, 0.9]
```
```python=
list_a = ['a', 'b', 4, 6.7]
print(list_a)
#output
['a', 'b', 4, 6.7]
```
```python=
# array
import numpy as np
a = np.array(pressure) # build a NumPy array from the list above
```
```python=
len(list_a)
#output
4
```
```python=
list_a[1]
#output
'b'
```
```python=
pressure
#output
[0.6, 0.7, 0.8, 0.9]
```
```python=
# assign a new value to a list element
pressure[3] = 5
pressure
#output
[0.6, 0.7, 0.8, 5]
```
```python=
# extend or append new values to make a list longer
a = [1,2,3,4]
b = [5,6,7,8,9]
a.append(b)
print(a)
#output
[1, 2, 3, 4, [5, 6, 7, 8, 9]]
```
```python=
a[4][1]
#output
6
```
```python=
a = [1,2,3,4]
a.append(8)
print(a)
#output
[1, 2, 3, 4, 8]
```
```python=
# extend
a = [1,2,3,4]
b = [5,6,7,8,9]
a.extend(b)
print(a)
#output
[1, 2, 3, 4, 5, 6, 7, 8, 9]
```
```python=
list_empty = []
print(list_empty)
#output
[]
```
```python=
# a character string is immutable
string_list = 'address'
string_list[3]
#output
'r'
```
```python=
string_list[3] = 'o'
#output
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-0babce904faa> in <module>
----> 1 string_list[3] = 'o'
TypeError: 'str' object does not support item assignment
```
```python=
# But if you convert the string to a list, you can use an index to change an element
string_list = list('address')
string_list[3] = 'o'
print(string_list)
#output
['a', 'd', 'd', 'o', 'e', 's', 's']
```
```python=
# from a string to a list and back
string_a = 'gold'
string_list = list('gold')
print(string_list)
#output
['g', 'o', 'l', 'd']
```
```python=
string_list[0]
#output
'g'
```
```python=
#convert a list to a string using string.join()
string_list
print(''.join(string_list))
#output
gold
```
```python=
# Stepping through a list
string_list = list('address')
print(string_list)
#output
['a', 'd', 'd', 'r', 'e', 's', 's']
```
```python
# list[start:stop:step]; leaving start and stop empty covers the whole list
# a step of 1 returns every value
string_list[::1]
#output
['a', 'd', 'd', 'r', 'e', 's', 's']
# a step of 2 returns every other (every 2nd) value
string_list[::2]
#output
['a', 'd', 'e', 's']
```
```python=
# starting the slice at index 2 omits the first two elements
string_list[2::]
#output
['d', 'r', 'e', 's', 's']
```
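The general slice form is `sequence[start:stop:step]`, where any of the three parts can be left out. A short sketch collecting the cases above, plus a negative step, which walks the list backwards (the variable names are illustrative):

```python
string_list = list('address')

every_value = string_list[::1]    # step 1: every element
every_other = string_list[::2]    # step 2: indexes 0, 2, 4, 6
from_third = string_list[2::]     # start at index 2, run to the end
backwards = string_list[::-1]     # negative step reverses the order

print(every_other)
print(from_third)
print(backwards)
```

Slicing always returns a new list, so `string_list` itself is unchanged by any of these lines.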
```python=
# Difference between sort and sorted using a list
string_list = list('gold')
result = sorted(string_list)
print(result)
# the output is sorted in alphabetical order
['d', 'g', 'l', 'o']
```
```python=
print(string_list)
#output
['g', 'o', 'l', 'd']
```
```python=
string_list = list('gold')
result = string_list.sort()
print(result)
print(string_list)
#output
None
['d', 'g', 'l', 'o']
```
```python=
# Use sorted(variable) to assign to a new variable; thereby creating a new list
list_num = [10,2,5,7,8,4]
result_num = sorted(list_num)
print(result_num)
print(list_num)
#output
[2, 4, 5, 7, 8, 10]
[10, 2, 5, 7, 8, 4]
```
```python=
list_num = [10,2,5,7,8,4]
result_num = list_num.sort()
print(result_num)
print(list_num)
#output
None
[2, 4, 5, 7, 8, 10]
```
```python=
# Use variable.sort() as a method acting on the list to sort it in place
# This changes the list itself
list_num.sort()
print(list_num)
#output
[2, 4, 5, 7, 8, 10]
```
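A common slip is writing `list_num = list_num.sort()`, which replaces the list with `None`. The sketch below contrasts the two calls side by side; `reverse=True` is a standard keyword accepted by both and appears here only as an extra illustration:

```python
list_num = [10, 2, 5, 7, 8, 4]

descending = sorted(list_num, reverse=True)   # new list; list_num is unchanged
print(descending)
print(list_num)

returned = list_num.sort()                    # sorts in place and returns None
print(returned)
print(list_num)
```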
```python=
## Lesson: Plotting
import matplotlib.pyplot as plt
```
```python=
time = [1, 2, 3, 4]
position = [100, 200, 300, 400]
plt.plot(time,position, label = 'Position changes during time')
plt.xlabel('Time')
plt.ylabel('Position')
plt.legend()
plt.title('Position changes during time')
#output
Text(0.5, 1.0, 'Position changes during time')
#graph
```
```python=
# Plot directly from a dataframe
import pandas as pd
# import the data and save as a dataframe
data_oceania = pd.read_csv('gapminder_gdp_oceania.csv', index_col = 'country')
# Remove the 'gdpPercap_' part of each column name so only the year is left
data_oceania.columns = data_oceania.columns.str.strip('gdpPercap_')
# astype(int) returns a new Index; it is not assigned back here,
# so the column labels remain strings (dtype='object' in the output)
data_oceania.columns.astype(int)
print(data_oceania.columns) # the column labels (years)
print(data_oceania.index) # the row labels (countries)
#output
Index(['1952', '1957', '1962', '1967', '1972', '1977', '1982', '1987', '1992',
'1997', '2002', '2007'],
dtype='object')
Index(['Australia', 'New Zealand'], dtype='object', name='country')
```
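Two details in the cell above are easy to miss. `str.strip('gdpPercap_')` removes any of the listed characters from both ends rather than the exact prefix; it works here only because the year digits are not in that character set. Also, `astype(int)` returns a new Index, so its result has to be assigned back for the labels to become integers. A hedged sketch of one alternative, using `str.replace` on a hand-built table (the name `toy` and its placeholder values are assumptions):

```python
import pandas as pd

# Hypothetical table with placeholder values; only the column labels matter here
toy = pd.DataFrame(
    {"gdpPercap_1952": [1.0, 2.0], "gdpPercap_1957": [3.0, 4.0]},
    index=pd.Index(["Australia", "New Zealand"], name="country"),
)

# Remove the exact prefix, then convert the labels and assign the result back
years = toy.columns.str.replace("gdpPercap_", "", regex=False)
toy.columns = years.astype(int)

print(toy.columns.tolist())
```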
```python=
# This plot doesn't make much sense
data_oceania.plot()
#output
<AxesSubplot:xlabel='country'>
#graph has several unreadable lines
```
```python=
# Use transpose (.T) to swap rows and columns so the years are on the x axis
data_oceania.T.plot()
plt.ylabel('GDP Per Capita') # here we added a y axis label
plt.xticks(rotation = 90) # here we rotated the x axis labels
#output
(array([-2., 0., 2., 4., 6., 8., 10., 12.]),
[Text(-2.0, 0, '2002'),
Text(0.0, 0, '1952'),
Text(2.0, 0, '1962'),
Text(4.0, 0, '1972'),
Text(6.0, 0, '1982'),
Text(8.0, 0, '1992'),
Text(10.0, 0, '2002'),
Text(12.0, 0, '')])
# graph has one line per country (two lines in total)
```
```python=
# Using different plot styles with ggplot
plt.style.use('ggplot')
data_oceania.T.plot()
#output
# graph
```
```python=
plt.style.use('seaborn') # named 'seaborn-v0_8' in Matplotlib 3.6 and later
# Let's plot one country against the other country
# s changes the marker size
# c changes the color
# marker changes the type of marker
data_oceania.T.plot(kind = 'scatter', x = 'New Zealand', y = 'Australia', s = 60, c = 'orange', marker = '3')
#output
# graph
```
# Challenges
### Challenge #1
Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.
```python =
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
data_europe.____
plt.legend(loc='best')
plt.xticks(rotation=90)
```
### Challenge #1 solution
```python =
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.legend(loc='best')
plt.xticks(rotation=90)
```
### Challenge #2
Fill in the blanks so that the program below produces the output shown.
```python =
values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
# output
first time: [1, 3, 5]
second time: [3, 5]
```
### Challenge #2 solution
```python =
values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)
```
### Challenge #3
Fill in the blanks in each of the programs below to produce the indicated result.
```python =
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
```
### Challenge #3 solution
```python =
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
```
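The same accumulator pattern can also be written with the built-in `sum()` and a generator expression. This is an alternative phrasing, not part of the challenge solution, and the names `words` and `total_builtin` are illustrative:

```python
words = ["red", "green", "blue"]

# Accumulator loop, as in the solution above
total = 0
for word in words:
    total = total + len(word)

# Equivalent one-liner using the built-in sum()
total_builtin = sum(len(word) for word in words)

print(total, total_builtin)
```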
### Challenge #4
Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were positive.
```python =
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
# output
[0, 1, 1, 1, 0, 1]
```
### Challenge #4 solution
```python =
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value < 0.0:
        result.append(0)
    else:
        result.append(1)
print(result)
```
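For comparison, the same result can be produced with a list comprehension and a conditional expression. This is offered as an optional alternative to the loop in the solution:

```python
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]

# 0 for negative values, 1 for everything else (including 0.0)
result = [0 if value < 0.0 else 1 for value in original]
print(result)
```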
LIVE Session Notes: https://drive.google.com/file/d/1y8A0xUEWSdSrAhS9Sbvx39Etb1Vnn4rM/view?usp=sharing
### End Day 3