---
tags: gps
---
# 2021-GPS-Data-Skills-Course-Python
## Collaborative Notes:
Please sign in here:
first, last name/favorite drink
Reid Otsuji - Hawaii Fruit Tea
Rolando Almada - Orange Mocha Frappuccino
Leo Do - Peach Iced Tea
Colin Trobough - Single Malt Scotch
Alejandra Guzman - Diet Coke
Ayush Jain
Xin Meng - Pu'er tea
Ada Tong - Roasted brown rice milk tea
Khang Do - green tea
Elissa Bozhkov - Passionfruit mojitos
Yongun Ra - Big Wave
Daniel Blaugher - coffee
Tyler Spencer - coffee
Yue Wang - KBS beer
Lei Lei - Milk Tea
Deepika Bagaria - Milk
Camille Caterina/milk tea
Rachel Lietzow - Milk Tea
Emily Carlton- coffee
Bowin Lee - green tea
Manabu Hiratsuka - coffee
Tomas Lavados- coffee
Bonnie Devenney - kombucha
Alex Schiller - coffee
Zahrah Zimmerer - Gatorade Zero
Jonathan Bazan - Iced Mocha
You can write Markdown in a Jupyter Notebook (in a Markdown cell).
Markdown cheatsheet:
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
By the way... HackMD uses Markdown!
To convert a cell between types, first enter command mode (press Esc; the cell border turns blue), then use these shortcuts:
y - code
m - markdown
r - raw
## creating variables
```
car1 = 'toyota'
car_2 = 'prius'
car3 = 123
car4 = 2.5
print(car4)
```
## reassign variables
```
car4 = 5.0
print(car4)
```
```
car4 = 6
print(car4, Car4)  # raises NameError: variable names are case-sensitive, so Car4 is not the same name as car4
```
## print function print()
```
first_name = 'Kim'
age = '37'
print('hi, my name is', first_name, 'and I am', age,'years old.')
```
**arguments**: information you pass to the function
### Exercise:
Assign the value 'red' to a variable named color1 and the value 'blue' to a variable named color2. Then print 'red is not blue' using the variables as arguments.
#### solution:
```
color1 = 'red'
color2 = 'blue'
print(color1, 'is not', color2)
```
You can do calculations in Python:
```
2359 * 32
123/23
3*4
calc1 = 3*4
calc2 = 2*20
print(calc1, calc2)
```
### Exercise:
What is displayed when a notebook cell that contains several calculations is executed?
```
7*3
2*3
```
Answer: only the value of the last expression (6) is displayed; use print() to show the others.
Datatypes in Python:
```
datatype1 = 'string'  # string
datatype2 = 154  # integer
datatype3 = 2.5  # float
newtype = float(datatype2)  # converts 154 from an integer to a float (154.0)
```
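A short sketch of checking types and converting between them with the built-in `type()`, `int()`, `float()` and `str()` functions (the variable names are illustrative). Conversions return a new value and leave the original unchanged; `int()` on a float drops the fractional part.

```python
# Built-in conversion functions return a NEW value; the original is unchanged
count = 154                  # int
as_float = float(count)      # 154.0, a float
as_text = str(count)         # '154', a string
back_to_int = int('154')     # 154, parsed from a string

print(type(count), type(as_float), type(as_text), type(back_to_int))

# int() truncates toward zero when given a float
print(int(2.9), int(-2.9))
```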
# ##### Python Day 2 - notes######
Review:
### python data types:
```
integer_1 = 3
integer_2 = -512
float_1 = 3.14
float_2 = 12.5
string_1 = 'hello world'
string_2 = 'bus'
print(type(integer_1), type(integer_2))
print(type(float_2))
print(type(string_1))
print(type(string_2))
type('Me')
```
#### Challenge 1
What type of values are the following: 3.4, car1, 3589, E234, 'car2'? Use the appropriate built-in function to find out the datatypes.
```
car1 = 2017
```
```
print(type(3.4))
print(type(car1))
print(type(3589))
print(type('E234'))
print(type('car2'))
```
`class` = the type of an object.
Example output from `type()`:
`<class 'float'>`
`<class 'int'>`
Most math operators only work with floats or integers.
The `-` operator does not work with strings.
Note: the `+` operator does work with strings: it "concatenates" (joins) them.
```
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
```
```
num7 = '2'*10 # Repeat the string "2" for 10 times
print(num7)
```
```
num8 = 2
num9 = 5
num10 = 'two'
print(num8 * num9)
print(num10 * num9)
```
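The string rules above in one place, as a small sketch (names are illustrative): `+` joins, `*` repeats, and `-` raises a `TypeError`. Joining a string and a number with `+` also fails unless the number is converted with `str()` first.

```python
joined = 'Ahmed' + ' ' + 'Walsh'   # concatenation
repeated = 'two' * 3               # repetition

name = 'Ahmed'
try:
    name - 'A'                     # subtraction is not defined for strings
except TypeError as err:
    print('TypeError:', err)

label = 'count: ' + str(5)         # convert the number before joining
print(joined, repeated, label)
```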
#### Challenge 2
What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in # 1, when would counting days with a floating point variable make more sense than using an integer?
1. Number of days since the start of the year.
2. Time elapsed from the start of the year until now in days.
3. Serial number of a piece of lab equipment.
4. A lab specimen’s age
5. Current population of a city.
6. Average population of a city over time.
#### answers:
1. integer, because days are counted in whole numbers (no fractions)
2. floating point, because the calculation gives a fraction of the current day
3. string (e.g. 'A2345'), integer (e.g. 123456), or float (e.g. 121.23)
4. integer (e.g. 3) or string (e.g. '3 days' or '3.5 days')
5. float or integer
6. float
converting data types:
```
var1 = '123'
type(var1)
```
```
var2 = int(var1)
print(var2)
```
```
print(type(var2))
```
```
var3 = '123E'
print(var3, type(var3))
```
```
var5 = float(var1)
print(var1, type(var1))  # var1 itself is unchanged: still the string '123'
```
```
var5 = float(var1)
print(var5, type(var5))
```
#### Challenge 3
Which of the following will return the floating point number 2.0? Note: there may be more than one right answer.
```
first = 1.0
second = "1"
third = "1.1"
```
1. **`first + float(second)`**
2. `float(second) + float(third)`
3. `first + int(third)`
4. **`first + int(float(third))`**
5. `int(first) + int(float(third))`
6. `2.0 * second`

Answers: 1 and 4 (in bold).
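One way to check the Challenge 3 answers is to evaluate every option and catch the errors. In this sketch the dictionary of lambdas is just a convenient test harness, not part of the exercise. Options 1 and 4 give the float `2.0`, option 5 gives the integer `2` (not a float), option 2 gives `2.1`, option 3 raises a `ValueError` because `int()` cannot parse `"1.1"`, and option 6 raises a `TypeError` because a string cannot be multiplied by a float.

```python
first = 1.0
second = "1"
third = "1.1"

# Each option wrapped in a zero-argument function so errors can be caught one at a time
options = {
    1: lambda: first + float(second),
    2: lambda: float(second) + float(third),
    3: lambda: first + int(third),
    4: lambda: first + int(float(third)),
    5: lambda: int(first) + int(float(third)),
    6: lambda: 2.0 * second,
}

results = {}
for number, option in options.items():
    try:
        value = option()
        results[number] = value
        print(number, repr(value), type(value).__name__)
    except (ValueError, TypeError) as err:
        # store the error name instead of a value
        results[number] = type(err).__name__
        print(number, 'raises', type(err).__name__)
```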
```
print('half is', 1/2.0)
```
variable reassignment:
```
first1 = 1
second1 = 5 * first1
first1 = 2
print('first1 is', first1, 'and second1 is', second1)
```
Using functions():
Functions that do not return a value (such as `print`) give back `None`, so assigning their result to a variable stores `None`:
```
result = print('example')
print('result of print is', result)
```
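A small sketch contrasting a function that only displays something with functions that return a value you can store (the variable names are illustrative):

```python
# print() shows text but returns nothing, so the stored result is None
printed = print('example')

# len() and round() return values that can be stored and reused
length = len('example')
rounded = round(3.712, 1)

print('print returned:', printed)
print('len returned:', length, 'and round returned:', rounded)
```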
```
print(max(1, 'a'))  # raises TypeError: an int and a str cannot be compared
```
```
print(max(1,45,32,0,24))
```
```
round(3.712)
round(3.712,1)
```
indexing in python:
```
atom_name = 'helium'
print(atom_name[0])
```
Indexing works on strings, lists, and the rows and columns of DataFrames.
Python index starts at 0
```
atom_name = 'helium'
print(atom_name[3])
```
slicing:
```
atom_name = 'sodium'
print(atom_name[0:3])
```
```
print(len('sodium'))
print(len(atom_name))
```
```
print(len('52'))
```
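A sketch of the slicing rules on the same `'sodium'` string: a slice includes its start index but stops before its stop index, an omitted start or stop means "from the beginning" or "to the end", and negative indices count from the end (this last point goes slightly beyond the notes).

```python
atom_name = 'sodium'

first_three = atom_name[0:3]   # characters 0, 1, 2: stop index 3 is excluded
rest = atom_name[3:]           # omitted stop: slice runs to the end
last_char = atom_name[-1]      # -1 is the last character

print(first_three, rest, last_char, len(atom_name))
```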
# ########Week 2 Python###########
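## libraries
A library is a collection of files (called modules) that contain functions and data values (e.g. numerical constants) for use by other programs.
- its contents are supposed to be related
- the Python standard library comes with Python itself
- many more libraries are available from PyPI, the Python Package Index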
```
import math
print(math.pi)
print(math.cos(math.pi))
```
Help function:
```
help(math)
```
**Challenge 1**
use the help() to find and print out the 'tau' constant from the math library
```
print(math.tau)
```
**Challenge 2**
A colleague of yours wanted to use help() to check out the math library. However, when he runs help(math), he receives a NameError. What has he forgotten to do?
Answer: he did not run `import math` first.
Python can import specific library items
- shortening programs
- allows you to use items from library without the library prefix
syntax
`from ... import ...`
```
from math import cos, pi
print(pi)
print(cos(pi))
# note that in a fresh session this would not work, why?
# (answer: we only imported cos and pi from the module,
# not the name `math` itself, so this raises a NameError)
print(math.tau)
```
## importing as alias
syntax
`import ... as ...`
- gives the library a separate, usually shorter name (an alias)
- useful for abbreviating long library names
- can make code harder for others to read, so prefer widely used aliases such as `np` and `pd`
```
import math as m
print(m.pi)
print(m.cos(m.pi))
```
## Errors and Exceptions
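Python reports errors with a traceback.
A traceback has one level per function call involved, each with an arrow pointing at the line being run; more levels means the error happened deeper in a chain of calls.
The most **recent** call, where the **error** actually occurred, is at the bottom of the traceback.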
### Index Error
What's wrong with this code, and how can we fix this from information we get from the traceback?
```
## this code will generate an example traceback error:
def print_random_string():
    random_string = [
        'bus',    # 0
        'wheel',  # 1
        'blue',   # 2
    ]
    print(random_string[3])

print_random_string()  # IndexError
```
Python indexing is 0-based, so index 3 asks for a fourth element, which this list does not have. To fix this, use an index from 0 to 2, or add another element:
```
## this version fixes the IndexError by adding a fourth element:
def print_random_string():
    random_string = [
        'bus',    # 0
        'wheel',  # 1
        'blue',   # 2
        'two',    # 3
    ]
    print(random_string[3])

print_random_string()  # Error resolved!
```
### Syntax Error
Challenge:
Copy the code and execute it, analyze the error, and try to fix it:
```
## code with error:
def challenge()=
    msg = 'message"
    print(msg)

challenge()  # SyntaxError
```
Solution:
```
def challenge():  # colon needed
    msg = "message"  # the opening and closing quotes must be the same type
    print(msg)

challenge()
```
### Indentation Error
Check the indentation in your code.
Indentation errors sometimes occur when you copy and paste code.
Why is indentation necessary?
It shows which lines of code belong to a function.
indentation error example:
```
## this code will generate an indentation error:
def print_hello():
    msg = 'Hello!'
        print(msg)

print_hello()
```
```
## this code fixes the indentation error:
def print_hello():
    msg = 'Hello!'
    print(msg)  # print was indented too far

print_hello()
```
### Variable Errors
A NameError occurs when a variable is used before it has been assigned:
```
## this code will generate a variable NameError
def print_name():
    print(name)

print_name()
name = 'Reid'
```
```
name = 'Reid'  # the variable must be assigned before the function is called
def print_name():
    print(name)

print_name()
```
**Typos in variable names are a common cause of NameError.**
### File Errors
FileNotFoundError
UnsupportedOperation (the exception class is `io.UnsupportedOperation`)
File modes passed to `open()`:
* 'r' for Read
* 'w' for Write
```
file_nonexistent = open('idnothavethisfile.txt', 'r')  # FileNotFoundError
```
**Specifying the wrong file path is a common reason for a FileNotFoundError.**
##### UnsupportedOperation
```
# opening in 'w' mode does not raise an error: Python creates a new, empty text file
new_file = open('new_text_file.txt', 'w')
new_file.read()  # raises io.UnsupportedOperation: the file was opened for writing, not reading
```
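A sketch that triggers both file errors safely and catches them with `try`/`except`. It works inside a temporary folder (via the standard `tempfile` module) so no real files are touched; the folder and file names are illustrative.

```python
import io
import os
import tempfile

caught_missing = False
caught_unsupported = False

# Work in a throwaway directory that is deleted afterwards
with tempfile.TemporaryDirectory() as folder:
    missing_path = os.path.join(folder, 'idonothavethisfile.txt')
    try:
        open(missing_path, 'r')
    except FileNotFoundError as err:
        print('FileNotFoundError:', err.filename)
        caught_missing = True

    new_path = os.path.join(folder, 'new_text_file.txt')
    # 'w' creates the file, but the resulting handle only supports writing
    with open(new_path, 'w') as new_file:
        try:
            new_file.read()
        except io.UnsupportedOperation as err:
            print('UnsupportedOperation:', err)
            caught_unsupported = True
```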
## debugging tips
1. know what your code is supposed to do - know what the end output should be
2. make it fail every time - identify your problem quickly by using test cases to determine what causes your program to fail
3. make it fail fast
4. Change one thing at a time, and for a reason
5. keep track of what you've done
6. Google or ask others!
# ######Python week 2 - lesson 4 #####
Popular Python libraries:
matplotlib - for scientific visualization
numpy - its basic type is the NumPy array (`ndarray`), an n-dimensional array of numbers that supports fast calculations
pandas - for manipulating and analysing large-scale datasets
## Numpy
*[NumPy]: NUMeric Python
NumPy is a package for scientific computation in Python
```python
import numpy as np
# Load the file, make sure you have the correct path
# delimiter - how the data is separated.
np.loadtxt(fname="inflammation-01.csv", delimiter=',')
```
Adding data to a variable:
```python
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
```
look at data type
```python
print(type(data))
```
find the mean of the array
```python
print(np.mean(data))
```
get descriptive values about your data
```python
maxval = np.max(data)
minval = np.min(data)
stdval = np.std(data)
print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)
```
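Because `inflammation-01.csv` may not be at hand, here is a sketch of the same summary functions on a small made-up array (the values are illustrative only). It also shows the `axis` argument, which computes a statistic per column (`axis=0`) or per row (`axis=1`) instead of over the whole array.

```python
import numpy as np

# 3 rows (e.g. patients) x 4 columns (e.g. days); made-up values
data = np.array([[0, 1, 3, 2],
                 [1, 2, 4, 1],
                 [2, 0, 5, 3]])

print('overall mean:', np.mean(data))
print('maximum:', np.max(data), 'minimum:', np.min(data))

# axis=0 collapses the rows: one value per column
per_column_max = np.max(data, axis=0)
# axis=1 collapses the columns: one value per row
per_row_mean = np.mean(data, axis=1)
print(per_column_max, per_row_mean, data.shape)
```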
### ndarray
*[ndarray]: n-dimensional array
In a 2-D array like this one, each row typically stores one observation (here, a patient) and each column one variable (here, a day's inflammation value).
In pandas, a single column of a DataFrame is called a Series.
## Pandas
https://pandas.pydata.org
pandas cheatsheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
importing pandas library
```python
import pandas as pd
```
```python
data = pd.read_csv('gapminder_gdp_oceania.csv')
print(data)
```
assign an index for country
```python
data = pd.read_csv('gapminder_gdp_oceania.csv', index_col='country')
print(data)
```
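Because the Gapminder files may not be at hand, here is a self-contained sketch that reads a tiny CSV from a string via `io.StringIO` (the numbers are invented) to show what `index_col='country'` changes: without it rows are labelled 0, 1, ...; with it the country names become the row labels.

```python
import io

import pandas as pd

# A tiny stand-in for a Gapminder file; values are invented for illustration
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Australia,100.0,110.0
New Zealand,90.0,95.0
"""

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(plain.index.tolist())
print(indexed.index.tolist())
# with country names as the index, rows can be selected by name
print(indexed.loc['Australia', 'gdpPercap_1957'])
```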
### challenge 1
read the data in 'gapminder_gdp_americas.csv' into a variable called 'americas' and display its summary statistics
```python
americas = pd.read_csv('gapminder_gdp_americas.csv', index_col='country')
americas.describe()
```
```python
# Give you a general descriptive information regarding a dataframe
data.info()
```
show information about the columns in a dataframe
```python
print(data.columns)
```
transpose dataframe
```python
print(data.T)
```
Get key statistics
```python
data.describe()
```
Write to CSV
```python
data.to_csv("mynewdatafile.csv")
# This would tell me the task is done after the data is written
print("File has been written")
```
*[iloc]: integer location, selecting by the implicit numeric (0-based) position in the dataframe
```python=
data = pd.read_csv("gapminder_gdp_europe.csv", index_col="country")
print(data.iloc[0,0])
```
`.loc` selects data by label (here, a country name in the index):
```python
print(data.loc["Albania",:])
```
Slicing with `.loc` selects a range of rows and columns by label; unlike ordinary Python slicing, the end label is included:
```python
print(data.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972'])
```
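A sketch contrasting label slicing with position slicing on a small DataFrame built in code (the numbers are invented; only the labels matter). `.loc` includes the end label, while `.iloc` excludes the end position, just like slicing a Python list.

```python
import pandas as pd

# Invented values with Gapminder-style labels
data = pd.DataFrame(
    {'gdpPercap_1962': [1, 2, 3, 4],
     'gdpPercap_1967': [5, 6, 7, 8],
     'gdpPercap_1972': [9, 10, 11, 12]},
    index=['Iceland', 'Italy', 'Netherlands', 'Poland'],
)

# .loc: by label, end label INCLUDED
by_label = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1967']
# .iloc: by position, end position EXCLUDED
by_position = data.iloc[1:3, 0:2]

print(by_label)
print(by_position)
```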
Apply your own function to every column of a DataFrame with `.apply()`:
```python
def multiplyby5(x):
    return x * 5

data2 = data.apply(multiplyby5)
data2
```
Using `.apply()` on a single column (a Series) to create a new column:
```python
data['gdp_by5_2007'] = data['gdpPercap_2007'].apply(multiplyby5)
data
```
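For simple arithmetic, `.apply()` and a plain operator give the same result; the operator works on the whole column at once and is usually the more idiomatic choice. A sketch on an invented Series (not real Gapminder values):

```python
import pandas as pd

def multiplyby5(x):
    return x * 5

# Invented numbers standing in for a GDP column
gdp = pd.Series([100.0, 250.0], index=['Albania', 'Austria'])

via_apply = gdp.apply(multiplyby5)   # calls multiplyby5 once per value
via_operator = gdp * 5               # multiplies the whole column at once
print(via_apply.equals(via_operator))
```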
### challenge 2
Assume Pandas has been imported into your notebook and the Gapminder GDP data for Europe has been loaded:
```python
import pandas as pd
df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
```
Write an expression to find the Per Capita GDP of Serbia in 2007.
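One possible answer, sketched on a tiny stand-in for the Europe file: select with `.loc`, row label first, then column label. The numbers below are placeholders, not Serbia's real GDP.

```python
import io

import pandas as pd

# Placeholder data with the same layout as gapminder_gdp_europe.csv
csv_text = """country,gdpPercap_2002,gdpPercap_2007
Serbia,1.0,2.0
Spain,3.0,4.0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col='country')

# Row label first, then column label
serbia_2007 = df.loc['Serbia', 'gdpPercap_2007']
print(serbia_2007)
```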
# ######### Python week 3 ############
Python library
matplotlib
pyplot - sub-package of matplotlib
## Plotting with Matplotlib
Import matplotlib.pyplot, give it an alias of plt
```python=
import matplotlib.pyplot as plt
# Outside Jupyter, plt.show() is needed to display plots;
# in a Jupyter notebook this magic displays them inline instead
%matplotlib inline
# Or, for a cool interactive plot (classic Jupyter Notebook); use one magic or the other
%matplotlib notebook
```
```python=
time = [0,1,2,3]
position = [0,100,200,300]
plt.plot(time, position) # (x,y) = (time, position)
# Giving plot labels
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
# Add a text annotation at x=0.5, y=50
plt.text(0.5, 50, 'Some_text')
# Jupyter notebook shows the plot automatically,
# but in other text editors we have to use this to show graph
plt.show()
```
```python
import pandas as pd
import matplotlib.pyplot as plt
# load data
data = pd.read_csv('gapminder_gdp_oceania.csv', index_col='country')
# strip away "gdpPercap_" from all column names
# (strip() removes those characters from both ends; this works because the years are digits)
years = data.columns.str.strip('gdpPercap_')
# convert year values to integers and save the results back to the dataframe
data.columns = years.astype(int)
# select Australia and plot its time series data
data.loc['Australia'].plot()
# transpose so that years are rows and each country becomes a line
data.T.plot()
plt.ylabel('GDP per capita')
```
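A caveat on `str.strip('gdpPercap_')`: it removes any of the listed characters from both ends, not the prefix as a whole. It works for year columns because digits are not in that character set. A sketch comparing it with `str.replace`, which removes the exact text; the `'gdpPercap_area'` label is a hypothetical column used only to expose the difference.

```python
import pandas as pd

columns = pd.Index(['gdpPercap_1952', 'gdpPercap_2007'])

# strip(): removes the characters g, d, p, P, e, r, c, a, _ from both ends
stripped = columns.str.strip('gdpPercap_')
# replace(): removes the exact prefix text
replaced = columns.str.replace('gdpPercap_', '', regex=False)
print(stripped.tolist(), replaced.tolist())

# Hypothetical label made only of those characters: strip() removes everything
print(pd.Index(['gdpPercap_area']).str.strip('gdpPercap_').tolist())
```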
### Styles of plots you can use
ggplot format
```python
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.ylabel('GDP per capita')
```
If you reassign a variable, remember that Jupyter keeps every variable in memory until the kernel is restarted, so later cells use its most recent value.
```python=
years = data.columns
gdp_australia = data.loc['Australia']
# 'g--' means a green (g) dashed (--) line
plt.plot(years, gdp_australia, 'g--')
```
Create plot of GDP per capita over time for New Zealand and Australia (Note the jump after the economic reform in New Zealand!)
```python=
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label="New Zealand")
# create a legend
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
```
plot a scatter plot
```python=
plt.scatter(gdp_australia, gdp_nz)
# Alternatively, we can do
data.T.plot.scatter(x = 'Australia', y='New Zealand')
```
### Challenge 1
Complete the code to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.
```python=
# starter code
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.___.plot(label='min')
data_europe.___.
plt.legend(loc='best')
plt.xticks(rotation=90)
# solution
data_europe = pd.read_csv('gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.legend(loc='best')
plt.xticks(rotation=90)
```
## List
array: a data type in NumPy and pandas; it stores data of the **same** type
list: a data type in core Python, it can store **mixed** types of objects
```python=
# a list can use len() to find how many values in list
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
# tells you the length of your list
print("The length of pressure is:", len(pressures))
print('pressure are:', pressures)
```
```python=
print('zeroth item of pressures:', pressures[0])
```
```python
# Replace the item at index 0 with a new value
pressures[0] = 0.265
print('pressures is now:', pressures)
```
```python=
# using [list].append to append items to the end of the list
primes = [2,3,5]
print('primes is initially:', primes)
# this appends an element "in place",
# meaning the method directly changes the object before the dot.
primes.append(7)
print('primes is now:', primes)
```
```python=
new_prime = [11,13,17,19]
print('prime is currently:', primes)
primes.extend(new_prime)
print('primes has now become:', primes)
```
```python=
# to delete items from a list, use `del`
del primes[7]
print("primes after removing last item:", primes)
```
```python=
# add to a blank list
mylist = []
mylist.append('hello')
print(mylist)
```
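A sketch of the difference between `append` and `extend` when the argument is itself a list (the list names are illustrative): `append` adds its argument as one single item, while `extend` adds each element separately.

```python
primes = [2, 3, 5]
primes.append(7)             # adds ONE item

appended = [2, 3, 5]
appended.append([7, 11])     # the whole list becomes a single nested item
extended = [2, 3, 5]
extended.extend([7, 11])     # each element is added separately

print(primes)
print(appended, len(appended))
print(extended, len(extended))
```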
## For loop
```python=
# print each number in the list
for number in [2, 3, 5]:
    print(number)
# note the indentation! Content of the loop should be indented
some_numbers = [2, 3, 5]
for number in some_numbers:
    print(number)
```
```python=
# wrong indentation example, Python will complain (IndentationError: unexpected indent)
firstname = 'Jon'
  lastname = 'Smith'
```
```python=
# You can give any name to the looping variable, as long as it is a valid name
for goopey in some_numbers:
    print(goopey)
```
```python=
# for each number in some_numbers
# square it and cube it
# And eventually print it
for p in some_numbers:
    squared = p**2
    cubed = p**3
    print(p, squared, cubed)
```
```python=
# looping over range
# note how it does not reach 3.
# print out 0,1,2
for number in range(0, 3):
    print(number)
```
```python=
# accumulator pattern using a loop
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)
```
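In the accumulator loop, `number` runs over 0 to 9, so `number + 1` adds 1 through 10 and the total is 55. A sketch checking this against the built-in `sum()`, which does the same accumulation without an explicit loop:

```python
total = 0
for number in range(10):           # number takes the values 0..9
    total = total + (number + 1)   # so 1..10 are added

same_total = sum(range(1, 11))     # 1 + 2 + ... + 10
print(total, same_total)
```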
# #####week 3 - Python Day 6
## Conditionals and Loops
```python=
mass = 3.54
if mass > 3:
    print(mass, 'is large')

mass = 2.07
if mass > 3:
    print(mass, 'is large')
```
```python=
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
```
```python=
# Example of using an 'else' condition as a 'catch all'
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 3:
        print(m, 'is large')
    else:
        print(m, 'is small')
```
```python=
# example of an 'if' condition with an 'elif' alternative and 'else' as the 'catch all'
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is Large')
    else:
        print(m, 'is small')
```
```python=
# Conditions are checked from top to bottom and only the first true one runs:
# 85 >= 70 is already true, so this prints 'grade is C' even though 85 >= 80
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
```
```python=
# reordering to get desired output
grade = 85
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
```
```python=
# Example displaying a 'catch all' output when none of the conditions apply
grade = 65
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
else:
    print('Failed')
```
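The grading chain can be wrapped in a function so that several scores can be checked at once. A sketch; `letter_grade` is a hypothetical name, and the thresholds are the ones used above, tested from highest to lowest.

```python
def letter_grade(grade):
    # Highest threshold first: the first true condition wins
    if grade >= 90:
        return 'A'
    elif grade >= 80:
        return 'B'
    elif grade >= 70:
        return 'C'
    else:
        return 'Failed'

for score in [95, 85, 72, 65]:
    print(score, letter_grade(score))
```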
```python=
# The condition is tested only once: changing velocity inside a branch
# does not re-run the test or switch to the other branch
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print("adjusting velocity")
    velocity = 50.0
```
```python=
# only a loop that re-tests the changing variable changes the final output
# you can also comment out the print() in the loop to only display the final velocity
velocity = 10.0
for i in range(5):
    print(i, ':', velocity)
    if velocity > 20.0:
        print("moving too fast")
        velocity = velocity - 5.0
    else:
        print("moving too slow")
        velocity = velocity + 10
print('final velocity:', velocity)
```
## Boolean operators
```python=
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
velocity = [10.00, 20.00, 30.00, 40.00, 50.00]
for i in range(5):
    if (masses[i] > 5) and (velocity[i] > 20):
        print("Fast heavy object. Duck!")
    elif (masses[i] > 2) and (masses[i] <= 5) and (velocity[i] <= 20):
        print("Normal traffic")
    elif (masses[i] <= 2) and (velocity[i] <= 20):
        print("Slow light object. Ignore it.")
    else:
        print("Whoa! Check it")
```
`True and False = False`
`True or False = True`
`is` tests whether two names refer to the *same object*
`==` tests whether two objects have equal values
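```python=
car1 = 'Sedan'
car2 = 'Sedan'
# `is` checks identity; whether two equal strings are the same object
# depends on Python internals, so use == to compare string values
if car1 is car2:
    print('Both cars')
# Using `is` with string literals gives you a SyntaxWarning (Python 3.8+)
if 'Small' is 'Small':
    print('Yes')
car3 = 'Truck'  # example value; car3 was not defined in the original cell
if car2 is not car3:
    print("different vehicles")
```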
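Lists make the difference between `is` and `==` easy to see, because two list literals always create two separate objects. A sketch (names are illustrative); the last line shows the most common idiomatic use of `is`, checking for `None`.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a               # c is another name for the SAME list object

print(a == b)       # equal values
print(a is b)       # two separate list objects
print(a is c)       # same object

result = None
print(result is None)
```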
```python=
import pandas as pd
# load in data files using a for loop
for filename in ['gapminder_gdp_africa.csv', 'gapminder_gdp_asia.csv']:
    data = pd.read_csv(filename, index_col='country')
    print(filename, data.min())
```
```python=
import glob
print("all csv files in the current directory:", glob.glob('*.csv'))
# find all pdb files
print('all pdb files:', glob.glob('*.pdb'))
```
```python=
# matches for all files in the format of
# gapminder_[something].csv
for filename in glob.glob('gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())
```
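A self-contained sketch of the `glob` loop: it writes two tiny made-up Gapminder-style files into a temporary folder, then finds and reads them by pattern. The file contents are invented; `sorted()` is used because `glob.glob` does not guarantee any particular order.

```python
import glob
import os
import tempfile

import pandas as pd

minimums = {}
with tempfile.TemporaryDirectory() as folder:
    # Create two small example files with invented values
    for region, values in [('africa', [3.0, 1.0]), ('asia', [7.0, 5.0])]:
        path = os.path.join(folder, 'gapminder_gdp_' + region + '.csv')
        pd.DataFrame({'country': ['X', 'Y'], 'gdpPercap_1952': values}).to_csv(path, index=False)

    # Match every file named gapminder_<something>.csv in that folder
    for filename in sorted(glob.glob(os.path.join(folder, 'gapminder_*.csv'))):
        data = pd.read_csv(filename, index_col='country')
        name = os.path.basename(filename)
        minimums[name] = data['gdpPercap_1952'].min()
        print(name, minimums[name])
```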
## Function
```python=
def print_greeting():
    print("Hello")
```
```python
# call the function without passing an argument as an argument is not needed per this function definition
def print_greeting():
    print('Hello')

print_greeting()
```
```python=
# a function that takes three arguments and returns the joined date string
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
    return joined

print_date(1871, 3, 19)
date = print_date(1871, 3, 19)
```
```python=
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

a = average([1, 3, 4])
print('average of actual values:', a)
print("Average of empty list:", average([]))
```