# 2021-GPS-Data-Skills-Course-Python

## Collaborative Notes:

Please sign in here: first, last name / favorite drink

- Reid Otsuji - Hawaii Fruit Tea
- Rolando Almada - Orange Mocha Frappuccino
- Leo Do - Peach Iced Tea
- Colin Trobough - Single Malt Scotch
- Alejandra Guzman - Diet Coke
- Ayush Jain
- Xin Meng - Pu'er tea
- Ada Tong - Roasted brown rice milk tea
- Khang Do - green tea
- Elissa Bozhkov - Passionfruit mojitos
- Yongun Ra - Big Wave
- Daniel Blaugher - coffee
- Tyler Spencer - coffee
- Yue Wang - KBS beer
- Lei Lei - Milk Tea
- Deepika Bagaria - Milk
- Camille Caterina - milk tea
- Rachel Lietzow - Milk Tea
- Emily Carlton - coffee
- Bowin Lee - green tea
- Manabu Hiratsuka - coffee
- Tomas Lavados - coffee
- Bonnie Devenney - kombucha
- Alex Schiller - coffee
- Zahrah Zimmerer - Gatorade Zero
- Jonathan Bazan - Iced Mocha

You can write Markdown in a Jupyter Notebook (in Markdown cells).

Markdown cheatsheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

By the way... HackMD uses Markdown!

To convert between cell types, use the following shortcuts once you are in command mode (cell is blue):
- `y` - code
- `m` - markdown
- `r` - raw

## creating variables
```
car1 = 'toyota'
car_2 = 'prius'
car3 = 123
car4 = 2.5
print(car4)
```

## reassign variables
```
car4 = 5.0
print(car4)
```

```
car4 = 6
# NameError: Car4 is not defined. Variable names are case-sensitive,
# so make sure you check them, e.g. the use of capital letters
print(car4, Car4)
```

## print function

`print()`

```
first_name = 'Kim'
age = '37'
print('hi, my name is', first_name, 'and I am', age, 'years old.')
```

**arguments**: information you pass to the function

### Exercise:
Assign the variable named `color1` to the value red and the variable named `color2` to the value blue. Then print 'red is not blue' using the variable names as arguments.
#### solution:
```
color1 = 'red'
color2 = 'blue'
print(color1, 'is not', color2)
```

You can do calculations in Python:
```
2359 * 32
123 / 23
3 * 4
calc1 = 3 * 4
calc2 = 2 * 20
print(calc1, calc2)
```

### Exercise:
What is displayed when a Python cell in a notebook that contains several calculations is executed?
```
7 * 3
2 * 3
```

Data types in Python:
```
datatype1 = 'string'
datatype2 = 154   # integer
datatype3 = 2.5   # float
newtype = float(datatype2)   # converts 154 from the integer data type to the float data type
```

# Python Day 2 - notes

Review:

### python data types:
```
integer_1 = 3
integer_2 = -512
float_1 = 3.14
float_2 = 12.5
string_1 = 'hello world'
string_2 = 'bus'
print(type(integer_1), type(integer_2))
print(type(float_2))
print(type(string_1))
print(type(string_2))
type('Me')
```

#### Challenge 1
What type of values are the following: `3.4`, `car1`, `3589`, `E234`, `'car2'`? Use the appropriate built-in function to find out the data types.
```
car1 = 2017
```
```
print(type(3.4))
print(type(car1))
print(type(3589))
print(type('E234'))
print(type('car2'))
```

class = the type of object. Examples: `<class 'float'>`, `<class 'int'>`

Most math operators only work with floats or integers; for example, the `-` operator does not work with strings.

Note: the `+` operator does work with strings, where it "concatenates" (joins) them:
```
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
```
```
num7 = '2' * 10   # repeat the string "2" 10 times
print(num7)
```
```
num8 = 2
num9 = 5
num10 = 'two'
print(num8 * num9)    # 10
print(num10 * num9)   # twotwotwotwotwo
```

#### Challenge 2
What type of value (integer, floating point number, or character string) would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in #1, when would counting days with a floating point variable make more sense than using an integer?
1. Number of days since the start of the year.
2. Time elapsed from the start of the year until now in days.
3. Serial number of a piece of lab equipment.
4. A lab specimen's age.
5. Current population of a city.
6. Average population of a city over time.

#### answers:
1. integer, because it is a whole count of days (no fractions)
2. floating point, because doing the math results in a fraction of a day
3. string (e.g. `'A2345'`), integer (e.g. `123456`) or float (e.g. `121.23`)
4. integer (e.g. `3`) or string (e.g. `'3 days'` or `'3.5 days'`)
5. float or integer
6. float

Converting data types:
```
var1 = '123'
type(var1)
```
```
var2 = int(var1)
print(var2)
```
```
print(type(var2))
```
```
var3 = '123E'
print(var3, type(var3))
```
```
var5 = float(var1)
print(var5, type(var5))
```

#### Challenge 3
Which of the following will return the floating point number `2.0`? Note: there may be more than one right answer.
```
first = 1.0
second = "1"
third = "1.1"
```
1. **`first + float(second)`**
2. `float(second) + float(third)`
3. `first + int(third)`
4. **`first + int(float(third))`**
5. `int(first) + int(float(third))`
6. `2.0 * second`

```
print('half is', 1 / 2.0)
```

Variable reassignment:
```
first1 = 1
second1 = 5 * first1
first1 = 2
# second1 is still 5: changing first1 afterwards does not update second1
print('first1 is', first1, 'and second1 is', second1)
```

Using functions: assigning the result of a function that does not return anything, such as `print()`, to a variable gives `None`:
```
result = print('example')
print('result of print is', result)
```
```
print(max(1, 'a'))   # TypeError: an int and a str cannot be compared
```
```
print(max(1, 45, 32, 0, 24))
```
```
round(3.712)
round(3.712, 1)
```

Indexing in Python:
```
atom_name = 'helium'
print(atom_name[0])
```
Indexing works for: strings, lists, objects in dataframes.

Python indexes start at 0:
```
atom_name = 'helium'
print(atom_name[3])
```
Slicing:
```
atom_name = 'sodium'
print(atom_name[0:3])
```
```
print(len('sodium'))
print(len(atom_name))
```
```
print(len('52'))
```

# Week 2 Python

## libraries
A library is a collection of files (modules) containing related functions and data values.
- Python standard library
- PyPI: Python Package Index

```
import math
print(math.pi)
print(math.cos(math.pi))
```
Help function:
```
help(math)
```
**Challenge 1**
Use `help()` to find and print out the 'tau' constant from the math library.
```
print(math.tau)
```
**Challenge 2**
A colleague of yours wanted to use `help()` to check out the math library. However, when he runs `help(math)`, he receives a `NameError`. What has he forgotten to do?

Answer: he didn't import the math library.

Python can import specific library items:
- shortens programs
- allows you to use items from the library without the library prefix

Syntax: `from ... import ...`
```
from math import cos, pi
print(pi)
print(cos(pi))

# Note that this would not work in a fresh session. Why?
# (Answer: we only imported specific functions from the module,
# not the module itself, so the name `math` is not defined.)
print(math.tau)
```

## importing as alias
Syntax: `import ... as ...`
- used as a separate name for a library
- useful for abbreviating long library names
- can make code harder for others to read
```
import math as m
print(m.pi)
print(m.cos(m.pi))
```

## Errors and Exceptions
When Python runs into an error, it prints a **traceback**.

The number of **arrows** on the left shows how many levels the traceback has. The most **recent** call, where the **error** actually happened, is at the bottom of the traceback.

### Index Error
What's wrong with this code, and how can we fix it using the information we get from the traceback?
```
## this code will generate an example traceback error:
def print_random_string():
    random_string = [
        'bus',    # 0
        'wheel',  # 1
        'blue',   # 2
    ]
    print(random_string[3])

print_random_string()   # IndexError
```
Python indexing is 0-based, so index 3 asks for a fourth element, which this list does not have. To correct this, we can add another element:
```
## fixed version:
def print_random_string():
    random_string = [
        'bus',    # 0
        'wheel',  # 1
        'blue',   # 2
        'two',    # 3
    ]
    print(random_string[3])

print_random_string()   # Error resolved!
```
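A list's valid indices run from `0` to `len(list) - 1`. The short sketch below reuses the three-item list from the IndexError example and checks an index against `len()` before using it; the variable names `last_index` and `index` are illustrative additions, not part of the course code.

```python
# The three-item list from the IndexError example
random_string = ['bus', 'wheel', 'blue']

# Valid indices are 0, 1, ..., len(random_string) - 1
last_index = len(random_string) - 1
print('number of items:', len(random_string))
print('last valid index:', last_index)
print('last item:', random_string[last_index])

# Check an index before using it instead of triggering an IndexError
index = 3
if index < len(random_string):
    print(random_string[index])
else:
    print('index', index, 'is out of range')
```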
### Syntax Error
Challenge: copy the code, execute it, analyze the error and try to fix it:
```
## code with error:
def challenge()=
    msg = 'message"
    print(msg)

challenge()   # SyntaxError
```
Solution:
```
def challenge():      # colon needed
    msg = "message"   # opening and closing quotes must match
    print(msg)

challenge()
```

### Indentation Error
Indentation error: check the indentation in your code. This error sometimes occurs when you copy and paste code as well.

Why is indentation necessary? It shows which lines of code belong to a function.

Indentation error example:
```
## this code will generate an indentation error:
def print_hello():
    msg = 'Hello!'
        print(msg)

print_hello()
```
```
## this code fixes the indentation error:
def print_hello():
    msg = 'Hello!'
    print(msg)   # print was indented too far

print_hello()
```

### Variable Errors
```
## this code will generate a variable name error (NameError)
def print_name():
    print(name)

print_name()
name = 'Reid'
```
```
name = 'Reid'   # the variable needs to be defined first

def print_name():
    print(name)

print_name()
```
**Typos in code are a common mistake.**

## File Errors
- `FileNotFoundError`
- `UnsupportedOperation` (`io.UnsupportedOperation`)

File modes for `open()`:
* 'r' for read
* 'w' for write

```
file_nonexistent = open('idnothavethisfile.txt', 'r')   # FileNotFoundError
```
**Specifying the wrong file path is a common reason for a `FileNotFoundError`.**

##### UnsupportedOperation
```
# Python does not give an error here: opening in 'w' mode creates a new text file
new_file = open('new_text_file.txt', 'w')
# This gives an UnsupportedOperation error, because a file opened for writing cannot be read
new_file.read()
```

## debugging tips
1. Know what your code is supposed to do: know what the end output should be.
2. Make it fail every time: identify your problem quickly by using test cases to determine what causes your program to fail.
3. Make it fail fast.
4. Change one thing at a time, and for a reason.
5. Keep track of what you've done.
6. Google or ask others!
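One way to put tips 2 and 3 into practice is to write small test cases with Python's built-in `assert` statement, which raises an `AssertionError` as soon as a check is false. The function `to_celsius` below is a hypothetical example chosen only because its correct answers are easy to know in advance.

```python
def to_celsius(fahrenheit):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9

# Test cases with known answers: if the function is wrong,
# the first failing assert stops the program right away
assert to_celsius(32) == 0
assert to_celsius(212) == 100
assert to_celsius(-40) == -40
print('all tests passed')
```

If one of the checks fails, the traceback points straight at the failing `assert` line, so you know which case to investigate.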
# Python week 2 - lesson 4

Popular Python libraries:
- matplotlib - for scientific visualization
- numpy - its basic type is the NumPy array (`ndarray`), which stores multi-dimensional data of a single type
- pandas - for manipulating and analysing large-scale datasets

## Numpy
*[NumPy]: Numerical Python

NumPy is a package for scientific computation in Python.

```python
import numpy as np
# Load the file; make sure you have the correct path.
# delimiter: how the values in the file are separated
np.loadtxt(fname="inflammation-01.csv", delimiter=',')
```
Assign the data to a variable:
```python
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
```
Look at the data type:
```python
print(type(data))
```
Find the mean of the array:
```python
print(np.mean(data))
```
Get descriptive values about your data:
```python
maxval = np.max(data)
minval = np.min(data)
stdval = np.std(data)

print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)
```

### ndarray
*[ndarray]: n-dimensional array

In a 2-D array like this one, each row typically stores an observation and each column stores a characteristic (variable).
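For a 2-D array, the `axis` argument of `np.mean()`, `np.max()` and similar functions controls whether you summarize down each column (`axis=0`) or across each row (`axis=1`). A minimal sketch with a small made-up array (the values are illustrative only, not taken from `inflammation-01.csv`):

```python
import numpy as np

# Hypothetical 3 x 4 array: 3 rows (observations) x 4 columns (characteristics)
data = np.array([[0, 1, 2, 3],
                 [1, 2, 3, 4],
                 [2, 3, 4, 5]])

print(data.shape)              # (rows, columns)
print(np.mean(data))           # mean of all 12 values
print(np.mean(data, axis=0))   # one mean per column
print(np.max(data, axis=1))    # one maximum per row
```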
## Pandas
https://pandas.pydata.org

pandas cheatsheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In pandas, a table of data is a `DataFrame`, and each column of a `DataFrame` is a `Series`.

Import the pandas library:
```python
import pandas as pd
```
```python
data = pd.read_csv('gapminder_gdp_oceania.csv')
print(data)
```
Use the `country` column as the index:
```python
data = pd.read_csv('gapminder_gdp_oceania.csv', index_col='country')
print(data)
```

### challenge 1
Read the data in 'gapminder_gdp_americas.csv' into a variable called `americas` and display its summary statistics.
```python
americas = pd.read_csv('gapminder_gdp_americas.csv', index_col='country')
americas.describe()
```
```python
# Gives you general descriptive information about a DataFrame
data.info()
```
Show the columns of a DataFrame:
```python
print(data.columns)
```
Transpose the DataFrame:
```python
print(data.T)
```
Get key statistics:
```python
data.describe()
```
Write to CSV:
```python
data.to_csv("mynewdatafile.csv")
# Tells us the task is done after the data is written
print("File has been written")
```
*[iloc]: integer location, selecting by the implicit numeric position in a DataFrame

Select data by position with `iloc`:
```python
data = pd.read_csv("gapminder_gdp_europe.csv", index_col="country")
print(data.iloc[0, 0])
```
Select data by label with `loc` (here, all columns for Albania):
```python
print(data.loc["Albania", :])
```
Slicing selects a range of data (with `loc`, both ends of the label range are included):
```python
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
```
Apply your own function to the data with `apply()`:
```python
def multiplyby5(x):
    return x * 5

# DataFrame.apply() calls multiplyby5 on each column
data2 = data.apply(multiplyby5)
data2
```
Use `apply()` on a single column to create a new column:
```python
data["gdp_by5_2007"] = data['gdpPercap_2007'].apply(multiplyby5)
data
```

### Challenge 2
Assume pandas has been imported into your notebook and the Gapminder GDP data for Europe has been loaded:
```python
import pandas as pd
df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
```
Write an expression to find the per capita GDP of Serbia in 2007.
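One possible answer to Challenge 2, sketched on a tiny DataFrame built in memory so it runs without the CSV file. The numbers are placeholders, not real Gapminder values; with the real file you would run the same `loc` expression on `df` as loaded above.

```python
import pandas as pd

# Placeholder data with the same layout as the Gapminder files:
# countries as the index, one gdpPercap_<year> column per year
df = pd.DataFrame(
    {'gdpPercap_2002': [100.0, 200.0], 'gdpPercap_2007': [110.0, 220.0]},
    index=pd.Index(['Poland', 'Serbia'], name='country'),
)

# Select by label: row 'Serbia', column 'gdpPercap_2007'
print(df.loc['Serbia', 'gdpPercap_2007'])

# The same cell selected by position with iloc (row 1, column 1)
print(df.iloc[1, 1])
```

Selecting by label with `loc` is usually safer here, because it keeps working if rows or columns are reordered.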
# Python week 3

Python library: matplotlib
- pyplot: a sub-package of matplotlib

## Plotting with Matplotlib
Import `matplotlib.pyplot` and give it the alias `plt`:
```python
import matplotlib.pyplot as plt
# In most code editors you need plt.show() to display a plot;
# in a Jupyter notebook, this magic command displays plots inline
%matplotlib inline
# Alternatively, for a cool interactive plot:
%matplotlib notebook
```
```python
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)   # (x, y) = (time, position)

# Give the plot axis labels
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

# Add a text label inside the plot at (x=0.5, y=50)
plt.text(0.5, 50, 'Some_text')

# Jupyter notebook shows the plot automatically,
# but in other editors we need this to show the graph
plt.show()
```
```python
import pandas as pd
import matplotlib.pyplot as plt

# load data
data = pd.read_csv('gapminder_gdp_oceania.csv', index_col='country')

# strip the characters of "gdpPercap_" from both ends of each column name, leaving the year
years = data.columns.str.strip('gdpPercap_')

# convert the year values to integers and save the results back to the dataframe
data.columns = years.astype(int)

# select Australia and plot its time series
data.loc['Australia'].plot()

# transpose, so that each country is plotted as its own line over the years
data.T.plot()
plt.ylabel('GDP per capita')
```

### Styles of plots
You can use, for example, the ggplot style:
```python
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.ylabel('GDP per capita')
```
If you reassign a variable, remember that Jupyter keeps its latest value until you reassign it again or restart the kernel.
```python
years = data.columns
gdp_australia = data.loc['Australia']
# 'g--' specifies the color and line style: a green dashed line
plt.plot(years, gdp_australia, 'g--')
```
Plot GDP per capita over time for both New Zealand and Australia (note the jump after the economic reform in New Zealand!):
```python
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')

# create a legend
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
```
Plot a scatter plot:
```python
plt.scatter(gdp_australia, gdp_nz)

# Alternatively, we can do
data.T.plot.scatter(x='Australia', y='New Zealand')
```

### Challenge 1
Complete the code to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.
```python
# starter code
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.___.plot(label='min')
data_europe.___
plt.legend(loc='best')
plt.xticks(rotation=90)
```
```python
# solution
data_europe = pd.read_csv('gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.legend(loc='best')
plt.xticks(rotation=90)
```

## List
- **array**: a data type in NumPy (also used by pandas); it stores data of the **same** type
- **list**: a data type in core Python; it can store **mixed** types of objects

```python
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
# len() tells you how many values are in the list
print("The length of pressures is:", len(pressures))
print('pressures are:', pressures)
```
```python
print('zeroth item of pressures:', pressures[0])
```
```python
# Replace the value at index 0 (the first item) with a new value
pressures[0] = 0.265
print('pressures is now:', pressures)
```
```python
# use list.append() to add an item to the end of a list
primes = [2, 3, 5]
print('primes is initially:', primes)
```
`append()` works "in place": it directly modifies the object before the dot instead of creating a new list.
```python
primes.append(7)
print('primes is now:', primes)
```
```python
new_prime = [11, 13, 17, 19]
print('primes is currently:', primes)
primes.extend(new_prime)
print('primes has now become:', primes)
```
```python
# use `del` to delete items from a list
# (primes now has 8 items, so index 7 is the last one)
del primes[7]
print("primes after removing last item:", primes)
```
```python
# add to a blank list
mylist = []
mylist.append('hello')
print(mylist)
```

## For loop
```python
# print each number in the list
# note the indentation! The body of the loop must be indented
for number in [2, 3, 5]:
    print(number)

some_numbers = [2, 3, 5]
for number in some_numbers:
    print(number)
```
```python
# wrong indentation example: Python will complain (IndentationError: unexpected indent)
firstname = 'Jon'
    lastname = 'Smith'
```
```python
# You can give the loop variable any name, as long as it is a valid variable name
for goopey in some_numbers:
    print(goopey)
```
```python
# for each number in some_numbers,
# square it and cube it,
# and print the results
for p in some_numbers:
    squared = p**2
    cubed = p**3
    print(p, squared, cubed)
```
Looping over `range()`: note that `range(0, 3)` stops before it reaches 3.
```python
# prints 0, 1, 2
for number in range(0, 3):
    print(number)
```
```python
# accumulator pattern: add up 1 + 2 + ... + 10 using a loop
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)
```

# Week 3 - Python Day 6

## Conditionals and Loops
```python
mass = 3.54
if mass > 3:
    print(mass, 'is large')

mass = 2.07
if mass > 3:
    print(mass, 'is large')
```
```python
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
```
```python
# Example of using an 'else' condition as a 'catch all'
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 3:
        print(m, 'is large')
    else:
        print(m, 'is small')
```
```python
# example of an 'if' condition with an 'elif' alternative and 'else' as the 'catch all'
masses = [3.54, 2.07, 9.22, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is Large')
    else:
        print(m, 'is small')
```
```python
# Only the first condition that is True runs: 85 >= 70 is checked first,
# so this prints 'grade is C' even though 85 >= 80 is also true
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
```
```python
# reordering the conditions gives the desired output ('grade is B')
grade = 85
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
```
```python
# 'else' gives a 'catch all' output when none of the conditions apply
grade = 65
if grade >= 90:
    print('grade is A')
elif grade >= 80:
    print('grade is B')
elif grade >= 70:
    print('grade is C')
else:
    print('Failed')
```
```python
# The condition is tested only once: changing velocity inside the
# 'else' branch does not re-run the test, so the output does not change
velocity = 10.0
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0
```
```python
# Only a loop re-tests the condition after the variable changes,
# which is what changes the final output.
# You can also comment out the print() in the loop to display only the final velocity
velocity = 10.0
for i in range(5):
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10
print('final velocity:', velocity)
```

## Boolean operators
```python
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
velocity = [10.00, 20.00, 30.00, 40.00, 50.00]

for i in range(5):
    if (masses[i] > 5) and (velocity[i] > 20):
        print("Fast heavy object. Duck!")
    elif (masses[i] > 2) and (masses[i] <= 5) and (velocity[i] <= 20):
        print("Normal traffic")
    elif (masses[i] <= 2) and (velocity[i] <= 20):
        print("Slow light object. Ignore it.")
    else:
        print("Whoa! Check it")
```
- `True and False` evaluates to `False`
- `True or False` evaluates to `True`

- `is` tests whether two names refer to the *same object*
- `==` tests whether two objects have the same *value*

```python
car1 = 'Sedan'
car2 = 'Sedan'
car3 = 'Truck'

# This may print, because Python can reuse one object for identical short strings;
# use == to compare values
if car1 is car2:
    print('Both cars')

# Using `is` with literals gives you a SyntaxWarning (Python 3.8+)
if 'Small' is 'Small':
    print('Yes')

if car2 is not car3:
    print('different vehicles')
```
```python
import pandas as pd

# load in data files using a for loop
for filename in ['gapminder_gdp_africa.csv', 'gapminder_gdp_asia.csv']:
    data = pd.read_csv(filename, index_col='country')
    print(filename, data.min())
```
```python
import glob
print('all csv files in current directory:', glob.glob('*.csv'))

# find all pdb files
print('all pdb files:', glob.glob('*.pdb'))
```
```python
# matches all files named like gapminder_[something].csv
for filename in glob.glob('gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())
```

## Function
```python
def print_greeting():
    print('Hello')
```
```python
# call the function without passing an argument,
# since this function definition does not take one
def print_greeting():
    print('Hello')

print_greeting()
```
```python
# a function that must take arguments in order to work
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
    return joined

print_date(1871, 3, 19)
date = print_date(1871, 3, 19)
```
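A function only hands a value back to the caller if it uses `return`; otherwise the call evaluates to `None`, just like assigning the result of `print()` earlier. A minimal comparison using two hypothetical functions, `join_date` and `show_date`, modeled on `print_date`:

```python
def join_date(year, month, day):
    # returns the joined string to the caller
    return str(year) + '/' + str(month) + '/' + str(day)

def show_date(year, month, day):
    # prints the string but has no return statement
    print(str(year) + '/' + str(month) + '/' + str(day))

returned = join_date(1871, 3, 19)
printed = show_date(1871, 3, 19)   # prints 1871/3/19
print('join_date gave:', returned)
print('show_date gave:', printed)   # None
```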
```python
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

a = average([1, 3, 4])
print('average of actual values:', a)
print('average of empty list:', average([]))
```
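Functions, loops and conditionals combine naturally. As a sketch, the mass/velocity checks from the Boolean operators example can be wrapped in a function that returns a message instead of printing it; the name `classify` is hypothetical, while the thresholds and messages are the ones used in that example. `zip()` pairs each mass with the velocity at the same position.

```python
def classify(mass, velocity):
    """Describe an object using the thresholds from the Boolean operators example."""
    if mass > 5 and velocity > 20:
        return "Fast heavy object. Duck!"
    elif mass > 2 and mass <= 5 and velocity <= 20:
        return "Normal traffic"
    elif mass <= 2 and velocity <= 20:
        return "Slow light object. Ignore it."
    else:
        return "Whoa! Check it"

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
velocities = [10.00, 20.00, 30.00, 40.00, 50.00]

for m, v in zip(masses, velocities):
    print(m, v, classify(m, v))
```

Returning the message rather than printing it makes the function easy to test, in the spirit of the debugging tips.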
