Workshop Details
Dates: September 6th - 13th, 2022
Time: 9am - 12pm

Workshop Agenda:
https://ucsdlib.github.io/2022-09-06-carpentries-uc/

Workshop Lesson:
http://swcarpentry.github.io/python-novice-gapminder/

Day 1 - 3: Introduction to Python

Software Installation:
Anaconda
https://www.anaconda.com/download/

download latest version - 64-bit installer for Windows 10
This application is used to install and run Jupyter Notebooks
Google Colab: https://colab.research.google.com (for use if there are problems during the workshop)

Lesson Data (download)

NOTES:

A copy of the instructor live session notes will be made available to participants upon request at the end of the workshop.

Jupyterlab will be used for the lessons
[m] = Markdown cell (for notes)
[#] = comment; also works for notes inside a code cell
[b] = add cell below; [a] = add cell above
[r] = raw cell (contents are neither run as code nor rendered as Markdown)

(for Python lessons)
https://www.markdownguide.org/getting-started/
https://www.markdownguide.org/basic-syntax/

Workshop Day 1

First name and Last Name/Organization/Dept./Email

Name (first & last)	Organization	Dept.	Email
(example) Jane Doe	UCSD	IT	jdoe1@ucsd.edu
Kat Koziar (Helper)	UCR	Library	katherine.koziar@ucr.edu
Jacob sola	UCR	Chemistry/Biomedical	jsola032@ucr.edu
Douglas Zhang	UCSD	Chemistry/Biochemistry	doz023@ucsd.edu
Jacqueline Giacoman	UC Merced	Political Science	jgiacoman@ucmerced.edu
Jose Hernandez	UCB	Library	jose1991@berkeley.edu
John Thompson	UC Merced	Molecular & Cellular Biology	jthompson44@ucmerced.edu
Derek Devnich	UC Merced
Sam Erickson	UC Merced	Physics	serickson3@ucmerced.edu
Dilawer Ali	UC Merced	Mechanical Engineering	dali4@ucmerced.edu
Igor Aprelev	UCSD	Mathematics and Economics	iaprelev@ucsd.edu
Benjamin Nauman	UCLA	Geography	bnauman@ucla.edu
Mohit Saraswat	UC Merced	Chemistry	msaraswat@ucmerced.edu
Jacob Ross	UCSD	Anesthesiology	jaross@ucsd.edu
Jay Colond	UCM	Sociology	jcolond@ucmerced.edu
Zhaoning (Johnny) Wang	UCSD	CMM	zhw063@health.ucsd.edu
Lillie Pennington	UC Merced	Life and Environmental Sciences	lpennington@ucmerced.edu
Christian Henry	UC Berkeley	Integrative Biology	chrishenry@berkeley.edu
Belina Chong	UCLA	Ecology and Evolutionary Biology	moonmoon394@ucla.edu
Josiah Piceno	UCM	MBSE	jpiceno3@ucmerced.edu
Jun Tan	UCSD	Economics	j4tn@ucsd.edu
Jon Dean	UCSD	Anesthesiology	j1dean@health.ucsd.edu
Tahirah Williams	UCM	QSB	twilliams76@ucmerced.edu
Liam de Villa Bourke	UCLA	Institute of the Environment and Sustainability	liamdevilla@g.ucla.edu
Rukmini Ravi	UCSD	San Diego Supercomputer Center	ruravi@ucsd.edu
Amber Heidbrink	UCSD	Cell and Developmental Biology	aheidbrink@ucsd.edu
Haley Potts	UCSD	Math & Economics	hpotts@ucsd.edu
isabella schaedle	UCSD	MMMMMMMM

Apisit Kaewsanit	UCSF	Epidemiology and Biostatistics	apisit.kaewsanit@ucsf.edu
Ivan Felix Rios	UCSD	Mathematics & Economics	ifelixrios@ucsd.edu
Christian Corrales	UCLA	Neurology	ccorrales@mednet.ucla.edu
Michael Woller	UCLA	Psychology	michaelwoller@g.ucla.edu
Stella Yuan	UCLA	Ecology and Evolutionary Biology	scy8@g.ucla.edu
Jonathan Le	UCR	Mathematics	jle173@ucr.edu
Laika Aguinaldo	UCSD	Psychiatry	laaguinaldo@ucsd.edu
Chris Gray	UCR	Data Science	cgray024@ucr.edu
Ana Carolina Dantas Machado	UCSD	Medicine	adantasmachado@ucsd.edu
Jason Ngo	UC Merced	Bioengineering	jngo42@ucmerced.edu
Yibing Zhang	UC Merced	Bioengineering	yzhang291@ucmerced.edu
Ashwin Thomas	UC Merced	Environmental Systems	athomas59@ucmerced.edu
Eric Hyde	UCSD	Epidemiology	ehyde@health.ucsd.edu
Bineh Ndefru	UCLA	Materials Science	bndefru@ucla.edu

Vishakha Malhotra	UCSF	Biostatistics and Epidemiology
Bruce Hamilton	UCSD	School of Medicine	bah@ucsd.edu
Kazuma Nagatsuka	UCSD	Robotics(Mechanical Engineering)	kngatsuka@ucsd.edu
Caitlin Tribelhorn	UCSD	Pediatrics	ctribelh@ucsd.edu
Vikram Jambulapati	UCSD	Economics	vjambula@ucsd.edu
Simran Kanal	UCSF	Biostatistics and Epidemiology	simran.kanal@ucsf.edu
Daryl Han	UC Irvine	Student Center and Event Services	ddhan@uci.edu
Charles Faulhaber	UC Berkeley	Bancroft Library / Dept. of Spanish
Mario Cuaya	UCR	Computer Science	mcuay001@ucr.edu
Waleed Rajabally	UC Merced	Sociology	wrajabally@ucmerced.edu
Junxiao Gao	UCSF	Biostatistics and Epidemiology	Junxiao.Gao@ucsf.edu

Jay Chi	UCSB	ETS	jaychi@ucsb.edu

Vishakha Malhotra	UCSF	Biostatistics and Epidemiology	vishakha.malhotra@ucsf.edu

Day 1 Questions:

Please enter any questions not answered during live session here:
1.

Day 1 Live Class Notes:

Download link: https://www.anaconda.com/products/distribution
Working in Anaconda JupyterLab
GUI (middle-man, colloquially pronounced as "gooey") vs command-line
Today's workshop is strictly in JupyterLab GUI

Computer programming languages - there are a lot of them, and what they do is similar; syntax is also similar between languages (although each has its own specifics). Once you learn the basics, you can apply them to different languages.
Your favorite search engine is a good resource when you're looking for answers to your programming questions (kat's note: I <3 Stack Exchange)

working directory - in JupyterLab, the working directory is shown in the left sidebar. The left sidebar also has tabs, such as the file browser (where you can select your working directory and create new files/folders), a list of running terminals, etc. The left sidebar can be collapsed or expanded. Anaconda's JupyterLab runs locally on your computer, so when you're using a public computer, any files are saved on that public computer

new file - Day1_Python_LiveNotes.ipynb (to rename, right click on file to bring up submenu)

Interface - menu bar at top contains more options than the tabs in the left sidebar quicklinks

Command and Edit modes - in command mode, pressing B creates a new cell below the current cell

code cell will allow you to enter code
markdown cell doesn't run code, it's only notes (formatted in markdown). You can switch a cell between code and markdown by pressing the m (markdown) or y (code) key.
print('Hello') will show 'Hello' right below the cell if it's executed in a code cell
a-key creates a cell above
ctrl-enter will run the cell, either execute the command in a code cell or render the markdown in a markdown cell
menu -> Kernel -> restart and clear all output will clear all output and saved variables, but keep the text in the cells.
Markdown cells are stylized text
Hello There
Bye
code cells are plain text that is executed as code
the octothorpe (pound sign, number sign, hash) # is used for comments in code
comments are used to explain why/what your code is doing - comments are a love note to your future self
to create a list in markdown, bullets are created using a - or * followed by a space. different levels are created using indentation.
- example of level 2
  - level 3

Numbered lists

1. level 1
2. level 2, also requires tabs

A tool like HackMD lets you practice markdown.

Bold and italics

bold is surrounded by two asterisks
italics is surrounded by a single asterisk or underscore

In JupyterLab markdown cells, you can combine some html elements, such as <br>
backslash \ before the less-than-symbol will escape the character so it isn't read as html \<br>

Mixed list

level 1
- Level 2
level 1
- Level 2
level 1
- Level 2

Headings use # to create different sizes

largest

one smaller

smaller

etc

even smaller

Markdown Cheatsheet

Save and save often

Always shut down your kernel (menu -> Kernel -> Shut Down Kernel) when you're finished
- this makes sure your file/project isn't continuing to use resources when not intended - especially useful when you're using an HPCC environment.

Challenge #1

Lesson 2: Variables and Assignments

age = 42
first_name = 'Ahmed'

variable_name = value
computer only recognizes the values assigned to the variable after the code cell is executed
variable name rules
- can only contain letters, digits, underscores (a dash is a minus sign in code!)
- use underscore or camelCase to help human readability
- thisisaverylongnamethatishardforahumantoread = "Jimmy"
- this_is_more_readable = "Jimmy"
- thisIsCamelCase = "jimmy"
- variable names cannot start with a number
- use self-describing short variable names (x is not self-describing, age or weight are self-describing)
- variable names are CaseSensitive
- variables that start with an underscore have a special meaning (_dont_use_until_you_understand_what_it_means)
- will get syntax error if the variable name doesn't follow the rules, such as 3age (starts with a number) or read@one (uses any symbol other than the underscore _)
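A rough way to test a candidate name against these rules is sketched below. It uses `str.isidentifier()` and the standard `keyword` module; the helper name `is_valid_variable_name` is made up for this illustration. Note that `isidentifier()` also accepts non-English letters, so it is slightly more permissive than the rule of thumb above.

```python
import keyword

def is_valid_variable_name(name):
    # Hypothetical helper: True when `name` follows Python's naming rules
    # and is not a reserved word such as `for` or `class`.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['first_name', 'thisIsCamelCase', '3age', 'read@one', 'my-name', 'class']:
    print(candidate, is_valid_variable_name(candidate))
```

The first two names pass; the other four fail (starts with a digit, contains @, contains a dash, is a reserved word).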

Built in functions

print() prints things as text
print(first_name, 'is', age, 'years old') will print Ahmed is 42 years old
built-in functions are native to Python and are commonly used by programmers
print() automatically adds a single space between its arguments.
print(argument1, argument2, argument3, argument4)
functions are self-contained - will take in arguments and provide output.
functions allow you to easily reuse code
not all functions require arguments. some functions require a certain number of arguments.

Variables

must be created before they are used.
print(myval) will give an error if myval isn't already created with a value

This will throw an error because last_name does not have an assigned value

print(last_name)
last_name = "Smith"

This will not throw an error

last_name = "Smith" 
print(last_name)

Challenge #2
Assign the variable named color1 to the value red and the variable named color2 to the value blue. Then print red is not blue using the variable names as input (or arguments)

color1 = 'red'
color2 = 'blue'
print(color1, 'is not', color2)
print(color1, 'is', 'not', color2)

Blocks of text

you can surround a block of text with triple quotes, like so: """ My very long block of text """

variables used in calculations

need to be a certain datatype for calculations - num type, integer or float
age = age + 3
3 + 5 * 4 calculates according to math rules (order of operations), not read left to right
- parentheses/brackets, exponents/radicals, multiplication/division, addition/subtraction
3 + 5 * 4 = 23
(3 + 5) * 4 = 32

Challenge #3
Write the code for the following: number1 is 22, number2 is 5, and number3 is 100. Multiply number1 by number3, then divide by number2. The result of the calculation should be stored in number4. Finally, output 'The answer is number4' - with the value displayed rather than the variable name.
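One possible solution, as a sketch (the notes do not record the version written during the live session):

```python
number1 = 22
number2 = 5
number3 = 100

# multiply first, then divide; the / operator always produces a float
number4 = number1 * number3 / number2
print('The answer is', number4)
```

This prints The answer is 440.0, because print() shows the value stored in number4 rather than its name.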

Built-in functions

indexing gives you a single character from a string
- in python, indices start with 0 (zero)

atom_name = 'helium'
print(atom_name[0])

output is h

indexing uses the variable name, then square brackets around the number of the index you want to obtain

datatype strings are text surrounded by single or double-quotes (pair single-quotes with single-quotes, don't interchange 'like this")

id_number = 2587464
print(id_number[2])

will result in error because id_number is an integer, and not a string
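If a single digit of the number is needed, one option is to convert the integer to a string first. A minimal sketch, reusing id_number from the example above (the name id_text is made up here):

```python
id_number = 2587464

id_text = str(id_number)   # the same digits, now stored as text
print(id_text[2])          # third character, because indices start at 0
```

The output is 8.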

list

my_list = ['apple', 'pear', 'peach']
print(my_list[1])

output is pear

slices

slice is a substring or subset
slice is variable[start position: stop position(not including)]

# string example
atom_name = 'sodium'
print(atom_name[0:3])

output is sod

# list example
many_atoms = ['oxygen', 'carbon', 'nitrogen', 'neon', 'iron', 'zinc']
print(many_atoms[1:4])

output will be ['carbon', 'nitrogen', 'neon'] (notice how it outputs in a list format!)

how long are things?

function is len()
finds the length of a string or list
lets you know how long a string is, or how many elements are in a list

#string example
print(len('helium'))

output is 6 (counts number of characters)

# list example
my_list2 = ['a', '1', '43', 'dream', 'please']
print(len(my_list2))

output is 5 (counts number of elements in list)

Challenge #4

what does thing[:] (just a colon) do?
What does thing[number:some-negative-number] do?
What does the following program print?

atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])

Solution #4

returns everything
returns a slice from number up to (but not including) the position counted back from the end of the variable

#example
atom_name = 'carbon'
print(atom_name[1:-4])

output is a

output is atom_name[1:3] is: ar
- (remember, the number that is the stop position in the slice isn't included.)
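A few more slices of the same string, as a quick sketch of how omitted and negative positions behave (the variable names on the left are made up for this illustration):

```python
atom_name = 'carbon'

whole = atom_name[:]        # no start or stop: the whole string
middle = atom_name[1:-1]    # from index 1 up to, but not including, the last character
first_two = atom_name[:2]   # omitted start means "from the beginning"
last_two = atom_name[-2:]   # omitted stop means "to the end"

print(whole, middle, first_two, last_two)
```

The output is carbon arbo ca on.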

Data types & type conversion

all data that python reads is associated with a data type. Types we've covered so far are string, integer, floats, which are the three commonly used data types.
Type conversion means you're converting data from one type to another
integers : whole numbers
- type conversion use int()
floats : also called floating points, they are decimal (real) numbers
- type conversion use float()
strings : sequence of characters, written inside quotes
- type conversion use str()
to identify the type of data, use type()

type(52) will output int

print(type(52)) will output <class 'int'>

fitness = 'average'
print(type(fitness))

output is <class 'str'>

print(type(hair)) will throw an error, because Python is reading hair as a variable name, which isn't defined.

print(type(3.4))

output is <class 'float'>

print (5-2)

will output 3

print ('hello'-'h')

will throw an error because you can't subtract strings

You can use '+' and '*' on integers, floats, and strings, but they operate differently on strings

print (4+5)

output is 9

print ("Ahmed"+"Walch")

output is AhmedWalch

print ('Ahmed'*10)

output is AhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmedAhmed

Cannot mix strings with integers/floats for mathematical purposes

print (1 + '2')

will throw an error.

however,

print (1 + int('2'))

will output 3 because '2' is type cast as an integer, allowing math operations.

print (str(1) + '2')

will output 12 (which is actually a string, not a number!)

print ('Gene'+str(23455685))

will output Gene23455685, which allows easy labels!
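Two conversion details that are easy to trip over, shown as a small sketch (the variable names are made up; try/except was not covered in this session and is used here only to show the error without stopping the cell):

```python
truncated = int(3.9)       # int() drops the decimal part; it does not round
from_text = float('2.5')   # a string holding a decimal number converts with float()
print(truncated, from_text)

# int() cannot read a decimal number directly from a string
try:
    int('2.5')
except ValueError:
    print("int('2.5') raises a ValueError; int(float('2.5')) works instead")
```

The first print shows 3 2.5.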

Variables only change values once the value is (re-)assigned

if you need to keep an original value of a variable, create a new variable name, otherwise you're overwriting the original value.
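A short sketch of both points, using made-up variable names:

```python
first = 1
second = 5 * first   # second is calculated now and stores 5
first = 2            # changing first afterwards does not update second
print('first is', first, 'and second is', second)

original_age = 42
age_in_three_years = original_age + 3   # new name, so original_age keeps its value
print(original_age, age_in_three_years)
```

This prints first is 2 and second is 5, followed by 42 45.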

LIVE LESSON NOTES: https://drive.google.com/file/d/1TSm1bA55RwQu5-iqdnBNRU47U3os9x86/view?usp=sharing

End Day 1

Workshop Day 2

First name and Last Name/Organization/Dept./Email

Name (first & last)	Organization	Dept.	Email
Geno Sanchez (helper)	UCLA	Library	genosanchez@library.ucla.edu
Amber Heidbrink	UCSD	Cell and Developmental Biology	aheidbrink@ucsd.edu
Kat Koziar	UCR	Library	katherine.koziar@ucr.edu
Yibing Zhang	UCM	Bioengineering	yzhang291@ucmerced.edu
Douglas Zhang	UCSD	Chemistry and Biochemistry	doz023@ucsd.edu
Kazuma Nagatsuka	UCSD	Robotics(Mechanical Engineering)	knagatsuka@ucsd.edu
Jay Colond	UCM	Sociology	jcolond@ucmerced.edu
Belina Chong	UCLA	Ecology and Evolutionary Biology	moonmoon394@ucla.edu
Jonathan Le	UCR	Mathematics	jle173@ucr.edu
Caitlin Tribelhorn	UCSD	Pediatrics	ctribelh@ucsd.edu
Igor Aprelev	UCSD	Mathematics and Economics	iaprelev@ucsd.edu
Sam Erickson	UC Merced	Physics	serickson3@ucmerced.edu
Jay Chi	UCSB	ETS	jaychi@ucsb.edu

Apisit Kaewsanit	UCSF	Epidemiology and Biostatistics	apisit.kaewsanit@ucsf.edu
Benjamin Nauman	UCLA	Geography	bnauman@ucla.edu
Suzanne Paulson	UCLA	AOS	paulson@atmos.ucla.edu
Liam de Villa Bourke	UCLA	IOES	liamdevilla@g.ucla.edu
Mario Cuaya	UCR	Computer Science	mcuay001@ucr.edu
Josiah Piceno	UCM	MBSE	jpiceno3@ucmerced.edu
John Thompson	UC Merced	Cell & Molecular Biology	jthompson44@ucmerced.edu
Bineh Ndefru	UCLA	Material Science	bndefru@ucla.edu
Zhiyuan Yao	UCLA	Data Science Center	zyao@ucla.edu
Tahirah Williams	UCM	QSB	twilliams76@ucmerced.edu
Haley Potts	UCSD	Math & Econ	hpotts@ucsd.edu
Zhaoning (Johnny) Wang	UCSD	CMM	zhw063@health.ucsd.edu
Daryl Han	UC Irvine	Student Center and Event Services	ddhan@uci.edu
Simran Kanal	UCSF	Epidemiology and Biostatistics	simran.kanal@ucsf.edu
Jon Dean	UCSD	Anesthesiology	j1dean@health.ucsd.edu
Junxiao Gao	UCSF	Biostatistics and Epidemiology	Junxiao.Gao@ucsf.edu
Stella Yuan	UCLA	Ecology and Evolutionary Biology	scy8@g.ucla.edu
Waleed Rajabally	UC Merced	Sociology	wrajabally@ucmerced.edu

Jun Tan	UCSD	Economics	j4tan@ucsd.edu
Christian Henry	UC Berkeley	Integrative Biology	chrishenry@berkeley.edu
Jacob Ross	UCSD	Anesthesiology	jaross@ucsd.edu
Christopher Gray	UCR	Computer Science	cgray024@ucr.edu

Day 2 Questions:

Please enter any questions not answered during live session here:
1.

Day 2 Live Class Notes:

Gapminder data download: http://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip

Lesson 5 Libraries

Most of the power of a programming language is in its libraries.

A library is a collection of files (called modules) that contains functions for use by other programs.

May also contain data values
Pandas - widely used library often used in the science world
Many are open source
The Python standard library is an extensive suite of modules that comes with Python itself.
- https://docs.python.org/3/library/

A program must import a library module before using it.

Use import to load a library module into a program’s memory.

import math
print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))

pi is 3.141592653589793
cos(pi) is -1.0

Have to refer to each item with the module’s name.

Use help to learn about the contents of a library module.

help(math)
Help on module math:

NAME
    math

MODULE REFERENCE
    http://docs.python.org/3/library/math
...

Import specific items from a library module to shorten programs.

from math import cos, pi

print('cos(pi) is', cos(pi))

cos(pi) is -1.0

Create an alias for a library module when importing it to shorten programs.

import math as m

print('cos(pi) is', m.cos(m.pi))

cos(pi) is -1.0

Use import … as … to give a library a short alias while importing it.
Then refer to items in the library using that shortened name

import matplotlib as mpl

Challenge

Fill in the blanks so that the program below prints 90.0.
Rewrite the program so that it uses import without as.
Which form do you find easier to read?

import math as m
angle = ____.degrees(____.pi / 2)
print(____)

Solution:









import math as m
#1
angle = m.degrees(m.pi / 2)
print(angle)

#2
import math
angle = math.degrees(math.pi / 2)
print(angle)

90.0
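A third form, using the from ... import ... syntax introduced earlier in this lesson, should give the same result. This is a sketch, not part of the recorded solution:

```python
from math import degrees, pi

# no module prefix is needed because the names were imported directly
angle = degrees(pi / 2)
print(angle)
```

This form is the shortest, at the cost of no longer showing where degrees and pi come from.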

Lesson 6: Writing Functions

Define a function using def with a name, parameters, and a block of code.

# you need to declare a new function with the keyword 'def'.
# you need to include a 'name()'.
def say_hello():
    print("hello!")

Begin the definition of a new function with def
Followed by the name of the function.
- Must obey the same rules as variable names
- Names may contain letters, digits, and underscores, but cannot start with a digit.
Then parameters in parentheses
- Empty parentheses if the function doesn't take any input
Then a colon is used
Next line of code is indented
Some functions require arguments to be passed in order to execute; others do not.

# After defining a function, you must 'call' a function to execute it.

say_hello()

hello!

# Let's make a function that prints a date as an example of a function that takes an argument.

def print_date(year, month, day): # the function requires three arguments: year, month, day
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
    
print_date(2022, 1, 2)

2022/1/2

print_date(month = 1, year = 2019, day = 23)

2019/1/23

Defining a function that uses the `return` statement.

def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)

avg = average([1,3,4])

print(avg)

emptyAvg = average([])

print(emptyAvg)

2.6666666666666665
None

# A function that has no return statement returns None.
result = print_date(1871, 3, 19)
print('result of print_date', result)

1871/3/19
result of print_date None
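For contrast, a variant that returns the text instead of printing it gives the caller something to store. The function name format_date is made up for this sketch:

```python
def format_date(year, month, day):
    # Hypothetical variant of print_date: builds the same text but returns it
    return str(year) + '/' + str(month) + '/' + str(day)

result = format_date(1871, 3, 19)
print('result of format_date', result)
```

This prints result of format_date 1871/3/19, because result now holds the string rather than None.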

Challenge

What is wrong with this example?

#Example
result = print_time(11,37,59)

def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute)+ ':' + str(second)
    print(time_string)

# After fix (the call is moved below the function definition):
result = print_time(11, 37, 59)
print('result of call is:', result)

11:37:59
result of call is: None

Reading tabular data into data frames

import os

#Get our current working directory
print(os.getcwd())

#List the contents of this directory
print(os.listdir())

import pandas as pd

data = pd.read_csv("gapminder_gdp_oceania.csv")

#Reading data from a subfolder
#data = pd.read_csv("subfolder/gapminder_gdp_oceania.csv")

print(data)

       country  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
0    Australia     10039.59564     10949.64959     12217.22686   
1  New Zealand     10556.57566     12247.39532     13175.67800   

   gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
0     14526.12465     16788.62948     18334.19751     19477.00928   
1     14463.91893     16046.03728     16233.71770     17632.41040   

   gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
0     21888.88903     23424.76683     26997.93657     30687.75473   
1     19007.19129     18363.32494     21050.41377     23189.80135   

   gdpPercap_2007  
0     34435.36744  
1     25185.00911

data

        country 	gdpPercap_1952 	gdpPercap_1957 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972 	gdpPercap_1977 	gdpPercap_1982 	gdpPercap_1987 	gdpPercap_1992 	gdpPercap_1997 	gdpPercap_2002 	gdpPercap_2007
0 	Australia 	10039.59564 	10949.64959 	12217.22686 	14526.12465 	16788.62948 	18334.19751 	19477.00928 	21888.88903 	23424.76683 	26997.93657 	30687.75473 	34435.36744
1 	New Zealand 	10556.57566 	12247.39532 	13175.67800 	14463.91893 	16046.03728 	16233.71770 	17632.41040 	19007.19129 	18363.32494 	21050.41377 	23189.80135 	25185.00911

# lets identify our rows by country not index number

data = pd.read_csv("gapminder_gdp_oceania.csv", index_col = "country")

 	        gdpPercap_1952 	gdpPercap_1957 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972 	gdpPercap_1977 	gdpPercap_1982 	gdpPercap_1987 	gdpPercap_1992 	gdpPercap_1997 	gdpPercap_2002 	gdpPercap_2007
country 												
Australia 	10039.59564 	10949.64959 	12217.22686 	14526.12465 	16788.62948 	18334.19751 	19477.00928 	21888.88903 	23424.76683 	26997.93657 	30687.75473 	34435.36744
New Zealand 	10556.57566 	12247.39532 	13175.67800 	14463.91893 	16046.03728 	16233.71770 	17632.41040 	19007.19129 	18363.32494 	21050.41377 	23189.80135 	25185.00911

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Australia to New Zealand
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   gdpPercap_1952  2 non-null      float64
 1   gdpPercap_1957  2 non-null      float64
 2   gdpPercap_1962  2 non-null      float64
 3   gdpPercap_1967  2 non-null      float64
 4   gdpPercap_1972  2 non-null      float64
 5   gdpPercap_1977  2 non-null      float64
 6   gdpPercap_1982  2 non-null      float64
 7   gdpPercap_1987  2 non-null      float64
 8   gdpPercap_1992  2 non-null      float64
 9   gdpPercap_1997  2 non-null      float64
 10  gdpPercap_2002  2 non-null      float64
 11  gdpPercap_2007  2 non-null      float64
dtypes: float64(12)
memory usage: 208.0+ bytes

Summary statistics of your data

data.describe()

 	gdpPercap_1952 	gdpPercap_1957 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972 	gdpPercap_1977 	gdpPercap_1982 	gdpPercap_1987 	gdpPercap_1992 	gdpPercap_1997 	gdpPercap_2002 	gdpPercap_2007
count 	2.000000 	2.000000 	2.000000 	2.000000 	2.00000 	2.000000 	2.000000 	2.000000 	2.000000 	2.000000 	2.000000 	2.000000
mean 	10298.085650 	11598.522455 	12696.452430 	14495.021790 	16417.33338 	17283.957605 	18554.709840 	20448.040160 	20894.045885 	24024.175170 	26938.778040 	29810.188275
std 	365.560078 	917.644806 	677.727301 	43.986086 	525.09198 	1485.263517 	1304.328377 	2037.668013 	3578.979883 	4205.533703 	5301.853680 	6540.991104
min 	10039.595640 	10949.649590 	12217.226860 	14463.918930 	16046.03728 	16233.717700 	17632.410400 	19007.191290 	18363.324940 	21050.413770 	23189.801350 	25185.009110
25% 	10168.840645 	11274.086022 	12456.839645 	14479.470360 	16231.68533 	16758.837652 	18093.560120 	19727.615725 	19628.685412 	22537.294470 	25064.289695 	27497.598692
50% 	10298.085650 	11598.522455 	12696.452430 	14495.021790 	16417.33338 	17283.957605 	18554.709840 	20448.040160 	20894.045885 	24024.175170 	26938.778040 	29810.188275
75% 	10427.330655 	11922.958888 	12936.065215 	14510.573220 	16602.98143 	17809.077558 	19015.859560 	21168.464595 	22159.406358 	25511.055870 	28813.266385 	32122.777858
max 	10556.575660 	12247.395320 	13175.678000 	14526.124650 	16788.62948 	18334.197510 	19477.009280 	21888.889030 	23424.766830 	26997.936570 	30687.754730 	34435.367440

Print column names

data.columns
# or
print(data.columns)

Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
      dtype='object')

Dataframes

Dataframes are a collection of columns. Within a column, all values have to be the same data type (e.g. float, int, str)
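A small sketch of the one-type-per-column idea, using a made-up table rather than the gapminder files (the column names and values are invented for illustration):

```python
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({
    'country': ['A', 'B'],
    'population': [10, 20],
    'gdp': [1.5, 2.5],
})

# dtypes lists the data type pandas chose for each column
print(df.dtypes)

# helper checks from pandas.api.types
print(pd.api.types.is_integer_dtype(df['population']))
print(pd.api.types.is_float_dtype(df['gdp']))
```

Each column reports its own type; how the text column is labelled depends on the pandas version.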

Challenge

Read the data in gapminder_gdp_americas.csv into a variable called americas and display its summary statistics.
After reading the data for the Americas, use help(americas.head) and help(americas.tail) to find out what DataFrame.head and DataFrame.tail do.
How can you display the first three rows of this data?

solution:

americas = pd.read_csv("data/gapminder_gdp_americas.csv", index_col = "country")

print(americas.head(3))

print(americas.describe())

          continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \
country                                                               
Argentina  Americas     5911.315053     6856.856212     7133.166023   
Bolivia    Americas     2677.326347     2127.686326     2180.972546   
Brazil     Americas     2108.944355     2487.365989     3336.585802   

           gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \
country                                                                     
Argentina     8052.953021     9443.038526    10079.026740     8997.897412   
Bolivia       2586.886053     2980.331339     3548.097832     3156.510452   
Brazil        3429.864357     4985.711467     6660.118654     7030.835878   

           gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \
country                                                                     
Argentina     9139.671389     9308.418710    10967.281950     8797.640716   
Bolivia       2753.691490     2961.699694     3326.143191     3413.262690   
Brazil        7807.095818     6950.283021     7957.980824     8131.212843   

           gdpPercap_2007  
country                    
Argentina    12779.379640  
Bolivia       3822.137084  
Brazil        9065.800825 




       gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \
count       25.000000       25.000000       25.000000       25.000000   
mean      4079.062552     4616.043733     4901.541870     5668.253496   
std       3001.727522     3312.381083     3421.740569     4160.885560   
min       1397.717137     1544.402995     1662.137359     1452.057666   
25%       2428.237769     2487.365989     2750.364446     3242.531147   
50%       3048.302900     3780.546651     4086.114078     4643.393534   
75%       3939.978789     4756.525781     5180.755910     5788.093330   
max      13990.482080    14847.127120    16173.145860    19530.365570   

       gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \
count       25.000000       25.000000       25.000000       25.000000   
mean      6491.334139     7352.007126     7506.737088     7793.400261   
std       4754.404329     5355.602518     5530.490471     6665.039509   
min       1654.456946     1874.298931     2011.159549     1823.015995   
25%       4031.408271     4756.763836     4258.503604     4140.442097   
50%       5305.445256     6281.290855     6434.501797     6360.943444   
75%       6809.406690     7674.929108     8997.897412     7807.095818   
max      21806.035940    24072.632130    25009.559140    29884.350410   

       gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007  
count       25.000000       25.000000       25.000000       25.000000  
mean      8044.934406     8889.300863     9287.677107    11003.031625  
std       7047.089191     7874.225145     8895.817785     9713.209302  
min       1456.309517     1341.726931     1270.364932     1201.637154  
25%       4439.450840     4684.313807     4858.347495     5728.353514  
50%       6618.743050     7113.692252     6994.774861     8948.102923  
75%       8137.004775     9767.297530     8797.640716    11977.574960  
max      32003.932240    35767.433030    39097.099550    42951.653090

Getting data out of your data frame

# get a column
data = pd.read_csv("gapminder_gdp_europe.csv", index_col = "country")
data.columns

Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
      dtype='object')

col1 = data["gdpPercap_1957"] # getting data by column label

print(col1)

country
Albania                    1942.284244
Austria                    8842.598030
Belgium                    9714.960623
Bosnia and Herzegovina     1353.989176
Bulgaria                   3008.670727
Croatia                    4338.231617
Czech Republic             8256.343918
Denmark                   11099.659350
Finland                    7545.415386
France                     8662.834898
Germany                   10187.826650
Greece                     4916.299889
Hungary                    6040.180011
Iceland                    9244.001412
Ireland                    5599.077872
Italy                      6248.656232
Montenegro                 3682.259903
Netherlands               11276.193440
Norway                    11653.973040
Poland                     4734.253019
Portugal                   3774.571743
Romania                    3943.370225
Serbia                     4981.090891
Slovak Republic            6093.262980
Slovenia                   5862.276629
Spain                      4564.802410
Sweden                     9911.878226
Switzerland               17909.489730
Turkey                     2218.754257
United Kingdom            11283.177950
Name: gdpPercap_1957, dtype: float64

# Pandas introduces new data types

print(type(data))

<class 'pandas.core.frame.DataFrame'>

print(type(col1))


<class 'pandas.core.series.Series'>

Get data subsets by position

subset1 = data.iloc[0, 0]
print(subset1)

1601.056136

Get data subsets by label

subset2 = data.loc["Albania", "gdpPercap_1952"]
print(subset2)

1601.056136
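The two lookups reach the same cell in different ways. A sketch on a small made-up table (the values and column names are invented, not taken from the gapminder file):

```python
import pandas as pd

# Small made-up table standing in for the gapminder data
small = pd.DataFrame(
    {'gdp_1952': [1.0, 2.0, 3.0], 'gdp_1957': [4.0, 5.0, 6.0]},
    index=['Albania', 'Austria', 'Belgium'],
)

by_position = small.iloc[1, 0]                # second row, first column
by_label = small.loc['Austria', 'gdp_1952']   # same cell, found by its labels
print(by_position, by_label)
```

Both values are 2.0. iloc depends on the order of rows and columns, while loc keeps working if the table is re-sorted.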

Get row by label

data.loc["Albania",:]


gdpPercap_1952    1601.056136
gdpPercap_1957    1942.284244
gdpPercap_1962    2312.888958
gdpPercap_1967    2760.196931
gdpPercap_1972    3313.422188
gdpPercap_1977    3533.003910
gdpPercap_1982    3630.880722
gdpPercap_1987    3738.932735
gdpPercap_1992    2497.437901
gdpPercap_1997    3193.054604
gdpPercap_2002    4604.211737
gdpPercap_2007    5937.029526
Name: Albania, dtype: float64

country_subset = data.loc["Italy":"Poland", "gdpPercap_1962":"gdpPercap_1972"]

country_subset


	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	8243.582340 	10022.401310 	12269.273780
Montenegro 	4649.593785 	5907.850937 	7778.414017
Netherlands 	12790.849560 	15363.251360 	18794.745670
Norway 	13450.401510 	16361.876470 	18965.055510
Poland 	5338.752143 	6557.152776 	8006.506993

print(type(country_subset))
print(country_subset.describe())


<class 'pandas.core.frame.DataFrame'>
       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
count        5.000000        5.000000        5.000000
mean      8894.635868    10842.506571    13162.799194
std       4093.410673     4855.106424     5517.298708
min       4649.593785     5907.850937     7778.414017
25%       5338.752143     6557.152776     8006.506993
50%       8243.582340    10022.401310    12269.273780
75%      12790.849560    15363.251360    18794.745670
max      13450.401510    16361.876470    18965.055510

# Passing a list of row labels returns a dataframe with just those 2 countries

data.loc[["Italy","Poland"], :]


 	gdpPercap_1952 	gdpPercap_1957 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972 	gdpPercap_1977 	gdpPercap_1982 	gdpPercap_1987 	gdpPercap_1992 	gdpPercap_1997 	gdpPercap_2002 	gdpPercap_2007
country 												
Italy 	4931.404155 	6248.656232 	8243.582340 	10022.401310 	12269.273780 	14255.984750 	16537.483500 	19207.234820 	22013.644860 	24675.02446 	27968.09817 	28569.71970
Poland 	4029.329699 	4734.253019 	5338.752143 	6557.152776 	8006.506993 	9508.141454 	8451.531004 	9082.351172 	7738.881247 	10159.58368 	12002.23908 	15389.92468

alt solution:

italy = data.loc["Italy", "gdpPercap_1952":"gdpPercap_1962"]
poland = data.loc["Poland", "gdpPercap_1952":"gdpPercap_1962"]
pd.concat([italy, poland])


gdpPercap_1952    4931.404155
gdpPercap_1957    6248.656232
gdpPercap_1962    8243.582340
gdpPercap_1952    4029.329699
gdpPercap_1957    4734.253019
gdpPercap_1962    5338.752143
dtype: float64
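
Concatenating the two Series end to end, as above, drops the country names from the result. One possible variation is to place the Series side by side with `axis=1` and then transpose, so each country stays a labelled row. This is a sketch with values copied from the output above. `italy` and `poland` mirror the notes, and `side_by_side` is an illustrative name.

```python
import pandas as pd

years = ["gdpPercap_1952", "gdpPercap_1957", "gdpPercap_1962"]
italy = pd.Series([4931.404155, 6248.656232, 8243.582340], index=years, name="Italy")
poland = pd.Series([4029.329699, 4734.253019, 5338.752143], index=years, name="Poland")

# axis=1 makes each Series a column (named after the Series); .T turns them into rows
side_by_side = pd.concat([italy, poland], axis=1).T
print(side_by_side)
```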

data.iloc[0:2, 0:2]


 	        gdpPercap_1952 	gdpPercap_1957
country 		
Albania 	1601.056136 	1942.284244
Austria 	6137.076492 	8842.598030

Filter data

#Filtering data by a criterion

country_subset


 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	8243.582340 	10022.401310 	12269.273780
Montenegro 	4649.593785 	5907.850937 	7778.414017
Netherlands 	12790.849560 	15363.251360 	18794.745670
Norway 	13450.401510 	16361.876470 	18965.055510
Poland 	5338.752143 	6557.152776 	8006.506993

country_subset > 10000


	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	False 	True 	True
Montenegro 	False 	False 	False
Netherlands 	True 	True 	True
Norway 	True 	True 	True
Poland 	False 	False 	False

country_subset[country_subset > 10000]


 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	NaN 	10022.40131 	12269.27378
Montenegro 	NaN 	NaN 	NaN
Netherlands 	12790.84956 	15363.25136 	18794.74567
Norway 	13450.40151 	16361.87647 	18965.05551
Poland 	NaN 	NaN 	NaN

# Using the where() method for filtering

country_subset.where(country_subset > 10000)


 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	NaN 	10022.40131 	12269.27378
Montenegro 	NaN 	NaN 	NaN
Netherlands 	12790.84956 	15363.25136 	18794.74567
Norway 	13450.40151 	16361.87647 	18965.05551
Poland 	NaN 	NaN 	NaN

# Method chaining

country_subset.where(country_subset > 10000).describe()


 	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
count 	2.000000 	3.000000 	3.000000
mean 	13120.625535 	13915.843047 	16676.358320
std 	466.373656 	3408.589070 	3817.597015
min 	12790.849560 	10022.401310 	12269.273780
25% 	12955.737548 	12692.826335 	15532.009725
50% 	13120.625535 	15363.251360 	18794.745670
75% 	13285.513522 	15862.563915 	18879.900590
max 	13450.401510 	16361.876470 	18965.055510
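
The `count` row above is smaller than 5 because `where()` replaces values that fail the test with NaN, and `describe()` skips NaN. Here is a small sketch using only the 1962 column from the table above. `gdp_1962`, `mask` and `masked` are illustrative names.

```python
import pandas as pd

gdp_1962 = pd.Series(
    [8243.582340, 4649.593785, 12790.849560, 13450.401510, 5338.752143],
    index=["Italy", "Montenegro", "Netherlands", "Norway", "Poland"],
)

mask = gdp_1962 > 10000          # Boolean Series: True where the test passes
masked = gdp_1962.where(mask)    # same shape, NaN where the test fails

print(mask.sum())                # number of True values
print(masked.count())            # count() ignores NaN
print(masked.dropna().index.tolist())
```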

country_subset.rank()


 	       gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	    3.0 	3.0 	3.0
Montenegro 	1.0 	1.0 	1.0
Netherlands 4.0 	4.0 	4.0
Norway    	5.0 	5.0 	5.0
Poland  	2.0 	2.0 	2.0

# An elaborate chaining example
country_subset.rank().corr("kendall")

country_subset.to_csv("country_subset.csv")

country_subset

	gdpPercap_1962 	gdpPercap_1967 	gdpPercap_1972
country 			
Italy 	8243.582340 	10022.401310 	12269.273780
Montenegro 	4649.593785 	5907.850937 	7778.414017
Netherlands 	12790.849560 	15363.251360 	18794.745670
Norway 	13450.401510 	16361.876470 	18965.055510
Poland 	5338.752143 	6557.152776 	8006.506993
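
`to_csv()` writes the index as the first column, so reading the file back with `index_col` restores the same layout. Here is a sketch of that round trip. It writes to an in-memory buffer instead of a file so nothing is left on disk. The two-row dataframe and the names `subset`, `buffer` and `restored` are illustrative.

```python
import io
import pandas as pd

subset = pd.DataFrame(
    {"gdpPercap_1962": [8243.582340, 5338.752143]},
    index=pd.Index(["Italy", "Poland"], name="country"),
)

buffer = io.StringIO()
subset.to_csv(buffer)            # a filename such as "country_subset.csv" also works here
buffer.seek(0)                   # rewind so read_csv starts at the beginning

restored = pd.read_csv(buffer, index_col="country")
print(restored)
```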

LIVE LESSON NOTES:
Day 2 live notes A
Day 2 live notes B

End Day 2

Workshop Day 3

First name and Last Name/Organization/Dept./Email

Name (first & last)	Organization	Dept.	Email
Zhiyuan Yao	UCLA	Data Science Center	zyao@ucla.edu
Mario Cuaya	UCR	Computer Science	mcuay001@ucr.edu
Amber Heidbrink	UCSD	Cell and Developmental Biology	aheidbrink@ucsd.edu
Douglas Zhang	UCSD	Chemistry and Biochemistry	doz023@ucsd.edu
Stella Yuan	UCLA	Ecology and Evolutionary Biology	scy8@ucla.edu
Benjamin Nauman	UCLA	Geography	bnauman@ucla.edu
Belina Chong	UCLA	Ecology and Evolutionary Biology	moonmoon394@ucla.edu
Haley Potts	UCSD	Math & Economics	hpotts@ucsd.edu
Igor Aprelev	UCSD	Mathematics and Economics	iaprelev@ucsd.edu
Jun Tan	UCSD	Economics	j4tan@ucsd.edu
Jonathan Le	UCR	Mathematics	jle173@ucr.edu
Bineh Ndefru	UCLA	Materials Science	bndefru@ucla.edu
Jay Chi	UCSB	ETS	jaychi@ucsb.edu
Kazuma Nagatsuka	UCSD	Robotics(Mechanical Engineering)	knagatsuka@ucsd.edu
Josiah Piceno	UCM	MBSE	jpiceno3@ucmerced.edu
Yibing Zhang	UCM	Bioengineering	yzhang291@ucmerced.edu
Simran Kanal	UCSF	Epidemiology and Biostatistics	simran.kanal@ucsf.edu
Dilawer Ali	UC Merced	Mechanical Engineering	dali4@ucmerced.edu
Tahirah Williams	UCM	QSB	twilliams76@gmail.com
Christian Henry	UC Berkeley	UC Berkeley	chrishenry@berkeley.edu
Zhaoning (Johnny) Wang	UCSD	CMM	zhw063@health.ucsd.edu
Daryl Han	UC Irvine	Student Center and Event Services	ddhan@uci.edu
Jacob Ross	UCSD	Anesthesiology	jaross@ucsd.edu
Jay Colond	UCM	Sociology	jcolond@ucmerced.edu
John Thompson	UC Merced	Molecular & Cellular Biology	jthompson44@ucmerced.edu
Apisit Kaewsanit	UCSF	Epidemiology and Biostatistics	apisit.kaewsanit@ucsf.edu
Caitlin Tribelhorn	UCSD	Pediatrics	ctribelh@ucsd.edu
Waleed Rajabally	UCM	Sociology	wrajabally@ucmerced.edu
Junxiao Gao	UCSF	Epidemiology and Biostatistics	Junxiao.Gao@ucsf.edu
Sam Erickson	UC Merced	Physics	serickson3@ucmerced.edu
Christopher Gray	UCR	Computer Science	cgray024@ucr.edu

Day 3 Questions:

Please enter any questions not answered during live session here:
1.

Day 3 Live Class Notes:













# Day 3 Lists

# lists are written with square brackets []
# a list can hold different data types
# a list is mutable - a character string is not mutable
# you can extend/append a list to make it longer

pressure = [0.6, 0.7, 0.8, 0.9]
print(pressure)

#output
[0.6, 0.7, 0.8, 0.9]






list_a = ['a', 'b', 4, 6.7]
print(list_a)

#output
['a', 'b', 4, 6.7]




#array
import numpy as np

a = np.array(pressure)  # np.array needs a sequence to convert; here the pressure list from above




len(list_a)

#output
4




list_a[1]

#output
'b'





pressure

#output
[0.6, 0.7, 0.8, 0.9]







# assign a new value to a list
pressure[3] = 5
pressure

#output
[0.6, 0.7, 0.8, 5]









# extend or append new values to make a list longer
a = [1,2,3,4]
b = [5,6,7,8,9]
a.append(b)
print(a)

#output
[1, 2, 3, 4, [5, 6, 7, 8, 9]]







a[4][1]

#output
6







a = [1,2,3,4]
a.append(8)
print(a)

#output
[1, 2, 3, 4, 8]










# extend
a = [1,2,3,4]
b = [5,6,7,8,9]
a.extend(b)
print(a)


#output
[1, 2, 3, 4, 5, 6, 7, 8, 9]
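
The difference between the two methods shows up in the length of the result. Here is a short side-by-side sketch. The `.copy()` calls are there so each method starts from the same list, and the names `appended`, `extended` and `joined` are illustrative.

```python
a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

appended = a.copy()
appended.append(b)      # b goes in as a single nested element

extended = a.copy()
extended.extend(b)      # each element of b is added separately

joined = a + b          # + builds a new list and leaves a and b unchanged

print(len(appended), len(extended), len(joined))
```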






list_empty = []
print(list_empty)

#output
[]








# a character string is immutable

string_list = 'address'
string_list[3]

#output
'r'









string_list[3] = 'o'

#output
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-0babce904faa> in <module>
----> 1 string_list[3] = 'o'

TypeError: 'str' object does not support item assignment









# But if you convert the string to a list, you can use index to change the list

string_list = list('address')
string_list[3] = 'o'
print(string_list)

#output
['a', 'd', 'd', 'o', 'e', 's', 's']









# from a string to a list and back

string_a = 'gold'
string_list = list('gold')
print(string_list)

#output
['g', 'o', 'l', 'd']





string_list[0]

#output
'g'







#convert a list to a string using string.join()

string_list
print(''.join(string_list))

#output
gold
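
Because strings are immutable, "changing" a character always means building a new string. Two common ways are sketched below: the list round trip used above, and slicing plus concatenation. Variable names are illustrative.

```python
word = 'address'

# Option 1: convert to a list, edit it, and join it back into a string
letters = list(word)
letters[3] = 'o'
via_list = ''.join(letters)

# Option 2: slice around the position and concatenate
via_slices = word[:3] + 'o' + word[4:]

print(via_list, via_slices, word)
```

Both options leave the original `word` untouched.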








# Stepping through a list

string_list = list('address')
print(string_list)

#output
['a', 'd', 'd', 'r', 'e', 's', 's']

# a slice is written list[start:stop:step]; leaving start and stop empty covers the whole list
# a step of 1 returns every value
string_list[::1]

#output
['a', 'd', 'd', 'r', 'e', 's', 's']

# putting 2 instead of 1 returns every other (every 2nd) value
string_list[::2]

#output
['a', 'd', 'e', 's']






# putting 2 at the beginning starts the slice at index 2, so the first two values are omitted
string_list[2::]

#output 
['d', 'r', 'e', 's', 's']
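
The three slice positions can be combined. Here is a short sketch on the same list of letters. The particular start, stop and step values are chosen only for illustration.

```python
letters = list('address')

middle = letters[1:5]         # index 1 up to, but not including, index 5
every_other = letters[1:5:2]  # the same range, taking every 2nd value
backwards = letters[::-1]     # a negative step walks the list backwards

print(middle)
print(every_other)
print(backwards)
```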








# Difference between sort and sorted using a list

string_list = list('gold')
result = sorted(string_list)
print(result)
# the output is sorted in alphabetical order
['d', 'g', 'l', 'o']




print(string_list)

#output
['g', 'o', 'l', 'd']








string_list = list('gold')
result = string_list.sort()
print(result)
print(string_list)

#output
None
['d', 'g', 'l', 'o']










# Use sorted(variable) to assign to a new variable; thereby creating a new list
list_num = [10,2,5,7,8,4]
result_num = sorted(list_num)
print(result_num)
print(list_num)

#output
[2, 4, 5, 7, 8, 10]
[10, 2, 5, 7, 8, 4]








list_num = [10,2,5,7,8,4]
result_num = list_num.sort()
print(result_num)
print(list_num)
#output
None
[2, 4, 5, 7, 8, 10]








# Use variable.sort() as a method on the list to sort it in place (it returns None)
# This changes the list itself

list_num.sort()
print(list_num)

#output
[2, 4, 5, 7, 8, 10]
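
Both `sorted()` and `.sort()` accept the optional `reverse` and `key` arguments. Here is a brief sketch using `sorted()` so the original list stays unchanged. The word list is made up for illustration.

```python
list_num = [10, 2, 5, 7, 8, 4]

descending = sorted(list_num, reverse=True)          # new list, original untouched
by_length = sorted(['gold', 'Ag', 'copper'], key=len)  # order by length instead of alphabet

print(descending)
print(list_num)
print(by_length)
```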


## Lesson: Plotting
import matplotlib.pyplot as plt












time = [1, 2, 3, 4]
position = [100, 200, 300, 400]

plt.plot(time,position, label = 'Position changes during time')
plt.xlabel('Time')
plt.ylabel('Position')
plt.legend()
plt.title('Position changes during time')

#output
Text(0.5, 1.0, 'Position changes during time')
#graph




















# Plot directly from a dataframe
import pandas as pd

# import the data and save as a dataframe
data_oceania = pd.read_csv('gapminder_gdp_oceania.csv', index_col = 'country')

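# Let's remove the 'gdpPercap_' part of the column names so only the year is left
# (str.strip removes any of the listed characters from both ends, not a prefix;
# it works here because the year digits are not among those characters)
data_oceania.columns = data_oceania.columns.str.strip('gdpPercap_')

# astype(int) returns a new Index; the result is not assigned back here,
# so the column labels stay strings (dtype='object' in the output below)
data_oceania.columns.astype(int)
print(data_oceania.columns) # the column labels (years)
print(data_oceania.index) # the row labels (countries)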

#output
Index(['1952', '1957', '1962', '1967', '1972', '1977', '1982', '1987', '1992',
       '1997', '2002', '2007'],
      dtype='object')
Index(['Australia', 'New Zealand'], dtype='object', name='country')
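
The output above still shows `dtype='object'`, because `astype(int)` returns a new Index rather than changing the existing one. Here is a sketch of storing the converted labels. It uses the Italy and Poland values from earlier in these notes as a stand-in for the Oceania file. `frame` and `years` are illustrative names, and `str.replace` is swapped in for `str.strip` so the prefix is removed as a literal string.

```python
import pandas as pd

frame = pd.DataFrame(
    {"gdpPercap_1952": [4931.404155, 4029.329699],
     "gdpPercap_1957": [6248.656232, 4734.253019]},
    index=pd.Index(["Italy", "Poland"], name="country"),
)

# Remove the prefix as a literal string (regex=False), then store the integer labels
years = frame.columns.str.replace("gdpPercap_", "", regex=False)
frame.columns = years.astype(int)

print(frame.columns.tolist())
```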






# This plot doesn't make much sense

data_oceania.plot()
#output
<AxesSubplot:xlabel='country'>
#graph has several unreadable lines

















# Use transpose 'T' to swap rows and columns so the years are on the x axis and each country is one line

data_oceania.T.plot()
plt.ylabel('GDP Per Capita') # here we added a y axis label
plt.xticks(rotation = 90) # here we rotated the x axis labels

#output
(array([-2.,  0.,  2.,  4.,  6.,  8., 10., 12.]),
 [Text(-2.0, 0, '2002'),
  Text(0.0, 0, '1952'),
  Text(2.0, 0, '1962'),
  Text(4.0, 0, '1972'),
  Text(6.0, 0, '1982'),
  Text(8.0, 0, '1992'),
  Text(10.0, 0, '2002'),
  Text(12.0, 0, '')])
# graph now has two lines, one for each country







# Using different plot styles with ggplot

plt.style.use('ggplot')
data_oceania.T.plot()

#output 
# graph










plt.style.use('seaborn') # on matplotlib 3.6 and later this style is named 'seaborn-v0_8'

# Let's plot one country against the other country
# s changes the size of the markers
# c changes the color
# marker changes the type of marker
data_oceania.T.plot(kind = 'scatter', x = 'New Zealand', y = 'Australia', s = 60, c = 'orange', marker = '3')
#output
# graph

Challenges

Challenge #1

Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.

data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
data_europe.____
plt.legend(loc='best')
plt.xticks(rotation=90)

Challenge #1 solution

data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.min().plot(label='min')
data_europe.max().plot(label='max')
plt.legend(loc='best')
plt.xticks(rotation=90)
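
In the solution, `data_europe.min()` and `data_europe.max()` work column by column, so each returns one value per year, and that Series is what `.plot()` draws. Here is a sketch of the aggregation step only, with the Italy and Poland rows from earlier in these notes standing in for the full Europe file. `two_countries`, `lowest` and `highest` are illustrative names.

```python
import pandas as pd

two_countries = pd.DataFrame(
    {"gdpPercap_1952": [4931.404155, 4029.329699],
     "gdpPercap_1957": [6248.656232, 4734.253019]},
    index=pd.Index(["Italy", "Poland"], name="country"),
)

lowest = two_countries.min()    # one value per column (year)
highest = two_countries.max()

print(lowest)
print(highest)
```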

Challenge #2

Fill in the blanks so that the program below produces the output shown.

values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)

# output 
first time: [1, 3, 5]
second time: [3, 5]

Challenge #2 solution

values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)

Challenge #3

Fill in the blanks in each of the programs below to produce the indicated result.

# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

Challenge #3 solution

total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
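
The same accumulation can also be written with the built-in `sum()`. This is an alternative sketch, not the form the challenge asks for, and `words` is an illustrative name.

```python
words = ["red", "green", "blue"]

# Add up the length of each word without an explicit running total
total = sum(len(word) for word in words)
print(total)
```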

Challenge #4

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were not negative.

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
# output 
[0, 1, 1, 1, 0, 1]

Challenge #4 solution

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value < 0.0:
        result.append(0)
    else:
        result.append(1)
print(result)
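
Once the loop version is clear, the same logic fits in a list comprehension. This is an alternative sketch that produces the same list as the solution above.

```python
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]

# 0 for negative values, 1 for everything else (including 0.0)
result = [0 if value < 0.0 else 1 for value in original]
print(result)
```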

LIVE Session Notes: https://drive.google.com/file/d/1y8A0xUEWSdSrAhS9Sbvx39Etb1Vnn4rM/view?usp=sharing

End Day 3
