BU Software Carpentry Workshop
==============================
## Day 1, Morning: Bash
Setup: Please download the shell-novice-data.zip by following the instructions on this page: http://swcarpentry.github.io/shell-novice/setup/
## Day 2, Morning: Git
More information on line endings:
https://help.github.com/articles/dealing-with-line-endings/
## Days 1 & 2, Afternoon: Python
#### Setup
Please download the Gapminder data set here: https://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip and unzip it to your Desktop.
##### Exercise 1
Copy this table to your notebook, and fill in the values of the variables **after** each statement is executed.
```
# Command    # Value of x # Value of y # Value of swap #
x = 1.0      #            #            #               #
y = 3.0      #            #            #               #
swap = x     #            #            #               #
x = y        #            #            #               #
y = swap     #            #            #               #
```
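Once you have filled in the table, one way to check it is to run the same statements and print all three variables after each one. In this sketch, `None` is an assumption used to stand in for "not defined yet":

```python
# None marks a variable that has not been assigned yet
x = y = swap = None

x = 1.0
print("x = 1.0  ->", x, y, swap)
y = 3.0
print("y = 3.0  ->", x, y, swap)
swap = x
print("swap = x ->", x, y, swap)
x = y
print("x = y    ->", x, y, swap)
y = swap
print("y = swap ->", x, y, swap)
```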
##### Exercise 2
Which is a better variable name, `m`, `min`, or `minutes`? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:
```
ts = m * 60 + s
tot_sec = min * 60 + sec
total_seconds = minutes * 60 + seconds
```
##### Exercise 3
Sometimes arithmetic operators are defined on non-numbers. This can make code confusing to read, so use it with care! Try the following commands:
```
first_name = "Ahmed"
last_name = "Walsh"
print(first_name + last_name)
print("abc" * 5)
```
What do the + and * operators do to strings?
##### Exercise 4
Read the following code but *do not run it.* What is the value of `radiance`?
```
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
```
##### Exercise 5
Predict what the following code will do:
```
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
```
...and then try it!
##### Exercise 6
What function from the `math` module can you use to calculate a square root without using `sqrt`?
Since the library contains this function, why does `sqrt` exist?
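One candidate to try is `math.pow`, since raising a number to the power 0.5 gives its square root. A short comparison (the value 16.0 is arbitrary):

```python
import math

value = 16.0
print(math.sqrt(value))      # dedicated square-root function
print(math.pow(value, 0.5))  # general power function with exponent 0.5
print(value ** 0.5)          # the built-in ** operator works too
```

One reason to keep `sqrt` around anyway is readability: `sqrt(value)` states the intent directly, while `pow(value, 0.5)` makes the reader work it out.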
##### Exercise 7
Fill in the blanks so that the program below prints `90.0`.
```
import math as m
angle = ____.degrees(____.pi / 2)
print(____)
```
##### Exercise 8
You want to select a random character from a string:
`bases = 'ACTTGCTTGAC'`
Which [standard library](https://docs.python.org/3/library/index.html) module could help you?
Which function would you select from that module? Are there alternatives?
Try to write a program that uses the function.
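One possible program uses the `random` module. The sketch below shows `random.choice` and, as an alternative, picking a random index with `random.randrange` and indexing the string:

```python
import random

bases = 'ACTTGCTTGAC'

# Option 1: random.choice picks one element of any non-empty sequence
print(random.choice(bases))

# Option 2: pick a random index (0 <= index < len(bases)), then index the string
index = random.randrange(len(bases))
print(bases[index])
```

The output changes from run to run, because both calls draw a new random value each time.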
##### Exercise 8b
Run the code, and read the error message. What type of error is it? Why didn't the code work?
```
from math import log
log(0)
```
##### Exercise 9
Read the data in `gapminder_gdp_americas.csv` (which should be in the same directory as `gapminder_gdp_oceania.csv`) into a variable called `americas` and display its summary statistics.
##### Exercise 10
After reading the data for the Americas, use `help(americas.head)` and `help(americas.tail)` to find out what the `head` and `tail` functions do.
What method call will display the first three rows of this data?
##### Exercise 11 (Challenge)
Use `help` (or Google) to figure out how to use `pandas` to write out a CSV file. Write out the `americas` dataframe to a file named `backup.csv`.
##### Exercise 12
Assume Pandas has been imported into your notebook and the Gapminder GDP data for Europe has been loaded:
```
import pandas
df = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
```
Write an expression to select each of the following:
- GDP per capita of Serbia in 2007.
- GDP per capita for all countries in 1982.
- GDP per capita for Denmark for all years.
- GDP per capita for all countries for years after 1985.
- GDP per capita for each country in 2007 as a multiple of GDP per capita for that country in 1952.
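One possible set of answers is sketched below, wrapped in a hypothetical helper `select_examples` so each expression is labeled. It assumes the year columns are named `gdpPercap_1952`, `gdpPercap_1957`, ..., `gdpPercap_2007`, as in the Gapminder files for this lesson:

```python
import pandas

def select_examples(df):
    # df: GDP per capita indexed by country, one column per year
    serbia_2007 = df.loc['Serbia', 'gdpPercap_2007']      # a single value
    all_1982 = df['gdpPercap_1982']                       # one column (all countries)
    denmark = df.loc['Denmark']                           # one row (all years)
    # 1985 is not a column label, but the labels are sorted, so pandas can
    # locate where it would fall and start the slice there
    after_1985 = df.loc[:, 'gdpPercap_1985':]
    growth = df['gdpPercap_2007'] / df['gdpPercap_1952']  # element-wise ratio
    return serbia_2007, all_1982, denmark, after_1985, growth
```

Call it as `select_examples(df)` with the `df` loaded above, or type each expression directly into a notebook cell.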
##### Exercise 13
Explain what each line in the following short program does: what is in first, second, etc.?
```
first = pandas.read_csv('data/gapminder_all.csv', index_col='country')
second = first[first['continent'] == 'Americas']
third = second.drop('Puerto Rico')
fourth = third.drop('continent', axis = 1)
fourth.to_csv('result.csv')
```
##### Exercise 14
Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.
```
data_europe = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
```
##### Exercise 15
This short program creates a plot showing the relationship between GDP per capita and life expectancy in 2007, scaling marker size by population:
```
data_all = pandas.read_csv('data/gapminder_all.csv', index_col='country')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)
```
Using online help and other resources, explain what each argument to plot does.
### Lists
##### Exercise 16
Given the following code:
```
new_list = ['a', 'b', 'c']
new_list.append(['d', 'e'])
```
what are the contents of `new_list`? How do you access the list element `e`?
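To see the difference between adding a list as a single element and adding its elements one by one, compare `append` with `extend` on a different list (`numbers` and `others` are illustrative names):

```python
numbers = [1, 2, 3]
numbers.append([4, 5])   # adds ONE element, which is itself a list
print(numbers)           # [1, 2, 3, [4, 5]]
print(numbers[3][1])     # index the outer list, then the inner list: 5

others = [1, 2, 3]
others.extend([4, 5])    # adds each element of [4, 5] separately
print(others)            # [1, 2, 3, 4, 5]
```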
##### Exercise 17
What does the following program print?
```
element = 'helium'
print(element[-1])
```
- How does Python interpret a negative index?
- If `values` is a list, what does `del values[-1]` do?
- How can you display all elements but the last one without changing `values`? (Hint: you will need to combine slicing and negative indexing.)
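A quick way to explore these questions is to try them on a small list of your own (the contents of `values` here are arbitrary):

```python
values = [10, 20, 30, 40]
print(values[-1])    # negative indices count from the end: 40
print(values[:-1])   # a new list without the last element: [10, 20, 30]
print(values)        # slicing did not change values: [10, 20, 30, 40]
del values[-1]       # removes the last element in place
print(values)        # [10, 20, 30]
```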
##### Exercise 18
Recall from yesterday that
```
max('abc')
```
returned `c`. Why is that?
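Strings are compared character by character using each character's Unicode code point, which the built-in `ord` function returns. A small sketch:

```python
# ord() returns the Unicode code point of a single character
codes = [ord(ch) for ch in 'abc']
print(codes)          # [97, 98, 99]
print(max('abc'))     # c, the character with the largest code point
print('Z' < 'a')      # True: uppercase ASCII letters sort before lowercase
```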
##### Exercise 19
What do these two programs print?
```
# Program A
old = list('gold')
new = old # simple assignment
new[0] = 'D'
print('new is', new, 'and old is', old)
```
```
# Program B
old = list('gold')
new = old[:] # assigning a slice
new[0] = 'D'
print('new is', new, 'and old is', old)
```
What is the difference between `new = old` and `new = old[:]`?
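The `is` operator checks whether two names refer to the same object, while `==` compares contents. A sketch using the same `old` list (`alias` and `duplicate` are illustrative names):

```python
old = list('gold')
alias = old          # another name for the same list object
duplicate = old[:]   # a new list containing the same elements

print(alias is old)      # True: both names refer to one object
print(duplicate is old)  # False: the slice created a separate list
print(duplicate == old)  # True: the contents are still equal
```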
### For loops
##### Exercise 20
For each code block below, a desired result is indicated in the first comment. Fill in the blanks so that the code block gives the desired result.
```
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)
```
```
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)
```
```
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)
```
```
# Create acronym: ["red", "green", "blue"] => "RGB"
______ # write the whole thing!
```
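For the last block, one possible solution takes the first letter of each word and converts it to uppercase:

```python
words = ["red", "green", "blue"]
acronym = ""
for word in words:
    # word[0] is the first character; upper() returns an uppercase copy
    acronym = acronym + word[0].upper()
print(acronym)  # RGB
```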
### Conditionals
##### Exercise 21
What does this program print?
```
pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)
```
##### Exercise 22
Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.
```
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
```
### Functions
##### Exercise 23
1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
3. Fix the error.
4. Repeat steps 2 and 3 until you have fixed all the errors.
```
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")
```
##### Exercise 24
Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty?
```
def first_negative(values):
    for v in ____:
        if ____:
            return ____
```
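One way to fill in the blanks is shown below. The final `return None` is written out for clarity; without it, a Python function that finishes its loop without returning also returns `None`, which is what happens for an empty list:

```python
def first_negative(values):
    # Scan left to right and return the first value below zero
    for v in values:
        if v < 0:
            return v
    # No negative value found (this includes the empty-list case)
    return None

print(first_negative([1.0, -2.5, -3.0]))  # -2.5
print(first_negative([]))                 # None
```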
##### Exercise 25
What does this short program print?
```
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
print_date(day=1, month=2, year=2003)
```
When and why is it useful to call functions this way?
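Keyword arguments are especially handy when a function has parameters with default values, because callers can override just the one they need by name. A sketch with a hypothetical `format_date` function modeled on `print_date` (the `sep` parameter is an illustrative addition):

```python
def format_date(year, month, day, sep='/'):
    # sep has a default value, so callers may leave it out
    return str(year) + sep + str(month) + sep + str(day)

print(format_date(2003, 2, 1))                          # 2003/2/1
print(format_date(day=1, month=2, year=2003, sep='-'))  # 2003-2-1
print(format_date(2003, 2, 1, sep='.'))                 # 2003.2.1
```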
### Looping over Data Sets
##### Exercise 26
Fill in the blanks so that this program prints the number of records in the file that has the fewest records.
```
import glob
import pandas
fewest = ____
for filename in glob.glob('data/*.csv'):
    dataframe = pandas.____(filename)
    fewest = min(____, dataframe.shape[0])
print('smallest file has', fewest, 'records')
```
##### Exercise 27
Execute the following code
```
import pandas
```
Fill in the blanks so that the following function takes a filename from the Gapminder data set and returns all the countries in it.
```
def get_countries(filename):
    data = pandas.read_csv(_____,
                           index_col='country')
    return _____
```
Now, write code that uses that function to make a list of all the countries in the world.
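One possible solution is sketched below. Because `index_col='country'` makes the country names the index, the function can return `data.index`. The helper `all_countries` and its default pattern `data/gapminder_gdp_*.csv` are assumptions: the pattern is meant to match only the regional files, so that `gapminder_all.csv` does not add every country a second time.

```python
import glob
import pandas

def get_countries(filename):
    # With index_col='country', the country names become the row index
    data = pandas.read_csv(filename, index_col='country')
    return data.index

def all_countries(pattern='data/gapminder_gdp_*.csv'):
    # Hypothetical helper: collect the countries from every matching file
    countries = []
    for filename in glob.glob(pattern):
        countries.extend(get_countries(filename))
    return countries
```

Calling `all_countries()` from the directory that contains `data/` returns one list built from all the regional files.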
##### Exercise 28
Write a program that reads in the regional data sets and plots the average GDP per capita for each region over time. Use functions where it makes sense to do so.