<style>body { background-color: #eeeeee!important; } </style>
# NWO-I Software Carpentry 2022
:::info
:information_source: On this page you will find notes for the NWO-I Software Carpentry workshop organized on November 28 and February 6.
:::
## Code of Conduct
Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.
## :timer_clock: Schedule February 6
| Time | **Programming with Python** |
|------|------|
| 09:30 | Programming with Python |
| 10:30 | *Morning break* |
| 10:45 | Programming with Python (Continued) |
| 12:30 | *Lunch break* |
| 13:15 | Programming with Python (Continued) |
| 15:30 | *Afternoon break* |
| 15:45 | Programming with Python (Continued) |
| 17:30 | *END* |
## Programming with Python
### :link: Links
* Setup page: https://swcarpentry.github.io/python-novice-inflammation/setup.html
* Lesson material: https://swcarpentry.github.io/python-novice-inflammation/
* Reference page: https://swcarpentry.github.io/python-novice-inflammation/reference.html
* Post workshop survey: https://carpentries.typeform.com/to/UgVdRQ?slug=2022-11-28-software-carpentry
### 1. Python Fundamentals
```python!
3 + 5 * 4
weight_kg = 60
print(weight_kg)
weight_lb = 2.2 * weight_kg
print(weight_lb)
patient_id = '001'
print(patient_id)
weight_kg = 60.3
print(weight_kg)
print(weight_lb)
weight_lb = 2.2 * weight_kg
print(weight_lb)
print(patient_id, 'weight in kilogram', weight_kg)
print(type(60.3))
print(type(patient_id))
weight_kg = 65.0
print('weight in kilograms is now: ', weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms: ', weight_kg, 'and in pounds: ', weight_lb)
weight_kg = 100.0
print('weight in kilograms: ', weight_kg, 'and in pounds: ', weight_lb)
```
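The live-coded block above shows that `weight_lb` keeps its old value when `weight_kg` changes. Here is a minimal sketch of the same idea, reusing the numbers from above. `stale_lb` is a hypothetical name introduced only for illustration.

```python
# The numbers used in the block above
weight_kg = 65.0
weight_lb = 2.2 * weight_kg   # stores the result of the calculation, not the formula

weight_kg = 100.0             # changing weight_kg does not touch weight_lb
stale_lb = weight_lb          # still the value computed from 65.0

weight_lb = 2.2 * weight_kg   # recompute explicitly to bring it up to date
print('before recomputing:', stale_lb)
print('after recomputing:', weight_lb)
```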
:::success
:pencil: **Check Your Understanding**
What values do the variables `mass` and `age` have after each of the following statements?
Test your answer by executing the lines.
```python!
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
```
> :::spoiler :eyes: ***Solution***
> ```
> `mass` holds a value of 47.5, `age` does not exist
> `mass` still holds a value of 47.5, `age` holds a value of 122
> `mass` now has a value of 95.0, `age`'s value is still 122
> `mass` still has a value of 95.0, `age` now holds 102
> ```
:::
:::success
:pencil: **Sorting Out References**
Python allows you to assign multiple values to multiple variables in one line by separating
the variables and values with commas. What does the following program print out?
```python!
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
```
> :::spoiler :eyes: ***Solution***
> ```
> Hopper Grace
> ```
:::
:::success
:pencil: **Seeing Data Types**
What are the data types of the following variables?
```python!
planet = 'Earth'
apples = 5
distance = 10.5
```
> :::spoiler :eyes: ***Solution***
> ```python!
> print(type(planet))
> print(type(apples))
> print(type(distance))
> ```
>
> ```
> <class 'str'>
> <class 'int'>
> <class 'float'>
> ```
:::
### 2. Analyzing Patient Data
```python!
import numpy
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(data)
print(type(data))
print(data.dtype)
print(data.shape)
print('first value in data:',data[0,0])
print('middle value in data:',data[30,20])
print(data[0:4,0:10])
print(data[5:10,36:])
print(data[:3,36:])
print(numpy.mean(data))
import time
print(time.ctime())
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
print('maximum inflammation: ', maxval)
print('minimum inflammation: ', minval)
print('standard deviation: ', stdval)
# In Jupyter/IPython, `numpy.std?` also shows the documentation
help(numpy.std)
print(data.shape)
patient_0 = data[0, :]  # 0 on the first axis (rows), everything on the second (columns)
print("maximum inflammation for patient 0: ", numpy.max(patient_0))
print("maximum inflammation for patient 2: ", numpy.max(data[2, :]))
print(numpy.mean(data, axis=0))
print(numpy.mean(data, axis=0).shape)
print(numpy.mean(data, axis=1))
```
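To see what `axis=0` and `axis=1` do without the inflammation file, here is a small sketch on a made-up 2x3 array (2 "patients" as rows, 3 "days" as columns). The array values are invented for illustration.

```python
import numpy

# Made-up data: 2 patients (rows) x 3 days (columns)
sample = numpy.array([[0.0, 1.0, 2.0],
                      [2.0, 3.0, 4.0]])

per_day = numpy.mean(sample, axis=0)      # average down each column: one value per day
per_patient = numpy.mean(sample, axis=1)  # average along each row: one value per patient

print(per_day, per_day.shape)          # [1. 2. 3.] (3,)
print(per_patient, per_patient.shape)  # [1. 3.] (2,)
```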
:::success
:pencil: **Slicing Strings**
A section of an array is called a *slice*.
We can take slices of character strings as well:
```python!
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
```
```
first three characters: oxy
last three characters: gen
```
What is the value of `element[:4]`?
What about `element[4:]`?
Or `element[:]`?
> :::spoiler :eyes: ***Solution***
> ```
> oxyg
> en
> oxygen
> ```
What is `element[-1]`?
What is `element[-2]`?
> :::spoiler :eyes: ***Solution***
> ```
> n
> e
> ```
Given those answers,
explain what `element[1:-1]` does.
> :::spoiler :eyes: ***Solution***
> Creates a substring from index 1 up to (not including) the final index,
> effectively removing the first and last letters from 'oxygen'
How can we rewrite the slice for getting the last three characters of `element`,
so that it works even if we assign a different string to `element`?
Test your solution with the following strings: `carpentry`, `clone`, `hi`.
> :::spoiler :eyes: ***Solution***
> ```python!
> element = 'oxygen'
> print('last three characters:', element[-3:])
> element = 'carpentry'
> print('last three characters:', element[-3:])
> element = 'clone'
> print('last three characters:', element[-3:])
> element = 'hi'
> print('last three characters:', element[-3:])
> ```
> ```
> last three characters: gen
> last three characters: try
> last three characters: one
> last three characters: hi
> ```
:::
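The negative-index slice from the last solution can be wrapped in a small function. `last_n` is a hypothetical helper name used only for illustration; it assumes `n` is at least 1 (with `n = 0`, `text[-0:]` is the same as `text[0:]` and returns the whole string).

```python
def last_n(text, n):
    """Return the last n characters of text (assumes n >= 1).

    A negative start index counts from the end, so this works for any length;
    if text is shorter than n, the whole string is returned.
    """
    return text[-n:]

for word in ['oxygen', 'carpentry', 'clone', 'hi']:
    print(last_n(word, 3))
```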
<!---
:::success
:pencil: **Thin Slices**
The expression `element[3:3]` produces an *empty string*,
i.e., a string that contains no characters.
If `data` holds our array of patient data,
what does `data[3:3, 4:4]` produce?
What about `data[3:3, :]`?
> :::spoiler :eyes: ***Solution***
> ```
> array([], shape=(0, 0), dtype=float64)
> array([], shape=(0, 40), dtype=float64)
> ```
:::
--->
:::success
:pencil: **Stacking Arrays**
Arrays can be concatenated and stacked on top of one another,
using NumPy's `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.
```python!
import numpy
A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)
B = numpy.hstack([A, A])
print('B = ')
print(B)
C = numpy.vstack([A, A])
print('C = ')
print(C)
```
```
A =
[[1 2 3]
[4 5 6]
[7 8 9]]
B =
[[1 2 3 1 2 3]
[4 5 6 4 5 6]
[7 8 9 7 8 9]]
C =
[[1 2 3]
[4 5 6]
[7 8 9]
[1 2 3]
[4 5 6]
[7 8 9]]
```
Write some additional code that slices the first and last columns of `A`,
and stacks them into a 3x2 array.
Make sure to `print` the results to verify your solution.
> :::spoiler :eyes: ***Solution***
>
> A 'gotcha' with array indexing is that singleton dimensions
> are dropped by default. That means `A[:, 0]` is a one-dimensional
> array, which won't stack as desired. To preserve singleton dimensions,
> the index itself can be a slice or array. For example, `A[:, :1]` returns
> a two-dimensional array with one singleton dimension (i.e. a column
> vector).
>
> ```python!
> D = numpy.hstack((A[:, :1], A[:, -1:]))
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
> [4 6]
> [7 9]]
> ```
> :::spoiler :eyes: ***Solution***
>
> An alternative way to achieve the same result is to use NumPy's
> `delete` function to remove the second column of `A`.
>
> ```python!
> D = numpy.delete(A, 1, axis=1)  # remove index 1 along axis 1 (the columns)
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
> [4 6]
> [7 9]]
> ```
:::
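The shape difference behind the 'gotcha' above can be checked directly. This sketch also shows `numpy.column_stack`, a third option not used in the lesson, which treats 1-D inputs as columns and so accepts `A[:, 0]` as is.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(A[:, 0].shape)   # (3,)   integer index drops the column dimension
print(A[:, :1].shape)  # (3, 1) slice index keeps it

# column_stack turns each 1-D input into a column before stacking
D = numpy.column_stack((A[:, 0], A[:, -1]))
print(D)
```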
:::success
:pencil: **Change In Inflammation**
The patient data is _longitudinal_ in the sense that each row represents a
series of observations relating to one individual. This means that
the change in inflammation over time is a meaningful concept.
Let's find out how to calculate changes in the data contained in an array
with NumPy.
The `numpy.diff()` function takes an array and returns the differences
between two successive values. Let's use it to examine the changes
each day across the first week of patient 3 from our inflammation dataset.
```python!
patient3_week1 = data[3, :7]
print(patient3_week1)
```
```
[0. 0. 2. 0. 4. 2. 2.]
```
Calling `numpy.diff(patient3_week1)` would do the following calculations:
```python!
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```
and return the 6 difference values in a new array.
```python!
numpy.diff(patient3_week1)
```
```
array([ 0., 2., -2., 4., -2., 0.])
```
Note that the array of differences is shorter by one element (length 6).
When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may
be passed to the function to specify which axis to process. When applying
`numpy.diff` to our 2D inflammation array `data`, which axis would we specify?
> :::spoiler :eyes: ***Solution***
> Since the row axis (0) is patients, it does not make sense to get the
> difference between two arbitrary patients. The column axis (1) is in
> days, so the difference is the change in inflammation -- a meaningful
> concept.
>
> ```python!
> numpy.diff(data, axis=1)
> ```
If the shape of an individual data file is `(60, 40)` (60 rows and 40
columns), what would the shape of the array be after you run the `diff()`
function and why?
> :::spoiler :eyes: ***Solution***
> The shape will be `(60, 39)` because there is one fewer difference between
> columns than there are columns in the data.
How would you find the largest change in inflammation for each patient? Does
it matter if the change in inflammation is an increase or a decrease?
> :::spoiler :eyes: ***Solution***
> By using the `numpy.max()` function after you apply the `numpy.diff()`
> function, you will get the largest difference between days.
>
> ```python!
> numpy.max(numpy.diff(data, axis=1), axis=1)
> ```
>
> ```python!
> array([ 7., 12., 11., 10., 11., 13., 10., 8., 10., 10., 7.,
> 7., 13., 7., 10., 10., 8., 10., 9., 10., 13., 7.,
> 12., 9., 12., 11., 10., 10., 7., 10., 11., 10., 8.,
> 11., 12., 10., 9., 10., 13., 10., 7., 7., 10., 13.,
> 12., 8., 8., 10., 10., 9., 8., 13., 10., 7., 10.,
> 8., 12., 10., 7., 12.])
> ```
>
> If inflammation values *decrease* along an axis, then the difference from
> one element to the next will be negative. If
> you are interested in the **magnitude** of the change and not the
> direction, the `numpy.absolute()` function will provide that.
>
> Notice the difference if you get the largest _absolute_ difference
> between readings.
>
> ```python!
> numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
> ```
>
> ```python!
> array([ 12., 14., 11., 13., 11., 13., 10., 12., 10., 10., 10.,
> 12., 13., 10., 11., 10., 12., 13., 9., 10., 13., 9.,
> 12., 9., 12., 11., 10., 13., 9., 13., 11., 11., 8.,
> 11., 12., 13., 9., 10., 13., 11., 11., 13., 11., 13.,
> 13., 10., 9., 10., 10., 9., 9., 13., 10., 9., 10.,
> 11., 13., 10., 10., 12.])
> ```
>
:::
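A small sketch that puts the three answers together on a made-up 2x4 array (2 patients, 4 days; values invented so that the largest change for the first patient is a decrease):

```python
import numpy

# Made-up data: 2 patients (rows) x 4 days (columns)
readings = numpy.array([[1., 3., 0., 1.],
                        [0., 2., 4., 4.]])

changes = numpy.diff(readings, axis=1)   # shape (2, 3): one fewer column than days
largest_rise = numpy.max(changes, axis=1)                     # biggest increase per patient
largest_change = numpy.max(numpy.absolute(changes), axis=1)   # biggest change in either direction

print(changes)
print(largest_rise)
print(largest_change)
```

For the first patient the biggest increase is 2, but the biggest change overall is the drop of 3 from day 2 to day 3.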
### 3. Visualizing Tabular Data
```python!
import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()
min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()
import numpy
import matplotlib.pyplot
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))
axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))
axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))
fig.tight_layout()
matplotlib.pyplot.savefig('inflammation.png')
matplotlib.pyplot.show()
import numpy as np
np.max(data)
import matplotlib.pyplot as plt
plt.imshow(data)
plt.show()
```
:::success
:pencil: **Plot Scaling**
Why do all of our plots stop just short of the upper end of our graph?
> :::spoiler :eyes: ***Solution***
> Because Matplotlib normally sets the x and y axis limits to the minimum and maximum of our data
> (depending on the data range).
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ```python!
> axes3.set_ylim(0,6)
> ```
Update your plotting code to automatically set a more appropriate scale.
(Hint: you can make use of the `max` and `min` methods to help.)
> :::spoiler :eyes: ***Solution***
> ```python!
> # One method
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
> axes3.set_ylim(0,6)
> ```
> :::spoiler :eyes: ***Solution***
> ```python!
> # A more automated approach
> min_data = numpy.min(data, axis=0)
> axes3.set_ylabel('min')
> axes3.plot(min_data)
> axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> ```
:::
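The "more automated approach" can be generalised so every subplot gets a margin above and below its data. This is a sketch under assumptions: `padded_limits` is a hypothetical helper and the 10% `margin` is an arbitrary choice. It only uses NumPy; the resulting pair can be passed to `set_ylim`.

```python
import numpy

def padded_limits(values, margin=0.1):
    """Return (low, high) y-limits padded by a fraction of the data range."""
    low, high = numpy.min(values), numpy.max(values)
    pad = (high - low) * margin
    return low - pad, high + pad

low, high = padded_limits(numpy.array([0.0, 2.0, 5.0]))
print(low, high)
# With a Matplotlib axes object you could then call: axes3.set_ylim(low, high)
```

If all values are equal the range is zero, so `pad` is zero too; you may want a fixed fallback margin in that case.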
:::success
:pencil: **Drawing Straight Lines**
In the center and right subplots above, we expect all lines to look like step functions because
non-integer values are not realistic for the minimum and maximum values. However, you can see
that the lines are not always vertical or horizontal, and in particular the step function
in the subplot on the right looks slanted. Why is this?
> :::spoiler :eyes: ***Solution***
> Because Matplotlib interpolates (draws a straight line) between the points.
> One way to avoid this is to use the Matplotlib `drawstyle` option:
>
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
> 
:::
:::success
:pencil: **Make Your Own Plot**
Create a plot showing the standard deviation (`numpy.std`)
of the inflammation data for each day across all patients.
> :::spoiler :eyes: ***Solution***
> ```python!
> std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> matplotlib.pyplot.show()
> ```
:::
:::success
:pencil: **Moving Plots Around**
Modify the program to display the three plots on top of one another
instead of side by side.
> :::spoiler :eyes: ***Solution***
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> # change figsize (swap width and height)
> fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
>
> # change add_subplot (swap first two parameters)
> axes1 = fig.add_subplot(3, 1, 1)
> axes2 = fig.add_subplot(3, 1, 2)
> axes3 = fig.add_subplot(3, 1, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
:::
### 4. Storing Multiple Values in Lists
```python!
odds = [1, 3, 5, 7]
print('odds are:', odds)
print('first element:', odds[0])
print('last element:', odds[3])
print('"-1" element:', odds[-1])
names = ['Curie','Darwing','Turing'] # Typo in Darwin's name
print('names is originally: ', names)
names[1] = 'Darwin' # Correct the name
print('final value of names', names)
name = 'Darwin'
name[0] = 'd'  # TypeError: strings are immutable, so this line fails on purpose
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
my_salsa = salsa  # my_salsa and salsa refer to the same list
salsa[0] = 'hot peppers'
print('Ingredients in my salsa', my_salsa)
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
my_salsa = list(salsa)  # list() makes an independent copy
salsa[0] = 'hot peppers'
print('Ingredients in my salsa', my_salsa)
x = [['pepper', 'zucchini', 'onion'], ['cabbage', 'lettuce', 'garlic'], ['apple', 'pear', 'banana']]
print([x[0]])
print(x[0])
print(x[0][0])
sample_age = [10, 12.5, 'unknown']
print(sample_age)
odds.append(11)
print('odds after append: ', odds)
removed_element = odds.pop(0)
print('odds after removing first element: ', odds)
print('removed element: ', removed_element)
odds.reverse()
print('odds after reverse', odds)
odds = [3,5,7]
primes = odds
primes.append(2)
print('primes: ', primes)
print('odds: ', odds)
primes.insert(2, 13)
print(primes)
binomial_name = 'Drosophila melanogaster'
group = binomial_name[0:10]
print('group:', group)
species = binomial_name[11:23]
print('species:', species)
chromosomes = ['X', 'Y', '2', '3', '4']
autosomes = chromosomes[2:5]
print('autosomes:', autosomes)
last = chromosomes[-1]
print('last:', last)
```
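The two salsa experiments above differ only in whether a second name or a new list was created. The `is` operator makes this visible; `alias` and `salsa_copy` are illustrative names.

```python
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']

alias = salsa             # a second name for the SAME list object
salsa_copy = list(salsa)  # a new list holding the same elements

salsa[0] = 'hot peppers'

print(alias is salsa)       # True: both names refer to one object
print(salsa_copy is salsa)  # False: the copy is independent
print(alias[0], '|', salsa_copy[0])  # hot peppers | peppers
```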
:::success
:pencil: **Slicing From the End**
Use slicing to access only the last four characters of a string or entries of a list.
```python!
string_for_slicing = 'Observation date: 02-Feb-2013'
list_for_slicing = [['fluorine', 'F'],
                    ['chlorine', 'Cl'],
                    ['bromine', 'Br'],
                    ['iodine', 'I'],
                    ['astatine', 'At']]
```
```
'2013'
[['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
```
Would your solution work regardless of whether you knew beforehand
the length of the string or list
(e.g. if you wanted to apply the solution to a set of lists of different lengths)?
If not, try to change your approach to make it more robust.
Hint: Remember that indices can be negative as well as positive.
> :::spoiler :eyes: ***Solution***
> Use negative indices to count elements from the end of a container (such as list or string):
>
> ```python!
> string_for_slicing[-4:]
> list_for_slicing[-4:]
> ```
:::
:::success
:pencil: **Overloading**
`+` usually means addition, but when used on strings or lists, it means "concatenate".
Given that, what do you think the multiplication operator `*` does on lists?
In particular, what will be the output of the following code?
```python!
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats)
```
1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`
The technical term for this is *operator overloading*:
a single operator, like `+` or `*`,
can do different things depending on what it's applied to.
> :::spoiler :eyes: ***Solution***
>
> The multiplication operator `*` used on a list replicates elements of the list and concatenates
> them together:
>
> ```
> [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
> ```
>
> It's equivalent to:
>
> ```python!
> counts + counts
> ```
:::
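Operator overloading also explains why the same `*` does something else on NumPy arrays, which is where answer 2 of the quiz would come from. A short comparison:

```python
import numpy

counts = [2, 4, 6, 8, 10]

repeats = counts * 2               # list: repeats the elements
scaled = numpy.array(counts) * 2   # NumPy array: multiplies each element

print(repeats)
print(scaled)
```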
### 5. Repeating Actions with Loops
```python!
odds = [1, 3, 5, 7]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
odds = [1, 3, 5]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
# IndexError: odds now has only three elements, so odds[3] does not exist
odds = [1, 3, 5, 7]
for num in odds:
    print(num)
odds = [1, 3, 5, 7]
for banana in odds:
    print(banana)
length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
    length = length + 1
print('There are', length, 'names in the list.')
name = 'Rosalind'
for name in ['Curie', 'Darwin', 'Turing']:
    print(name)
print('after the loop, name is', name)
print(len([1, 3, 5, 7]))
```
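The counting loop above does by hand what the built-in `len` does. Wrapping it in a function lets us compare the two on any sequence; `count_items` is a hypothetical name used only for this comparison.

```python
def count_items(sequence):
    """Count elements with a loop, for comparison with len()."""
    length = 0
    for _ in sequence:      # the loop variable itself is not needed
        length = length + 1
    return length

names = ['Curie', 'Darwin', 'Turing']
print(count_items(names), len(names))
print(count_items('oxygen'), len('oxygen'))
```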
:::success
:pencil: **From 1 to N**
Python has a built-in function called `range` that generates a sequence of numbers. `range` can
accept 1, 2, or 3 parameters.
* If one parameter is given, `range` generates a sequence of that length,
starting at zero and incrementing by 1.
For example, `range(3)` produces the numbers `0, 1, 2`.
* If two parameters are given, `range` starts at
the first and ends just before the second, incrementing by one.
For example, `range(2, 5)` produces `2, 3, 4`.
* If `range` is given 3 parameters,
it starts at the first one, ends just before the second one, and increments by the third one.
For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
Using `range`,
write a loop that prints the first 3 natural numbers:
```
1
2
3
```
> :::spoiler :eyes: ***Solution***
> ```python!
> for number in range(1, 4):
>     print(number)
> ```
:::
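Wrapping `range` in `list()` shows the numbers it produces, which is a quick way to check the three forms described above. The loop at the end generalises the solution to any `N`, assuming `N` is a positive integer.

```python
# list() turns the numbers produced by range into a list we can print
print(list(range(3)))         # [0, 1, 2]
print(list(range(2, 5)))      # [2, 3, 4]
print(list(range(3, 10, 2)))  # [3, 5, 7, 9]

# Loop over 1..N: the stop value is N + 1 because range excludes it
N = 3
for number in range(1, N + 1):
    print(number)
```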
:::success
:pencil: **Understanding the loops**
Given the following loop:
```python!
word = 'oxygen'
for char in word:
    print(char)
```
How many times is the body of the loop executed?
* 3 times
* 4 times
* 5 times
* 6 times
> :::spoiler :eyes: ***Solution***
>
> The body of the loop is executed 6 times.
>
:::
### 6. Analyzing Data from Multiple Files
```python!
import glob
print(glob.glob('inflammation*.csv'))
filenames = glob.glob('inflammation*.csv')
print(filenames)
filenames = sorted(glob.glob('inflammation*.csv'))
print(filenames)
import glob
import numpy
import matplotlib.pyplot
filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]
for filename in filenames:
    print(filename)
    data = numpy.loadtxt(fname=filename, delimiter=',')
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    fig.tight_layout()
    matplotlib.pyplot.show()
```
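`glob.glob` does not promise any particular order (it depends on the file system), which is why the lesson wraps it in `sorted`. This self-contained sketch writes two small made-up CSV files into a temporary folder, deliberately out of order, and then loops over them the same way.

```python
import glob
import os
import tempfile

import numpy

# Throwaway folder with two made-up CSV files, written out of order
folder = tempfile.mkdtemp()
for name in ['inflammation-02.csv', 'inflammation-01.csv']:
    numpy.savetxt(os.path.join(folder, name), numpy.zeros((2, 3)), delimiter=',')

# Sort the names so the loop always visits the files in a predictable order
filenames = sorted(glob.glob(os.path.join(folder, 'inflammation*.csv')))
for filename in filenames:
    data = numpy.loadtxt(fname=filename, delimiter=',')
    print(os.path.basename(filename), data.shape)
```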
:::success
:pencil: **Plotting Differences**
Plot the difference between the average inflammations reported in the first and second datasets
(stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively),
i.e., the difference between the leftmost plots of the first two figures.
> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = sorted(glob.glob('inflammation*.csv'))
>
> data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
> data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> matplotlib.pyplot.ylabel('Difference in average')
> matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
>
> fig.tight_layout()
> matplotlib.pyplot.show()
> ```
:::
:::success
:pencil: **Generate Composite Statistics**
Use each of the files once to generate a dataset containing values averaged over all patients, by completing the code inside the loop below:
```python!
filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60, 40))
for filename in filenames:
    # sum each new file's data into composite_data as it's read
    #
# and then divide the composite_data by number of samples
composite_data = composite_data / len(filenames)
```
Then use pyplot to generate average, max, and min for all patients.
> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = glob.glob('inflammation*.csv')
> composite_data = numpy.zeros((60,40))
>
> for filename in filenames:
>     data = numpy.loadtxt(fname=filename, delimiter=',')
>     composite_data = composite_data + data
>
> composite_data = composite_data / len(filenames)
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(composite_data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(composite_data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(composite_data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
:::
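The running sum in the solution can be cross-checked against NumPy's `stack` followed by a mean over the new first axis. This sketch uses three made-up 2x3 arrays in place of the CSV files.

```python
import numpy

# Three made-up 2x3 "files" already loaded into arrays
datasets = [numpy.full((2, 3), 1.0),
            numpy.full((2, 3), 2.0),
            numpy.full((2, 3), 6.0)]

# Running-sum approach used in the exercise
composite_data = numpy.zeros((2, 3))
for data in datasets:
    composite_data = composite_data + data
composite_data = composite_data / len(datasets)

# Alternative: stack along a new first axis, then average over it
stacked_mean = numpy.mean(numpy.stack(datasets), axis=0)

print(composite_data)
print(numpy.allclose(composite_data, stacked_mean))  # True
```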
### 7. Making Choices
```python!
num = 17
if num > 100:
    print('greater than')
else:
    print('not greater than')
print('done')
num = 53
print('before conditional')
if num > 100:
    print(num, 'is greater than 100')
print('after conditional')
num = -3
if num > 0:
    print(num, 'is positive')
elif num == 0:
    print(num, 'is zero')
else:
    print(num, 'is negative')
if (1 > 0) and (-1 >= 0):
    print('both parts are true')
else:
    print('at least one part is false')
if (1 < 0) or (1 >= 0):
    print('at least one test is true')
import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
max_inflammation_0 = numpy.max(data, axis=0)[0]
max_inflammation_20 = numpy.max(data, axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
elif numpy.sum(numpy.min(data, axis=0)) == 0:
    print('Minima add up to zero!')
else:
    print('Seems OK!')
```
:::success
:pencil: **How Many Paths?**
Consider this code:
```python!
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```
Which of the following would be printed if you were to run this code?
Why did you pick this answer?
1. A
2. B
3. C
4. B and C
> :::spoiler :eyes: ***Solution***
> C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
> but `4 < 5` is true.
:::
:::success
:pencil: **What Is Truth?**
`True` and `False` booleans are not the only values in Python that are true and false.
In fact, *any* value can be used in an `if` or `elif`.
After reading and running the code below,
explain what the rule is for which values are considered true and which are considered false.
```python!
if '':
    print('empty string is true')
if 'word':
    print('word is true')
if []:
    print('empty list is true')
if [1, 2, 3]:
    print('non-empty list is true')
if 0:
    print('zero is true')
if 1:
    print('one is true')
```
:::
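To check the rule you came up with, the built-in `bool()` reports how Python treats a value in an `if` or `elif` test. This sketch runs it over the same values as the exercise.

```python
# bool() shows whether a value counts as true or false in a condition
for value in ['', 'word', [], [1, 2, 3], 0, 1]:
    print(repr(value), '->', bool(value))
```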
<!---
:::success
:pencil: **That's Not Not What I Meant**
Sometimes it is useful to check whether some condition is not true.
The Boolean operator `not` can do this explicitly.
After reading and running the code below,
write some `if` statements that use `not` to test the rule
that you formulated in the previous challenge.
```python!
if not '':
    print('empty string is not true')
if not 'word':
    print('word is not true')
if not not True:
    print('not not True is true')
```
:::
<!---
:::success
:pencil: **Close Enough**
Write some conditions that print `True` if the variable `a` is within 10% of the variable `b`
and `False` otherwise.
Compare your implementation with your partner's:
do you get the same answer for all possible pairs of numbers?
> :::spoiler :eyes: ***Solution***
> There is a [built-in function `abs`](https://docs.python.org/3/library/functions.html#abs) that returns the absolute value of
> a number:
> ```python!
> print(abs(-12))
> ```
> ```
> 12
> ```
> :::spoiler :eyes: ***Solution***
> ```python!
> a = 5
> b = 5.1
>
> if abs(a - b) <= 0.1 * abs(b):
>     print('True')
> else:
>     print('False')
> ```
> :::spoiler :eyes: ***Solution***
> ```python!
> print(abs(a - b) <= 0.1 * abs(b))
> ```
>
> This works because the Booleans `True` and `False`
> have string representations which can be printed.
:::
<!---
:::success
:pencil: **In-Place Operators**
Python (and most other languages in the C family) provides
[in-place operators](https://swcarpentry.github.io/python-novice-inflammation/reference.html)
that work like this:
```python!
x = 1 # original value
x += 1 # add one to x, assigning result back to x
x *= 3 # multiply x by 3
print(x)
```
```
6
```
Write some code that sums the positive and negative numbers in a list separately,
using in-place operators.
Do you think the result is more or less readable
than writing the same without in-place operators?
> :::spoiler :eyes: ***Solution***
> ```python!
> positive_sum = 0
> negative_sum = 0
> test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
> for num in test_list:
>     if num > 0:
>         positive_sum += num
>     elif num == 0:
>         pass
>     else:
>         negative_sum += num
> print(positive_sum, negative_sum)
> ```
> Here `pass` means "don't do anything".
:::
<!---
:::success
:pencil: ***Optional:* Sorting a List Into Buckets**
In our `data` folder, large data sets are stored in files whose names start with
"inflammation-" and small data sets -- in files whose names start with "small-". We
also have some other files that we do not care about at this point. We'd like to break all
these files into three lists called `large_files`, `small_files`, and `other_files`,
respectively.
Add code to the template below to do this. Note that the string method
[`startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
returns `True` if and only if the string it is called on starts with the string
passed as an argument, that is:
```python!
'String'.startswith('Str')
```
```
True
```
But
```python!
'String'.startswith('str')
```
```
False
```
Use the following Python code as your starting point:
```python!
filenames = ['inflammation-01.csv',
             'myscript.py',
             'inflammation-02.csv',
             'small-01.csv',
             'small-02.csv']
large_files = []
small_files = []
other_files = []
```
Your solution should:
1. loop over the names of the files
2. figure out which group each filename belongs in
3. append the filename to that list
In the end the three lists should be:
```python!
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
small_files = ['small-01.csv', 'small-02.csv']
other_files = ['myscript.py']
```
> :::spoiler :eyes: ***Solution***
> ```python!
> for filename in filenames:
>     if filename.startswith('inflammation-'):
>         large_files.append(filename)
>     elif filename.startswith('small-'):
>         small_files.append(filename)
>     else:
>         other_files.append(filename)
>
> print('large_files:', large_files)
> print('small_files:', small_files)
> print('other_files:', other_files)
> ```
:::
--->
### 8. Creating Functions
```python!
def fahr_to_celsius(temp):
    return (temp - 32) * (5 / 9)
print(fahr_to_celsius(32))
print('freezing point of water:', fahr_to_celsius(32), 'C')
print('boiling point of water:', fahr_to_celsius(212), 'C')
def celsius_to_kelvin(temp_c):
    return temp_c + 273.15
print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
def fahr_to_kelvin(temp_f):
    temp_c = fahr_to_celsius(temp_f)
    temp_k = celsius_to_kelvin(temp_c)
    return temp_k
print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
temp_kelvin = fahr_to_kelvin(212.0)
print(temp_kelvin)
def print_temperatures():
    # reads the global variables temp_fahr and temp_kelvin defined below
    print('temperature in Fahrenheit was:', temp_fahr)
    print('temperature in Kelvin was:', temp_kelvin)
temp_fahr = 212.0
temp_kelvin = fahr_to_kelvin(temp_fahr)
print_temperatures()
def visualize(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    fig.tight_layout()
    matplotlib.pyplot.show()
def detect_problems(filename):
    data = numpy.loadtxt(fname=filename, delimiter=',')
    max_inflammation_0 = numpy.max(data, axis=0)[0]
    max_inflammation_20 = numpy.max(data, axis=0)[20]
    if max_inflammation_0 == 0 and max_inflammation_20 == 20:
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.min(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')
import numpy
import glob
import matplotlib.pyplot
filenames = sorted(glob.glob('inflammation*.csv'))
for filename in filenames[:3]:
    print(filename)
    visualize(filename)
    detect_problems(filename)
def offset_mean(data, target_mean_value):
return (data - numpy.mean(data)) + target_mean_value
z = numpy.zeros((2,2))
print(offset_mean(z,3))
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(offset_mean(data, 0))
print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
offset_data = offset_mean(data, 0)
print('min, mean, and max of offset data are:',
numpy.min(offset_data),
numpy.mean(offset_data),
numpy.max(offset_data))
print('standard deviation before and after', numpy.std(data), numpy.std(offset_data))
print('difference standard deviation before and after', numpy.std(data) - numpy.std(offset_data))
# offset_mean(data, target_mean_value):
# return a new array containing the original data with its mean offset to match the desired value
def offset_mean(data, target_mean_value):
return (data - numpy.mean(data)) + target_mean_value
def offset_mean(data, target_mean_value):
"""Return a new array containing the original data
with its mean offset to match the desired value."""
return (data - numpy.mean(data)) + target_mean_value
help(offset_mean)
def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
    with its mean offset to match the desired value.

    Examples
    --------
    >>> offset_mean([1, 2, 3], 0)
    array([-1.,  0.,  1.])
    """
    return (data - numpy.mean(data)) + target_mean_value
help(offset_mean)
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
data = numpy.loadtxt('inflammation-01.csv', delimiter=',')
data = numpy.loadtxt('inflammation-01.csv', ',')  # fails: the second positional parameter of numpy.loadtxt is dtype, not delimiter
help(numpy.loadtxt)
def offset_mean(data, target_mean_value=0.0):
    """Return a new array containing the original data
    with its mean offset to match the desired value.

    Examples
    --------
    >>> offset_mean([1, 2, 3], 0)
    array([-1.,  0.,  1.])
    """
    return (data - numpy.mean(data)) + target_mean_value
test_data = numpy.zeros((2,2))
print(offset_mean(test_data, 3))
more_data = 5 + numpy.zeros((2,2))
print('data before mean offset:')
print(more_data)
print('offset data:')
print(offset_mean(more_data))
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)
print('no parameters:')
display()
print('one parameter:')
display(55)
print('two parameters:')
display(55, 66)
print('only setting the value of c')
display(c=77)
help(numpy.loadtxt)
```
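A minimal sketch of how the default value of `target_mean_value` behaves, assuming only NumPy is installed. Instead of the workshop's CSV files it uses a small made-up array, so the numbers are illustrative only.

```python
import numpy

def offset_mean(data, target_mean_value=0.0):
    """Return a new array containing the original data
    with its mean offset to match the desired value."""
    return (data - numpy.mean(data)) + target_mean_value

more_data = 5 + numpy.zeros((2, 2))  # made-up 2x2 array, every value 5.0

# Omitting the second argument uses the default, so the new mean is 0.0
centered = offset_mean(more_data)
# Passing the value positionally or by keyword overrides the default
shifted = offset_mean(more_data, 3)
shifted_kw = offset_mean(more_data, target_mean_value=3)

print(centered)
print(shifted)
print(numpy.array_equal(shifted, shifted_kw))
```

Naming the argument (`target_mean_value=3`) is optional here, but it makes a call easier to read when a function has several parameters with defaults.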
:::success
:pencil: **Combining Strings**
"Adding" two strings produces their concatenation:
`'a' + 'b'` is `'ab'`.
Write a function called `fence` that takes two parameters called `original` and `wrapper`
and returns a new string that has the wrapper character at the beginning and end of the original.
A call to your function should look like this:
```python!
print(fence('name', '*'))
```
```
*name*
```
> :::spoiler :eyes: ***Solution***
> ```python!
> def fence(original, wrapper):
>     return wrapper + original + wrapper
> ```
:::
:::success
:pencil: **Return versus print**
Note that `return` and `print` are not interchangeable.
`print` is a Python function that *prints* data to the screen.
It enables us, the *users*, to see the data.
The `return` statement, on the other hand, makes data visible to the program.
Let's have a look at the following function:
```python!
def add(a, b):
    print(a + b)
```
**Question**: What will we see if we execute the following commands?
```python!
A = add(7, 3)
print(A)
```
> :::spoiler :eyes: ***Solution***
> Python will first execute the function `add` with `a = 7` and `b = 3`,
> and, therefore, print `10`. However, because function `add` does not have a
> line that starts with `return` (no `return` statement), it will, by default, return
> nothing, which in the Python world is called `None`. Therefore, `None` will be assigned to `A`
> and the last line (`print(A)`) will print `None`. As a result, we will see:
> ```
> 10
> None
> ```
:::
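A small sketch contrasting the two behaviours side by side. The function names `add_and_print` and `add_and_return` are hypothetical, chosen only for this illustration.

```python
def add_and_print(a, b):
    # shows the sum on screen but gives nothing back to the caller
    print(a + b)

def add_and_return(a, b):
    # hands the sum back to the caller, who can store or reuse it
    return a + b

A = add_and_print(7, 3)   # prints 10
B = add_and_return(7, 3)  # prints nothing

print(A)      # None
print(B)      # 10
print(B * 2)  # 20: a returned value can feed further calculations
```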
:::success
:pencil: **Selecting Characters From Strings**
If the variable `s` refers to a string,
then `s[0]` is the string's first character
and `s[-1]` is its last.
Write a function called `outer`
that returns a string made up of just the first and last characters of its input.
A call to your function should look like this:
```python!
print(outer('helium'))
```
```
hm
```
> :::spoiler :eyes: ***Solution***
> ```python!
> def outer(input_string):
>     return input_string[0] + input_string[-1]
> ```
:::
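One way to probe the `outer` solution on edge cases, assuming the solution above. A one-character string returns that character twice, because `[0]` and `[-1]` point to the same position, while an empty string has no characters at all and raises an `IndexError`.

```python
def outer(input_string):
    return input_string[0] + input_string[-1]

print(outer('helium'))  # hm
print(outer('a'))       # aa

try:
    outer('')
except IndexError:
    # an empty string has neither a first nor a last character
    print('outer() needs a string with at least one character')
```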
<!---
:::success
:pencil: **Rescaling an Array**
Write a function `rescale` that takes an array as input
and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0.
(Hint: If `L` and `H` are the lowest and highest values in the original array,
then the replacement for a value `v` should be `(v-L) / (H-L)`.)
> :::spoiler :eyes: ***Solution***
> ```python!
> def rescale(input_array):
>     L = numpy.min(input_array)
>     H = numpy.max(input_array)
>     output_array = (input_array - L) / (H - L)
>     return output_array
> ```
:::
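A quick check of the `rescale` solution on a small made-up array, assuming NumPy is available.

```python
import numpy

def rescale(input_array):
    """Scale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    L = numpy.min(input_array)
    H = numpy.max(input_array)
    output_array = (input_array - L) / (H - L)
    return output_array

values = numpy.array([2.0, 4.0, 6.0, 10.0])  # made-up example data
scaled = rescale(values)
print(scaled)  # 0.0, 0.25, 0.5 and 1.0
```

If every value in the array is the same, `H - L` is zero and NumPy produces `nan` values (with a `RuntimeWarning`) rather than raising an error, so a more defensive version might check for that case first.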
<!---
:::success
:pencil: **Variables Inside and Outside Functions**
What does the following piece of code display when run, and why?
```python!
f = 0
k = 0
def f2k(f):
    k = ((f - 32) * (5.0 / 9.0)) + 273.15
    return k
print(f2k(8))
print(f2k(41))
print(f2k(32))
print(k)
```
> :::spoiler :eyes: ***Solution***
>
> ```
> 259.81666666666666
> 278.15
> 273.15
> 0
> ```
> `k` is 0 because the `k` inside the function `f2k` doesn't know
> about the `k` defined outside the function. When the `f2k` function is called,
> it creates a [local variable](https://swcarpentry.github.io/python-novice-inflammation/reference.html#local-variable)
> `k`. The function returns the value of that local `k`
> but does not alter the `k` defined outside it.
> Therefore the original value of `k` remains unchanged.
> Note that a local `k` is created because a statement inside `f2k`
> *assigns* a new value to it. If `k` were only *read*, Python would simply retrieve the
> value of the global `k`.
:::
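A minimal sketch of the read-versus-assign rule described in the solution. The three function names are hypothetical. The last function shows what happens when a name is read before it is assigned in the same function.

```python
k = 0  # global k

def read_k():
    # k is only read here, so Python uses the global k
    return k + 1

def assign_k():
    # k is assigned here, so this k is local and the global k is untouched
    k = 100
    return k

def read_then_assign_k():
    # k is assigned later in this function, so it is local throughout;
    # reading it before that assignment raises UnboundLocalError
    print(k)
    k = 5

print(read_k())    # 1
print(assign_k())  # 100
print(k)           # 0

try:
    read_then_assign_k()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)
```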
:::success
:pencil: **Mixing Default and Non-Default Parameters**
Given the following code:
```python!
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n
print(numbers(1, three=3))
```
what do you expect will be printed? What is actually printed?
What rule do you think Python is following?
1. `1234`
2. `one2three4`
3. `1239`
4. `SyntaxError`
Given that, what does the following piece of code display when run?
```python!
def func(a, b=3, c=6):
    print('a: ', a, 'b: ', b, 'c:', c)
func(-1, 2)
```
1. `a: b: 3 c: 6`
2. `a: -1 b: 3 c: 6`
3. `a: -1 b: 2 c: 6`
4. `a: b: -1 c: 2`
> :::spoiler :eyes: ***Solution***
> Attempting to define the `numbers` function results in `4. SyntaxError`.
> The defined parameters `two` and `four` are given default values. Because
> `one` and `three` are not given default values, they are required to be
> included as arguments when the function is called and must be placed
> before any parameters that have default values in the function definition.
>
> The given call to `func` displays `a: -1 b: 2 c: 6`. -1 is assigned to
> the first parameter `a`, 2 is assigned to the next parameter `b`, and `c` is
> not passed a value, so it uses its default value 6.
:::
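One way to make the `numbers` definition legal, assuming the goal is to keep the same four parameters, is to move the parameters without defaults to the front. Keyword arguments then let a call skip `two`.

```python
def numbers(one, three, two=2, four=4):
    # parameters without defaults must come before those with defaults
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, 3))          # 1234
print(numbers(1, three=3))    # 1234: three given by keyword
print(numbers(1, 3, four=9))  # 1239: only four overridden
```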
--->
### 9. Errors and Exceptions
```python!
# This code has an intentional error
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])

favorite_ice_cream()
# Fixed: the last valid index of a three-element list is 2
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[2])

favorite_ice_cream()
# Intentional SyntaxError: the def line is missing its colon
def some_function()
    msg = 'hello, world!'
    print(msg)
    return msg
# Fixed version
def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg
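print(a)  # NameError if no variable named a has been defined
for number in range(10):
    count = count + number  # NameError if count has not been given a value yet
print('The count is', count)
count = 0
for number in range(10):
    count = count + number
print('The count is', count)
count = 0
for number in range(10):
    Count = Count + number  # NameError: names are case-sensitive, so Count is not count
print('The count is', Count)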
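letters = ['a', 'b', 'c']
print('letter #1 is', letters[0])
print('letter #2 is', letters[1])
print('letter #3 is', letters[2])
print('letter #4 is', letters[3])  # IndexError: the list has only three elements
file_handle = open('myfile.txt', 'r')  # FileNotFoundError if myfile.txt does not exist
file_handle = open('myfile.txt', 'w')  # 'w' creates the file, or empties it if it exists
file_handle.read()  # io.UnsupportedOperation: a file opened with 'w' can only be written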
```
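The error types above can also be observed programmatically. This sketch wraps each failing statement in a small function and uses `try`/`except`, which this lesson does not otherwise cover, only to report the exception's class name. The helper `error_name`, the function names, and the temporary folder (used so that no real `myfile.txt` is created or overwritten) are illustrative assumptions.

```python
import os
import tempfile

def error_name(action):
    """Call action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return None

def index_error():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    return ice_creams[3]  # only indices 0, 1 and 2 exist

def name_error():
    return undefined_name + 1  # hypothetical name that is never defined

def case_error():
    count = 0
    return Count + count  # names are case-sensitive: Count is not count

def missing_file():
    with tempfile.TemporaryDirectory() as folder:
        # the folder is brand new, so nothing can exist inside it yet
        open(os.path.join(folder, 'myfile.txt'), 'r')

def read_write_only_file():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'myfile.txt')
        with open(path, 'w') as file_handle:
            return file_handle.read()  # 'w' opens the file for writing only

for action in [index_error, name_error, case_error, missing_file, read_write_only_file]:
    print(action.__name__, '->', error_name(action))
```

The last case reports `UnsupportedOperation`, which is the short class name of `io.UnsupportedOperation`.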
:::success
:pencil: **Reading Error Messages**
Read the Python code and the resulting traceback below, and answer the following questions:
1. How many levels does the traceback have?
2. What is the function name where the error occurred?
3. On which line number in this function did the error occur?
4. What is the type of error?
5. What is the error message?
```python!
# This code has an intentional error. Do not type it directly;
# use it for reference to understand the error message below.
def print_message(day):
    messages = {
        'monday': 'Hello, world!',
        'tuesday': 'Today is Tuesday!',
        'wednesday': 'It is the middle of the week.',
        'thursday': 'Today is Donnerstag in German!',
        'friday': 'Last day of the week!',
        'saturday': 'Hooray for the weekend!',
        'sunday': 'Aw, the weekend is almost over.'
    }
    print(messages[day])

def print_friday_message():
    print_message('Friday')

print_friday_message()
```
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1-4be1945adbe2> in <module>()
     14     print_message('Friday')
     15
---> 16 print_friday_message()

<ipython-input-1-4be1945adbe2> in print_friday_message()
     12
     13 def print_friday_message():
---> 14     print_message('Friday')
     15
     16 print_friday_message()

<ipython-input-1-4be1945adbe2> in print_message(day)
      9         'sunday': 'Aw, the weekend is almost over.'
     10     }
---> 11     print(messages[day])
     12
     13 def print_friday_message():

KeyError: 'Friday'
```
> :::spoiler :eyes: ***Solution***
> 1. 3 levels
> 2. `print_message`
> 3. 11
> 4. `KeyError`
> 5. There isn't really a message beyond the key itself; you're supposed
> to infer that `'Friday'` is not a key in `messages` (its keys are all lowercase, such as `'friday'`).
:::
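A short sketch, using a trimmed-down copy of the `messages` dictionary, that confirms the diagnosis: the lookup fails because dictionary keys are compared exactly, including case. Lowercasing `day` is only one possible fix, not one prescribed by the lesson.

```python
messages = {
    'monday': 'Hello, world!',
    'friday': 'Last day of the week!',
}

print('Friday' in messages)  # False: dictionary keys are case-sensitive
print('friday' in messages)  # True

day = 'Friday'
# converting the argument to lowercase before the lookup is one possible fix
print(messages[day.lower()])  # Last day of the week!
```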
<!---
:::success
:pencil: **Identifying Syntax Errors**
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.
```python!
def another_function
print('Syntax errors are annoying.')
print('But at least Python tells us about them!')
print('So they are usually not too hard to fix.')
```
> :::spoiler :eyes: ***Solution***
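> `SyntaxError` for the missing `():` at the end of the first line; once that is fixed,
> Python reports an `IndentationError` until the function body is indented consistently.
> A fixed version is: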
>
> ```python!
> def another_function():
>     print('Syntax errors are annoying.')
>     print('But at least Python tells us about them!')
>     print('So they are usually not too hard to fix.')
> ```
:::
<!---
:::success
:pencil: **Identifying Variable Name Errors**
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message.
What type of `NameError` do you think this is?
In other words, is it a string with no quotes,
a misspelled variable,
or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.
```python!
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + 'b'
print(message)
```
> :::spoiler :eyes: ***Solution***
> 3 `NameError`s for `number` being misspelled, for `message` not defined,
> and for `a` not being in quotes.
>
> Fixed version:
>
> ```python!
> message = ''
> for number in range(10):
>     # use a if the number is a multiple of 3, otherwise use b
>     if (number % 3) == 0:
>         message = message + 'a'
>     else:
>         message = message + 'b'
> print(message)
> ```
:::
<!---
:::success
:pencil: **Identifying Index Errors**
1. Read the code below, and (without running it) try to identify what the errors are.
2. Run the code, and read the error message. What type of error is it?
3. Fix the error.
```python!
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
```
> :::spoiler :eyes: ***Solution***
> `IndexError`; the last entry is `seasons[3]`, so `seasons[4]` doesn't make sense.
> A fixed version is:
>
> ```python!
> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
> print('My favorite season is ', seasons[-1])
> ```
:::
--->
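For a list of length `n`, valid indices run from `0` to `n - 1` counting from the front and from `-1` to `-n` counting from the back. Anything outside those ranges raises an `IndexError`, as with `ice_creams[3]` and `letters[3]` above. A minimal sketch using a made-up list of seasons:

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
n = len(seasons)

# valid positive indices run from 0 to n - 1
print(seasons[0], seasons[n - 1])  # Spring Winter
# valid negative indices run from -1 down to -n
print(seasons[-1], seasons[-n])    # Winter Spring

# one step past either end is out of range
for index in [n, -n - 1]:
    try:
        seasons[index]
    except IndexError:
        print(index, 'is out of range for a list of length', n)
```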
### 10. Defensive Programming
```python!
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for num in numbers:
    assert num > 0.0, 'Data should only contain positive values'
    total = total + num
print('total is', total)
def normalize_rectangle(rect):
    """Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1).
    (x0, y0) and (x1, y1) define the lower left and upper right corners
    of the rectangle, respectively."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = float(dy) / dx  # shorter side over longer side, so upper_y stays <= 1.0
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)
print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))
```
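A compact sketch of turning example calls into checks, using a `dy / dx` division for wide rectangles. The three test rectangles are made up for illustration. If the `dx > dy` branch divided `dx` by `dy` instead, the wide case would fail the `upper_y` post-condition.

```python
def normalize_rectangle(rect):
    """Normalize a rectangle (x0, y0, x1, y1) to the origin, longest axis 1.0."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'
    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # wide rectangle: x becomes 1.0 and y shrinks by the same factor
        upper_x, upper_y = 1.0, float(dy) / dx
    else:
        # tall or square rectangle: y becomes 1.0
        upper_x, upper_y = float(dx) / dy, 1.0
    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'
    return (0, 0, upper_x, upper_y)

# tall, wide and square inputs, each checked against a hand-computed result
assert normalize_rectangle((0.0, 0.0, 1.0, 5.0)) == (0, 0, 0.2, 1.0)
assert normalize_rectangle((0.0, 0.0, 5.0, 1.0)) == (0, 0, 1.0, 0.2)
assert normalize_rectangle((0.0, 0.0, 2.0, 2.0)) == (0, 0, 1.0, 1.0)
print('all rectangle checks passed')
```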
<!---
:::success
:pencil: **Pre- and Post-Conditions**
Suppose you are writing a function called `average` that calculates
the average of the numbers in a list.
What pre-conditions and post-conditions would you write for it?
Compare your answer to your neighbor's:
can you think of a function that will pass your tests but not theirs, or vice versa?
> :::spoiler :eyes: ***Solution***
> ```python!
> # a possible pre-condition:
> assert len(input_list) > 0, 'List length must be non-zero'
> # a possible post-condition:
> assert numpy.min(input_list) <= average <= numpy.max(input_list), \
>     'Average should be between min and max of input values (inclusive)'
> ```
:::
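A minimal sketch of an `average` function that uses the pre- and post-condition from the solution. The function body is an assumption; only the two assertions come from the exercise.

```python
import numpy

def average(input_list):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # pre-condition: there must be at least one value
    assert len(input_list) > 0, 'List length must be non-zero'
    result = sum(input_list) / len(input_list)
    # post-condition: a mean can never lie outside the range of its inputs
    assert numpy.min(input_list) <= result <= numpy.max(input_list), \
        'Average should be between min and max of input values (inclusive)'
    return result

print(average([1, 2, 3, 6]))  # 3.0
```

With floating-point inputs, rounding in the sum can in principle push the computed mean very slightly outside the range of the inputs, so a production post-condition might need a small tolerance.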
<!---
:::success
:pencil: **Testing Assertions**
Given a sequence of car counts, the function `get_total_cars` returns
the total number of cars.
```python!
get_total_cars([1, 2, 3, 4])
```
```
10
```
```python!
get_total_cars(['a', 'b', 'c'])
```
```
ValueError: invalid literal for int() with base 10: 'a'
```
Explain in words what the assertions in this function check,
and for each one,
give an example of input that will make that assertion fail.
```python!
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total
```
> :::spoiler :eyes: ***Solution***
> * The first assertion checks that the input sequence `values` is not empty.
> An empty sequence such as `[]` will make it fail.
> * The second assertion checks that each value in the list can be turned into an integer.
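> Input such as `[1, 2, 'c', 3]` will make it fail (strictly, `int('c')` raises the `ValueError` shown above before the `assert` itself is evaluated).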
> * The third assertion checks that the total of the list is greater than 0.
> Input such as `[-10, 2, 3]` will make it fail.
:::
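A sketch that runs the three failing inputs from the solution through the function and reports which exception each one triggers. The loop and its output format are illustrative assumptions.

```python
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total

print(get_total_cars([1, 2, 3, 4]))  # 10

for bad_input in [[], [1, 2, 'c', 3], [-10, 2, 3]]:
    try:
        get_total_cars(bad_input)
    except (AssertionError, ValueError) as err:
        print(bad_input, 'fails with', type(err).__name__)
```

Because `assert int(element)` tests the truth value of the converted number, an input containing `0`, such as `[0, 1]`, also fails the second assertion even though `0` converts to an integer without trouble.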