NWO-I Software Carpentry, Summer 2023, day 2

<style>body { background-color: #eeeeee!important; } </style>

:::info
:information_source: On this page you will find notes for the second day of the NWO-I Software Carpentry workshop organized on July 18
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule July 18

| | **Programming with Python**|
|------|------|
| 09:30 | Programming with Python |
| 10:30 | *Morning break* |
| 10:45 | Programming with Python (Continued) |
| 12:30 | *Lunch break* |
| 13:15 | Programming with Python (Continued) |
| 15:30 | *Afternoon break* |
| 15:45 | Programming with Python (Continued) |
| 17:30 | *END* |

## Programming with Python

### :link: Links

* Setup page: https://swcarpentry.github.io/python-novice-inflammation/index.html#setup
* Lesson material: https://swcarpentry.github.io/python-novice-inflammation/
* Reference page: https://swcarpentry.github.io/python-novice-inflammation/reference.html

![](https://hackmd.io/_uploads/ByzVhma92.jpg)

### 1. Python Fundamentals

![](https://hackmd.io/_uploads/B1eu27a93.jpg)
![](https://hackmd.io/_uploads/S1t_376c3.jpg)
![](https://hackmd.io/_uploads/BJgt2m65h.jpg)
![](https://hackmd.io/_uploads/SJ8FnXT9n.jpg)
![](https://hackmd.io/_uploads/B1AY2ma93.jpg)
![](https://hackmd.io/_uploads/H1HchXp93.jpg)
![](https://hackmd.io/_uploads/Synsa7Tch.jpg)
![](https://hackmd.io/_uploads/rJrnp7T9h.jpg)
![](https://hackmd.io/_uploads/Sy03TXa5h.jpg)
![](https://hackmd.io/_uploads/ryrppQ6qh.jpg)
![](https://hackmd.io/_uploads/Bk0pTmT5n.jpg)

```python!
3 + 5*4
weight_kg = 60
weight_kg = 60.3
patient_id = '001'
weight_lb = 2.2 * weight_kg
patient_id = 'inflam_' + patient_id
print(weight_lb)
print(patient_id)
print(patient_id, 'weight in kilograms', weight_kg)
print(type(60.3))
print(type(patient_id))
print('weight in pounds:', 2.2*weight_kg)
print("I say \'Hello\'")
print(weight_lb)
print(weight_kg)
weight_lb = 2.2*weight_kg
print(weight_lb)
weight_kg = 65.0
print(weight_kg)
print(weight_lb)
weight_lb = 2.2*weight_kg
print(weight_lb)
```

:::success
:pencil: **Check Your Understanding**

What values do the variables `mass` and `age` have after each of the following statements? Test your answer by executing the lines.

```python!
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
```

> :::spoiler :eyes: ***Solution***
> ```
> `mass` holds a value of 47.5, `age` does not exist
> `mass` still holds a value of 47.5, `age` holds a value of 122
> `mass` now has a value of 95.0, `age`'s value is still 122
> `mass` still has a value of 95.0, `age` now holds 102
> ```

:::

:::success
:pencil: **Sorting Out References**

Python allows you to assign multiple values to multiple variables in one line by separating the variables and values with commas. What does the following program print out?

```python!
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
```

> :::spoiler :eyes: ***Solution***
> ```
> Hopper Grace
> ```

:::

### 2. Analyzing Patient Data

![](https://hackmd.io/_uploads/Sy-LpXT5h.jpg)
![](https://hackmd.io/_uploads/ryqIaXTc3.jpg)

```python!
import numpy
numpy.loadtxt(fname="inflammation-01.csv", delimiter=",")
data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=",")
print(data)
print(type(data))
print(data.dtype)
print(data.shape)
print("first value in data:", data[0,0])
print("middle value in data:", data[29,19])
print(data[0:4,0:10])
print(data[5:10,0:10])
print(data[:3,36:])
print(data[:,36:])
small = data[:3,36:]
print("small is:")
print(small)
print(data[0,36:])
first = data[0,:4]
second = data[0,36:]
print(first)
print(second)
print(numpy.concatenate([first,second]))
print(numpy.mean(data))
import time
print(time.ctime())
print(time.ctime)  # without parentheses this prints the function object, not the time
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)
print("Maximum inflammation:", maxval)
print("Minimum inflammation:", minval)
print("Standard deviation:", stdval)
print(numpy.argmax(data))
print(numpy.argmax(data, axis=0))  # axis is an argument of argmax, not of print
print(numpy.argmax(data, axis=1))
print(numpy.concatenate([first,second]))
numpy.amin?  # IPython/Jupyter only: shows the help for numpy.amin
help(numpy.amin)
help(numpy.min)
print(data.min())
print(numpy.min(data))
patient_0 = data[0, :]
print("maximum inflammation for patient 0:", numpy.amax(patient_0))
print("maximum inflammation for patient 0:", numpy.amax(data[0, :]))
print(numpy.mean(data, axis=0))
print(numpy.mean(data, axis=0).shape)
print(numpy.mean(data, axis=1))
print(numpy.mean(data, axis=1).shape)
```

:::success
:pencil: **Slicing Strings**

A section of an array is called a *slice*. We can take slices of character strings as well:

```python!
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
```

```
first three characters: oxy
last three characters: gen
```

What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

> :::spoiler :eyes: ***Solution***
> ```
> oxyg
> en
> oxygen
> ```

What is `element[-1]`? What is `element[-2]`?

> :::spoiler :eyes: ***Solution***
> ```
> n
> e
> ```

Given those answers, explain what `element[1:-1]` does.
> :::spoiler :eyes: ***Solution***
> Creates a substring from index 1 up to (not including) the final index,
> effectively removing the first and last letters from 'oxygen'

How can we rewrite the slice for getting the last three characters of `element`, so that it works even if we assign a different string to `element`? Test your solution with the following strings: `carpentry`, `clone`, `hi`.

> :::spoiler :eyes: ***Solution***
> ```python!
> element = 'oxygen'
> print('last three characters:', element[-3:])
> element = 'carpentry'
> print('last three characters:', element[-3:])
> element = 'clone'
> print('last three characters:', element[-3:])
> element = 'hi'
> print('last three characters:', element[-3:])
> ```
> ```
> last three characters: gen
> last three characters: try
> last three characters: one
> last three characters: hi
> ```

:::

:::success
:pencil: **Thin Slices**

The expression `element[3:3]` produces an *empty string*, i.e., a string that contains no characters. If `data` holds our array of patient data, what does `data[3:3, 4:4]` produce? What about `data[3:3, :]`?

> :::spoiler :eyes: ***Solution***
> ```
> array([], shape=(0, 0), dtype=float64)
> array([], shape=(0, 40), dtype=float64)
> ```

:::

:::success
:pencil: **Stacking Arrays**

Arrays can be concatenated and stacked on top of one another, using NumPy's `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.

```python!
import numpy

A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)
```

```
A =
[[1 2 3]
 [4 5 6]
 [7 8 9]]
B =
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]
C =
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]
```

Write some additional code that slices the first and last columns of `A`, and stacks them into a 3x2 array. Make sure to `print` the results to verify your solution.
> :::spoiler :eyes: ***Solution***
>
> A 'gotcha' with array indexing is that singleton dimensions
> are dropped by default. That means `A[:, 0]` is a one dimensional
> array, which won't stack as desired. To preserve singleton dimensions,
> the index itself can be a slice or array. For example, `A[:, :1]` returns
> a two dimensional array with one singleton dimension (i.e. a column
> vector).
>
> ```python!
> D = numpy.hstack((A[:, :1], A[:, -1:]))
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```

> :::spoiler :eyes: ***Solution***
>
> An alternative way to achieve the same result is to use NumPy's
> delete function to remove the second column of A.
>
> ```python!
> D = numpy.delete(A, 1, 1)
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```

:::

:::success
:pencil: **Change In Inflammation**

The patient data is _longitudinal_ in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let's find out how to calculate changes in the data contained in an array with NumPy.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let's use it to examine the changes each day across the first week of patient 3 from our inflammation dataset.

```python!
patient3_week1 = data[3, :7]
print(patient3_week1)
```

```
[0. 0. 2. 0. 4. 2. 2.]
```

Calling `numpy.diff(patient3_week1)` would do the following calculations

```python!
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```

and return the 6 difference values in a new array.

```python!
numpy.diff(patient3_week1)
```

```
array([ 0.,  2., -2.,  4., -2.,  0.])
```

Note that the array of differences is shorter by one element (length 6).

When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process.
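
As a small illustration of the `axis` argument, here is a sketch on a made-up array (`toy` is a hypothetical 2x3 example, not part of the inflammation data):

```python
import numpy

# hypothetical toy array: 2 rows, 3 columns
toy = numpy.array([[1, 4, 9],
                   [2, 3, 5]])

# axis=0: differences between successive rows
diff_rows = numpy.diff(toy, axis=0)
# axis=1: differences between successive columns
diff_cols = numpy.diff(toy, axis=1)

print(diff_rows)
print(diff_cols)
```

With `axis=0` the result has one row fewer than `toy`; with `axis=1` it has one column fewer.
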
When applying `numpy.diff` to our 2D inflammation array `data`, which axis would we specify?

> :::spoiler :eyes: ***Solution***
> Since the row axis (0) is patients, it does not make sense to get the
> difference between two arbitrary patients. The column axis (1) is in
> days, so the difference is the change in inflammation -- a meaningful
> concept.
>
> ```python!
> numpy.diff(data, axis=1)
> ```

If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

> :::spoiler :eyes: ***Solution***
> The shape will be `(60, 39)` because there is one fewer difference between
> columns than there are columns in the data.

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

> :::spoiler :eyes: ***Solution***
> By using the `numpy.max()` function after you apply the `numpy.diff()`
> function, you will get the largest difference between days.
>
> ```python!
> numpy.max(numpy.diff(data, axis=1), axis=1)
> ```
>
> ```python!
> array([  7.,  12.,  11.,  10.,  11.,  13.,  10.,   8.,  10.,  10.,   7.,
>          7.,  13.,   7.,  10.,  10.,   8.,  10.,   9.,  10.,  13.,   7.,
>         12.,   9.,  12.,  11.,  10.,  10.,   7.,  10.,  11.,  10.,   8.,
>         11.,  12.,  10.,   9.,  10.,  13.,  10.,   7.,   7.,  10.,  13.,
>         12.,   8.,   8.,  10.,  10.,   9.,   8.,  13.,  10.,   7.,  10.,
>          8.,  12.,  10.,   7.,  12.])
> ```
>
> If inflammation values *decrease* along an axis, then the difference from
> one element to the next will be negative. If
> you are interested in the **magnitude** of the change and not the
> direction, the `numpy.absolute()` function will provide that.
>
> Notice the difference if you get the largest _absolute_ difference
> between readings.
>
> ```python!
> numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
> ```
>
> ```python!
> array([ 12.,  14.,  11.,  13.,  11.,  13.,  10.,  12.,  10.,  10.,  10.,
>         12.,  13.,  10.,  11.,  10.,  12.,  13.,   9.,  10.,  13.,   9.,
>         12.,   9.,  12.,  11.,  10.,  13.,   9.,  13.,  11.,  11.,   8.,
>         11.,  12.,  13.,   9.,  10.,  13.,  11.,  11.,  13.,  11.,  13.,
>         13.,  10.,   9.,  10.,  10.,   9.,   9.,  13.,  10.,   9.,  10.,
>         11.,  13.,  10.,  10.,  12.])
> ```
> :::

```python!
#This is a comment
print("""
This is a long comment
with multiple lines
""")
```

### 3. Visualizing Tabular Data

```python!
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
max_plot = matplotlib.pyplot.plot(numpy.amax(data, axis=0))
matplotlib.pyplot.show()
min_plot = matplotlib.pyplot.plot(numpy.amin(data, axis=0))
matplotlib.pyplot.show()

import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt(fname="inflammation-01.csv", delimiter=",")
# figsize is a keyword argument: (width, height) in inches
fig = plt.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel("average")
axes1.plot(numpy.mean(data, axis=0))
axes2.set_ylabel("max")
axes2.plot(numpy.amax(data, axis=0))
axes3.set_ylabel("min")
axes3.plot(numpy.amin(data, axis=0))
fig.tight_layout()
plt.savefig("inflammation.png")
plt.show()
```

:::success
:pencil: **Plot Scaling**

Why do all of our plots stop just short of the upper end of our graph?

> :::spoiler :eyes: ***Solution***
> Because matplotlib normally sets x and y axes limits to the min and max of our data
> (depending on data range)
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ```python!
> axes3.set_ylim(0,6)
> ```

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

> :::spoiler :eyes: ***Solution***
> ```python!
> # One method
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
> axes3.set_ylim(0,6)
> ```

> :::spoiler :eyes: ***Solution***
> ```python!
> # A more automated approach
> min_data = numpy.min(data, axis=0)
> axes3.set_ylabel('min')
> axes3.plot(min_data)
> axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> ```

:::

:::success
:pencil: **Drawing Straight Lines**

In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

> :::spoiler :eyes: ***Solution***
> Because matplotlib interpolates (draws a straight line) between the points.
> One way to avoid this is to use the Matplotlib `drawstyle` option:
>
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
> ![Three line graphs, with step lines connecting the points, showing the daily average, maximum and minimum inflammation over a 40-day period.](../fig/inflammation-01-line-styles.svg)

:::

:::success
:pencil: **Make Your Own Plot**

Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **Moving Plots Around**

Modify the program to display the three plots on top of one another instead of side by side.

> :::spoiler :eyes: ***Solution***
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> # change figsize (swap width and height)
> fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
>
> # change add_subplot (swap first two parameters)
> axes1 = fig.add_subplot(3, 1, 1)
> axes2 = fig.add_subplot(3, 1, 2)
> axes3 = fig.add_subplot(3, 1, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 4. Storing Multiple Values in Lists

![](https://hackmd.io/_uploads/BkQERmp9n.jpg)

```python!
odds = [1, 3, 5, 7]
print("odds are:", odds)
print('First element' , odds[0])
print('Last element' , odds[3])
print('-1 element', odds[-1])
names = ["Curie", "Darwing", "Turing"] # Typo...
print('names in the list', names)
names[1] = "Darwin"
print('final values of names', names)
name = "Darwin"
name[2] = 'd' # gives type error

mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = mild_salsa  # both names refer to the same list
hot_salsa[0] = 'hot peppers'
print("Ingredients of the mild salsa", mild_salsa)
print("Ingredients of the hot salsa", hot_salsa)
hot_salsa[0] = 'mild peppers'
print(mild_salsa)

mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = list(mild_salsa)  # list() makes an independent copy
hot_salsa[0] = 'hot peppers'
print("Mild", mild_salsa)
print("Hot", hot_salsa)

veg = [['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'cilantro', 'peppers', 'zucchini']]
print(veg[2])
print(veg[0])
print(veg[0][0])
print(veg[1][2])

print(odds)
odds.append(11) # add element
print(odds)
removed_element = odds.pop(0)
print("Odds after removing the first element:", odds)
print("Removed element", removed_element)
odds.reverse()
print("Odds after reverse", odds)

odds = [3, 5, 7]
primes = odds
primes.append(2)
print("primes", primes)
print("odds", odds)

odds = [3, 5, 7]
primes = list(odds)
primes.append(2)
print("primes", primes)
print("odds", odds)

binomial_name = "Drosophila Melanogaster"
group = binomial_name[0:10]
print("group", group)
species = binomial_name[11:23]
print("species", species)
chromosomes = ['X', 'Y', '1', '2', '3', '4']
autosomes = chromosomes[2:5]
print("autosomes", autosomes)
last = chromosomes[-1]
print("last", last)
```

:::success
:pencil: **Slicing From the End**

Use slicing to access only the last four characters of a string or entries of a list.

```python!
string_for_slicing = 'Observation date: 02-Feb-2013'
list_for_slicing = [['fluorine', 'F'],
                    ['chlorine', 'Cl'],
                    ['bromine', 'Br'],
                    ['iodine', 'I'],
                    ['astatine', 'At']]
```

```
'2013'
[['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
```

Would your solution work regardless of whether you knew beforehand the length of the string or list (e.g. if you wanted to apply the solution to a set of lists of different lengths)? If not, try to change your approach to make it more robust.

Hint: Remember that indices can be negative as well as positive

> :::spoiler :eyes: ***Solution***
> Use negative indices to count elements from the end of a container (such as list or string):
>
> ```python!
> string_for_slicing[-4:]
> list_for_slicing[-4:]
> ```

:::

:::success
:pencil: **Overloading**

`+` usually means addition, but when used on strings or lists, it means "concatenate". Given that, what do you think the multiplication operator `*` does on lists? In particular, what will be the output of the following code?

```python!
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats)
```

1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`

The technical term for this is *operator overloading*: a single operator, like `+` or `*`, can do different things depending on what it's applied to.

> :::spoiler :eyes: ***Solution***
>
> The multiplication operator `*` used on a list replicates elements of the list and concatenates
> them together:
>
> ```
> [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
> ```
>
> It's equivalent to:
>
> ```python!
> counts + counts
> ```

:::

```python!
a = ['Hello']
print(a[0])
print(a[0][1:2])
```

### 5. Repeating Actions with Loops

![](https://hackmd.io/_uploads/rkJvCX6qn.jpg)
![](https://hackmd.io/_uploads/SyS9R7652.jpg)
![](https://hackmd.io/_uploads/Skf_Cma93.jpg)
![](https://hackmd.io/_uploads/SkyF0XTcn.jpg)

```python!
odds = [1, 3, 5, 7]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])

odds = [1, 3, 5]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3]) # index error

odds = [1, 3, 5, 7]
for num in odds:
    print(num)

odds = [1, 3, 5, 7, 11]
for num in odds:
    print(num)

odds = [1, 3, 5, 7, 11]
for banana in odds:
    print(banana)

for num in odds[:2]:
    print(num)

length = 0
names = ["Curie", "Darwin", "Turing"]
for value in names:
    length = length + 1
print("There are ", length, "names in the list")

# ln = 0
names = ["Curie", "Darwin", "Turing"]
for value in names:
    ln = ln + 1
print("There are ", ln, "names in the list") # ln not defined

name = "Rosalind"
for name in ["Curie", "Darwin", "Turing"]:
    print(name)
print("after the loop name is", name)
```

:::success
:pencil: **From 1 to N**

Python has a built-in function called `range` that generates a sequence of numbers. `range` can accept 1, 2, or 3 parameters.

* If one parameter is given, `range` generates a sequence of that length, starting at zero and incrementing by 1. For example, `range(3)` produces the numbers `0, 1, 2`.
* If two parameters are given, `range` starts at the first and ends just before the second, incrementing by one. For example, `range(2, 5)` produces `2, 3, 4`.
* If `range` is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.

Using `range`, write a loop that prints the first 3 natural numbers:

```python!
1
2
3
```

> :::spoiler :eyes: ***Solution***
> ```python!
> for number in range(1, 4):
>     print(number)
> ```

:::

:::success
:pencil: **Understanding the loops**

Given the following loop:

```python!
word = 'oxygen'
for char in word:
    print(char)
```

How many times is the body of the loop executed?

* 3 times
* 4 times
* 5 times
* 6 times

> :::spoiler :eyes: ***Solution***
>
> The body of the loop is executed 6 times.
> :::

:::success
:pencil: **Computing Powers With Loops**

Exponentiation is built into Python:

```python!
print(5 ** 3)
```

```
125
```

Write a loop that calculates the same result as `5 ** 3` using multiplication (and without exponentiation).

> :::spoiler :eyes: ***Solution***
> ```python!
> result = 1
> for number in range(0, 3):
>     result = result * 5
> print(result)
> ```

:::

:::success
:pencil: **Summing a list**

Write a loop that calculates the sum of elements in a list by adding each element and printing the final value, so `[124, 402, 36]` prints 562

> :::spoiler :eyes: ***Solution***
> ```python!
> numbers = [124, 402, 36]
> summed = 0
> for num in numbers:
>     summed = summed + num
> print(summed)
> ```

:::

### 6. Analyzing Data from Multiple Files

![](https://hackmd.io/_uploads/Bk8hCQaqh.jpg)
![](https://hackmd.io/_uploads/rJ7aRQac2.jpg)
![](https://hackmd.io/_uploads/BysTCmach.jpg)

```python!
import glob
print(glob.glob('inflammation*.csv'))

import numpy
import matplotlib.pyplot

filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]
for filename in filenames:
    print(filename)
    data = numpy.loadtxt(fname=filename, delimiter=",")
    # same 10 x 3 inch figure as in the earlier sections
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(numpy.amax(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(numpy.amin(data, axis=0))
    fig.tight_layout()
    matplotlib.pyplot.show()
```

:::success
:pencil: **Plotting Differences**

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively), i.e., the difference between the leftmost plots of the first two figures.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = sorted(glob.glob('inflammation*.csv'))
>
> data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
> data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> matplotlib.pyplot.ylabel('Difference in average')
> matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
>
> fig.tight_layout()
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **Generate Composite Statistics**

Use each of the files once to generate a dataset containing values averaged over all patients:

```python!
filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60,40))
for filename in filenames:
    # sum each new file's data into composite_data as it's read
    #
# and then divide the composite_data by number of samples
composite_data = composite_data / len(filenames)
```

Then use pyplot to generate average, max, and min for all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = glob.glob('inflammation*.csv')
> composite_data = numpy.zeros((60,40))
>
> for filename in filenames:
>     data = numpy.loadtxt(fname = filename, delimiter=',')
>     composite_data = composite_data + data
>
> composite_data = composite_data / len(filenames)
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(composite_data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(composite_data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(composite_data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 7. Making Choices

![](https://hackmd.io/_uploads/Sk5Jk4aqn.jpg)
![](https://hackmd.io/_uploads/HJxgyVa5n.jpg)
![](https://hackmd.io/_uploads/rJOxk4acn.jpg)

```python!
num = 37
if num > 100:
    print("Greater")
else:
    print("Not greater")
print("Done")

num = 53
print("Before conditional...")
if num > 100:
    print(num, "is greater than 100")
print("...after conditional")

num = -3
if num > 0:
    print(num, "is positive")
elif num == 0:
    print(num, "is zero")
else:
    print(num, "is negative")

if (1 > 0) and (-1 >= 0):
    print("Both are true")
else:
    print("At least one part is false")

if (1 < 0) or (1 >= 0):
    print("At least one test is true")

import numpy
data = numpy.loadtxt("inflammation-01.csv", delimiter=",")
# maximum inflammation across patients on day 0 and day 20
max_inflammation_0 = numpy.amax(data,axis=0)[0]
max_inflammation_20 = numpy.amax(data,axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print("Suspicious looking maxima")
elif numpy.sum(numpy.amin(data,axis=0)) == 0:
    print("Minima add up to 0")
else:
    print("Seems OK!")
```

:::success
:pencil: **How Many Paths?**

Consider this code:

```python!
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```

Which of the following would be printed if you were to run this code? Why did you pick this answer?

1. A
2. B
3. C
4. B and C

> :::spoiler :eyes: ***Solution***
> C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
> but `4 < 5` is true.

:::

:::success
:pencil: **What Is Truth?**

`True` and `False` booleans are not the only values in Python that are true and false. In fact, *any* value can be used in an `if` or `elif`. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

```python!
if '':
    print('empty string is true')
if 'word':
    print('word is true')
if []:
    print('empty list is true')
if [1, 2, 3]:
    print('non-empty list is true')
if 0:
    print('zero is true')
if 1:
    print('one is true')
```

:::

:::success
:pencil: **That's Not Not What I Meant**

Sometimes it is useful to check whether some condition is not true. The Boolean operator `not` can do this explicitly.
After reading and running the code below, write some `if` statements that use `not` to test the rule that you formulated in the previous challenge.

```python!
if not '':
    print('empty string is not true')
if not 'word':
    print('word is not true')
if not not True:
    print('not not True is true')
```

:::

:::success
:pencil: **Close Enough**

Write some conditions that print `True` if the variable `a` is within 10% of the variable `b` and `False` otherwise. Compare your implementation with your partner's: do you get the same answer for all possible pairs of numbers?

> :::spoiler :eyes: ***Solution***
> There is a [built-in function `abs`](https://docs.python.org/3/library/functions.html#abs) that returns the absolute value of
> a number:
> ```python!
> print(abs(-12))
> ```
> ```
> 12
> ```

> :::spoiler :eyes: ***Solution***
> ```python!
> a = 5
> b = 5.1
>
> if abs(a - b) <= 0.1 * abs(b):
>     print('True')
> else:
>     print('False')
> ```

> :::spoiler :eyes: ***Solution***
> ```python!
> print(abs(a - b) <= 0.1 * abs(b))
> ```
>
> This works because the Booleans `True` and `False`
> have string representations which can be printed.

:::

### 8. Creating Functions

```python!
fahrenheit_val = 99 celsius_val = ((fahrenheit_val -32)*(5/9)) print(celsius_val) fahrenheit_val = 49 celsius_val = ((fahrenheit_val -32)*(5/9)) print(celsius_val) def fahr_to_celsius(temp): return ((temp -32) * (5/9)) fahr_to_celsius(99) fahr_to_celsius(32) def fahr_to_celsius(temp): result = ((temp -32) * (5/9)) return result fahr_to_celsius(99) print(fahr_to_celsius(99)) temp_fahr = 32 temp_cels = fahr_to_celsius(temp_fahr) print(temp_cels) print("freezing point of water", fahr_to_celsius(32), "C") print("boiling point of water", fahr_to_celsius(212), "C") def celsius_to_kelvin(temp_c): return temp_c - 278.15 print("freezing point of water in Kelvin:", celsius_to_kelvin(0.), "K") def fahr_to_kelvin(temp_f): temp_c = fahr_to_celsius(temp_f) temp_k = celsius_to_kelvin(temp_c) return temp_k print("boiling point of water in Kelvin", fahr_to_kelvin(212.0),"K") print("Again, temperature in Kelvin was:" temp_k) # temp_k not defined temp_kelvin = fahr_to_kelvin(212.) print("temperature in Kelvin was:" temp_kelvin, "K") def print_temperatures(): print("temperature in fahrenheit was:", "temp_fahr") print("temperature in Kelvin was:", "temp_kelvin") temp_fahr = 212.0 temp_kelvin = fahr_to_kelvin(temp_fahr) print_temperatures() import numpy as np import matplotlib.pyplot as plt def visualise(filename): data = np.loadtxt(fname=filename, delimiter=",") fig = plt.figure(figsize=(10.,3.)) axes1 = fig.add_subplot(1,3,1) axes2 = fig.add_subplot(1,3,2) axes3 = fig.add_subplot(1,3,3) axes1.set_ylabel("average") axes1.plot(np.mean(data,axis=0)) axes2.set_ylabel("max") axes2.plot(np.amax(data,axis=0)) axes3.set_ylabel("min") axes3.plot(np.amin(data,axis=0)) fig.tight_layout() plt.show() def detect_problems(filename): data = np.loadtxt(fname=filename, delimiter=",") if np.amax(data, axis=0)[0] == 0 and np.amax(data, axis=0)[20] == 20): print("Suspicious looking maxima") elif np.sum(np.amin(data, axis=0)) == 0: print("Minima add up to zero") else: print("Seems OK!") filenames = 
sorted(glob.glob("inflammation*.csv")) for filename in filenames: print(filename) visualise(filename) detect_problems(filename) def offset_mean(data, target_mean_value): return (data = numpy.mean(data)) + target_mean_value z = np.zeros((2,2)) print(ofset_means(z,3)) data = np.loadtxt(fname="inflammation-01.csv", delimiter = ",") print(offset_mean(data,0)) print("original min, mean and max are:", np.amin(data), np.mean(data), np.amax(data)) offset_data = offset_mean(data,0) print("min, mean, max of offset data are:", np.amin(offset_data), np.mean(offset_data), np.amax(offset_data), ) print("std dev before and after", np.std(data), np.std(offset_data)) print("difference in standard deviation before and after is:", np.std(data) - np.std(offset_data)) # offset_mean(data, target_mean_value) # return a new array containing the original data with its mean offset to match the desired value def offset_mean(data, target_mean_value): return (data = numpy.mean(data)) + target_mean_value def offset_mean(data, target_mean_value): """ Return a new array containing the original data with its mean offset to match the desired value. """ return (data = numpy.mean(data)) + target_mean_value help(offset_mean) def offset_mean(data, target_mean_value): """ Return a new array containing the original data with its mean offset to match the desired value. Examples -------- >>> offset_mean([1,2,3], 0) array([-1,0,1]) """ return (data = numpy.mean(data)) + target_mean_value help(offset_mean) type(data) np.loadtxt(fname="something", delimiter=",") np.loadtxt("inflammation-01.csv", delimiter=",") np.loadtxt("inflammation-01.csv",",") #error def offset_mean(data, target_mean_value=0.0): """ Return a new array containing the original data with its mean offset to match the desired value. 
    Examples
    --------
    >>> offset_mean([1,2,3], 0)
    array([-1.,  0.,  1.])
    """
    return (data - np.mean(data)) + target_mean_value

test_data = np.zeros((2, 2))
print(offset_mean(test_data, 3))

more_data = 5 + np.zeros((2, 2))
print("data before mean offset:")
print(more_data)
print("offset data")
print(offset_mean(more_data))

def display(a=1, b=2, c=3):
    print("a:", a, "b:", b, "c:", c)

print("no parameters:")
display()
print("one parameter:")
display(55)
print("two parameters:")
display(55, 66)
print("Only set the value of c:")
display(c=77)

help(np.loadtxt)

# hard to read: cryptic names
def s(p):
    a = 0
    for v in p:
        a += v
    m = a/len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return np.sqrt(d/(len(p)-1))

# the same computation with descriptive names
def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum/len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return np.sqrt(sum_squared_devs/(len(sample)-1))

a = 1+6j
type(a)
a = 5j
type(a)
a * 5
```

:::success
:pencil: **Combining Strings**

"Adding" two strings produces their concatenation: `'a' + 'b'` is `'ab'`. Write a function called `fence` that takes two parameters called `original` and `wrapper` and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:

```python!
print(fence('name', '*'))
```
```
*name*
```
> :::spoiler :eyes: ***Solution***
> ```python!
> def fence(original, wrapper):
>     return wrapper + original + wrapper
> ```
:::

:::success
:pencil: **Return versus print**

Note that `return` and `print` are not interchangeable. `print` is a Python function that *prints* data to the screen. It lets us, the *users*, see the data. The `return` statement, on the other hand, makes data visible to the program. Let's have a look at the following function:

```python!
def add(a, b):
    print(a + b)
```

**Question**: What will we see if we execute the following commands?

```python!
A = add(7, 3)
print(A)
```
> :::spoiler :eyes: ***Solution***
> Python will first execute the function `add` with `a = 7` and `b = 3`,
> and, therefore, print `10`. However, because the function `add` has no
> `return` statement, it returns, by default, the special value `None`,
> which is Python's way of saying "nothing". Therefore, `None` will be assigned to `A`
> and the last line (`print(A)`) will print `None`. As a result, we will see:
> ```
> 10
> None
> ```
:::

:::success
:pencil: **Selecting Characters From Strings**

If the variable `s` refers to a string, then `s[0]` is the string's first character and `s[-1]` is its last. Write a function called `outer` that returns a string made up of just the first and last characters of its input. A call to your function should look like this:

```python!
print(outer('helium'))
```
```
hm
```
> :::spoiler :eyes: ***Solution***
> ```python!
> def outer(input_string):
>     return input_string[0] + input_string[-1]
> ```
:::

:::success
:pencil: **Rescaling an Array**

Write a function `rescale` that takes an array as input and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0. (Hint: If `L` and `H` are the lowest and highest values in the original array, then the replacement for a value `v` should be `(v-L) / (H-L)`.)

> :::spoiler :eyes: ***Solution***
> ```python!
> # assumes `import numpy as np`, as used earlier in these notes
> def rescale(input_array):
>     L = np.min(input_array)
>     H = np.max(input_array)
>     output_array = (input_array - L) / (H - L)
>     return output_array
> ```
:::

:::success
:pencil: **Variables Inside and Outside Functions**

What does the following piece of code display when run, and why?

```python!
f = 0
k = 0

def f2k(f):
    k = ((f - 32) * (5.0 / 9.0)) + 273.15
    return k

print(f2k(8))
print(f2k(41))
print(f2k(32))

print(k)
```
> :::spoiler :eyes: ***Solution***
>
> ```
> 259.81666666666666
> 278.15
> 273.15
> 0
> ```
> `k` is 0 because the `k` inside the function `f2k` doesn't know
> about the `k` defined outside the function. When the `f2k` function is called,
> it creates a [local variable](https://swcarpentry.github.io/python-novice-inflammation/reference.html#local-variable)
> `k`. The function returns the value of this local `k`,
> but it does not alter the `k` defined outside the function.
> Therefore the original value of `k` remains unchanged.
> Beware that a local `k` is created because the statements inside `f2k`
> *assign* a new value to it. If `k` were only *read*, the function would simply retrieve the
> global `k` value.
:::

:::success
:pencil: **Mixing Default and Non-Default Parameters**

Given the following code:

```python!
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

what do you expect will be printed? What is actually printed? What rule do you think Python is following?

1. `1234`
2. `one2three4`
3. `1239`
4. `SyntaxError`

Given that, what does the following piece of code display when run?

```python!
def func(a, b=3, c=6):
    print('a: ', a, 'b: ', b, 'c:', c)

func(-1, 2)
```

1. `a: b: 3 c: 6`
2. `a: -1 b: 3 c: 6`
3. `a: -1 b: 2 c: 6`
4. `a: b: -1 c: 2`

> :::spoiler :eyes: ***Solution***
> Attempting to define the `numbers` function results in `4. SyntaxError`.
> The defined parameters `two` and `four` are given default values. Because
> `one` and `three` are not given default values, they are required to be
> included as arguments when the function is called and must be placed
> before any parameters that have default values in the function definition.
>
> The given call to `func` displays `a: -1 b: 2 c: 6`.
> -1 is assigned to the first parameter `a`, 2 is assigned to the next parameter `b`, and `c` is
> not passed a value, so it uses its default value 6.
:::

### 9. Errors and Exceptions

```python!
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])  # IndexError: valid indices are 0, 1 and 2

favorite_ice_cream()

def some_function()  # SyntaxError: missing colon
    msg = "hello world!"
    print(msg)
    return msg

def some_function():
    msg = "hello world!"
    print(msg)
    return msg

def some_function():
    msg = "hello world!"
    print(msg)
    return msg

print(t)        # NameError: t is not defined
print(hello)    # NameError: hello is a name, not a string
print("hello")

for number in range(10):
    count = count + number  # NameError: count was never initialised
print("the count is:", count)

Count = 0  # NameError again below: Count and count are different names
for number in range(10):
    count = count + number
print("the count is:", count)

file_handle = open("myfile.txt", "r")  # FileNotFoundError if myfile.txt does not exist

file_handle = open("myfile.txt", "w")
file_handle.read()  # io.UnsupportedOperation: the file was opened for writing only

numbers = [1.5, 2.3, 0.7, -0.002, 4.4]
total = 0.0
for num in numbers:
    # fails with AssertionError when the loop reaches -0.002
    assert num > 0.0, "data should only contain positive values"
    total += num
print("total is:", total)

def normalize_rectangle(rect):
    assert len(rect) == 4, "rectangles must contain 4 coordinates"
    x0, y0, x1, y1 = rect
    assert x0 < x1, "Invalid X coordinates"
    assert y0 < y1, "Invalid Y coordinates"

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        scaled = dx/dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = dy/dx
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, "calculated x coordinate invalid"
    assert 0 < upper_y <= 1.0, "calculated y coordinate invalid"

    return (0, 0, upper_x, upper_y)

print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
print(normalize_rectangle( (0.0, 0.0, 5.0, 1.0) )) # too wide: a post-condition assertion fails
print(normalize_rectangle( (0.0, 0.0, 2.0, 2.0) ))
```
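
The "too wide" call above trips a post-condition because the scale factor is computed as the longer side divided by the shorter one, which gives a value above 1.0. Below is a minimal sketch of one possible fix, assuming the goal is for the longest side to end up with length 1.0. The name `normalize_rectangle_fixed` is hypothetical and is not part of the lesson material.

```python
def normalize_rectangle_fixed(rect):
    """Move rect to the origin and scale it so its longest side is 1.0.

    Hypothetical corrected variant: the shorter side is always divided
    by the longer one, so both results stay within (0, 1].
    """
    # pre-conditions: catch bad input before doing any work
    assert len(rect) == 4, "rectangles must contain 4 coordinates"
    x0, y0, x1, y1 = rect
    assert x0 < x1, "Invalid X coordinates"
    assert y0 < y1, "Invalid Y coordinates"

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # wider than tall: x spans 1.0, y is the smaller ratio
        upper_x, upper_y = 1.0, dy / dx
    else:
        # taller than wide, or square: y spans 1.0
        upper_x, upper_y = dx / dy, 1.0

    # post-conditions: check the result before returning it
    assert 0 < upper_x <= 1.0, "calculated x coordinate invalid"
    assert 0 < upper_y <= 1.0, "calculated y coordinate invalid"

    return (0, 0, upper_x, upper_y)


print(normalize_rectangle_fixed((0.0, 0.0, 5.0, 1.0)))  # (0, 0, 1.0, 0.2)
print(normalize_rectangle_fixed((0.0, 0.0, 1.0, 5.0)))  # (0, 0, 0.2, 1.0)
```

The pre-condition assertions are unchanged, so the calls with a missing coordinate or an inverted X axis still raise `AssertionError`.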