
<style>body { background-color: #eeeeee!important; } </style>

# NWO-I Software Carpentry 2022

:::info
:information_source: On this page you will find notes for the NWO-I Software Carpentry workshop organized on November 28 and February 6.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule February 6

| **Time** | **Programming with Python** |
|------|------|
| 09:30 | Programming with Python |
| 10:30 | *Morning break* |
| 10:45 | Programming with Python (Continued) |
| 12:30 | *Lunch break* |
| 13:15 | Programming with Python (Continued) |
| 15:30 | *Afternoon break* |
| 15:45 | Programming with Python (Continued) |
| 17:30 | *END* |

## Programming with Python

### :link: Links

* Setup page: https://swcarpentry.github.io/python-novice-inflammation/setup.html
* Lesson material: https://swcarpentry.github.io/python-novice-inflammation/
* Reference page: https://swcarpentry.github.io/python-novice-inflammation/reference.html
* Post workshop survey: https://carpentries.typeform.com/to/UgVdRQ?slug=2022-11-28-software-carpentry

### 1. Python Fundamentals

```python!
3 + 5 * 4
weight_kg = 60
print(weight_kg)
weight_lb = 2.2 * weight_kg
print(weight_lb)
patient_id = '001'
print(patient_id)
weight_kg = 60.3
print(weight_kg)
print(weight_lb)
weight_lb = 2.2 * weight_kg
print(weight_lb)
print(patient_id, 'weight in kilogram', weight_kg)
print(type(60.3))
print(type(patient_id))
weight_kg = 65.0
print('weight in kilograms is now: ', weight_kg)
weight_lb = 2.2 * weight_kg
print('weight in kilograms: ', weight_kg, 'and in pounds: ', weight_lb)
weight_kg = 100.0
print('weight in kilograms: ', weight_kg, 'and in pounds: ', weight_lb)
```

:::success
:pencil: **Check Your Understanding**

What values do the variables `mass` and `age` have after each of the following statements? Test your answer by executing the lines.

```python!
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
```

> :::spoiler :eyes: ***Solution***
> ```
> `mass` holds a value of 47.5, `age` does not exist
> `mass` still holds a value of 47.5, `age` holds a value of 122
> `mass` now has a value of 95.0, `age`'s value is still 122
> `mass` still has a value of 95.0, `age` now holds 102
> ```

:::

:::success
:pencil: **Sorting Out References**

Python allows you to assign multiple values to multiple variables in one line by separating the variables and values with commas. What does the following program print out?

```python!
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
```

> :::spoiler :eyes: ***Solution***
> ```
> Hopper Grace
> ```

:::

:::success
:pencil: **Seeing Data Types**

What are the data types of the following variables?

```python!
planet = 'Earth'
apples = 5
distance = 10.5
```

> :::spoiler :eyes: ***Solution***
> ```python!
> print(type(planet))
> print(type(apples))
> print(type(distance))
> ```
>
> ```
> <class 'str'>
> <class 'int'>
> <class 'float'>
> ```

:::

### 2. Analyzing Patient Data

```python!
import numpy
numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(data)
print(type(data))
print(data.dtype)
print(data.shape)
print('first value in data:', data[0, 0])
print('middle value in data:', data[30, 20])
print(data[0:4, 0:10])
print(data[5:10, 36:])
print(data[:3, 36:])
print(numpy.mean(data))
import time
print(time.ctime())
# compute the summary statistics before printing them
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
print('maximum inflammation: ', maxval)
print('minimum inflammation: ', minval)
print('standard deviation: ', stdval)
numpy.std?  # IPython/Jupyter help syntax; in plain Python use help() instead
help(numpy.std)
print(data.shape)
patient_0 = data[0, :]  # 0 on the first axis (rows), everything on the second (columns)
print("maximum inflammation for patient 0: ", numpy.max(patient_0))
print("maximum inflammation for patient 2: ", numpy.max(data[2, :]))
print(numpy.mean(data, axis=0))
print(numpy.mean(data, axis=0).shape)
print(numpy.mean(data, axis=1))
```

:::success
:pencil: **Slicing Strings**

A section of an array is called a *slice*. We can take slices of character strings as well:

```python!
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
```

```
first three characters: oxy
last three characters: gen
```

What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

> :::spoiler :eyes: ***Solution***
> ```
> oxyg
> en
> oxygen
> ```

What is `element[-1]`? What is `element[-2]`?

> :::spoiler :eyes: ***Solution***
> ```
> n
> e
> ```

Given those answers, explain what `element[1:-1]` does.

> :::spoiler :eyes: ***Solution***
> Creates a substring from index 1 up to (not including) the final index,
> effectively removing the first and last letters from 'oxygen'

How can we rewrite the slice for getting the last three characters of `element`, so that it works even if we assign a different string to `element`? Test your solution with the following strings: `carpentry`, `clone`, `hi`.

> :::spoiler :eyes: ***Solution***
> ```python!
> element = 'oxygen'
> print('last three characters:', element[-3:])
> element = 'carpentry'
> print('last three characters:', element[-3:])
> element = 'clone'
> print('last three characters:', element[-3:])
> element = 'hi'
> print('last three characters:', element[-3:])
> ```
> ```
> last three characters: gen
> last three characters: try
> last three characters: one
> last three characters: hi
> ```

:::

:::success
:pencil: **Stacking Arrays**

Arrays can be concatenated and stacked on top of one another, using NumPy's `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.

```python!
import numpy

A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)
```

```
A =
[[1 2 3]
 [4 5 6]
 [7 8 9]]
B =
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]
C =
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]
```

Write some additional code that slices the first and last columns of `A`, and stacks them into a 3x2 array. Make sure to `print` the results to verify your solution.

> :::spoiler :eyes: ***Solution***
>
> A 'gotcha' with array indexing is that singleton dimensions
> are dropped by default. That means `A[:, 0]` is a one dimensional
> array, which won't stack as desired. To preserve singleton dimensions,
> the index itself can be a slice or array. For example, `A[:, :1]` returns
> a two dimensional array with one singleton dimension (i.e. a column
> vector).
>
> ```python!
> D = numpy.hstack((A[:, :1], A[:, -1:]))
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```

> :::spoiler :eyes: ***Solution***
>
> An alternative way to achieve the same result is to use Numpy's
> delete function to remove the second column of A.
>
> ```python!
> D = numpy.delete(A, 1, 1)
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```

:::

:::success
:pencil: **Change In Inflammation**

The patient data is _longitudinal_ in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let's find out how to calculate changes in the data contained in an array with NumPy.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let's use it to examine the changes each day across the first week of patient 3 from our inflammation dataset.

```python!
patient3_week1 = data[3, :7]
print(patient3_week1)
```

```
[0. 0. 2. 0. 4. 2. 2.]
```

Calling `numpy.diff(patient3_week1)` would do the following calculations

```python!
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```

and return the 6 difference values in a new array.

```python!
numpy.diff(patient3_week1)
```

```
array([ 0.,  2., -2.,  4., -2.,  0.])
```

Note that the array of differences is shorter by one element (length 6).

When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `numpy.diff` to our 2D inflammation array `data`, which axis would we specify?

> :::spoiler :eyes: ***Solution***
> Since the row axis (0) is patients, it does not make sense to get the
> difference between two arbitrary patients. The column axis (1) is in
> days, so the difference is the change in inflammation -- a meaningful
> concept.
>
> ```python!
> numpy.diff(data, axis=1)
> ```

If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

> :::spoiler :eyes: ***Solution***
> The shape will be `(60, 39)` because there is one fewer difference between
> columns than there are columns in the data.
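
To make the shape argument concrete, here is a minimal plain-Python sketch of the successive-difference calculation described above for `numpy.diff` along the day axis, applied to a small made-up table. The names `toy_data` and `row_differences` are hypothetical and not part of the lesson; this is an illustration of the idea, not NumPy's implementation.

```python
# Hypothetical toy data: 2 patients (rows) x 4 days (columns).
toy_data = [
    [0.0, 2.0, 1.0, 4.0],
    [1.0, 1.0, 3.0, 2.0],
]


def row_differences(row):
    """Return the differences between successive values in one row."""
    # Pair each value with the one that follows it and subtract.
    return [later - earlier for earlier, later in zip(row, row[1:])]


# Apply the calculation to every row (patient) separately.
toy_diff = [row_differences(row) for row in toy_data]

for row in toy_diff:
    print(row)

print('shape before:', (len(toy_data), len(toy_data[0])))
print('shape after: ', (len(toy_diff), len(toy_diff[0])))
```

The number of rows is unchanged, while every row loses exactly one value, because n values have only n - 1 neighbouring pairs.
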

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

> :::spoiler :eyes: ***Solution***
> By using the `numpy.max()` function after you apply the `numpy.diff()`
> function, you will get the largest difference between days.
>
> ```python!
> numpy.max(numpy.diff(data, axis=1), axis=1)
> ```
>
> ```python!
> array([ 7., 12., 11., 10., 11., 13., 10., 8., 10., 10., 7.,
>         7., 13., 7., 10., 10., 8., 10., 9., 10., 13., 7.,
>         12., 9., 12., 11., 10., 10., 7., 10., 11., 10., 8.,
>         11., 12., 10., 9., 10., 13., 10., 7., 7., 10., 13.,
>         12., 8., 8., 10., 10., 9., 8., 13., 10., 7., 10.,
>         8., 12., 10., 7., 12.])
> ```
>
> If inflammation values *decrease* along an axis, then the difference from
> one element to the next will be negative. If
> you are interested in the **magnitude** of the change and not the
> direction, the `numpy.absolute()` function will provide that.
>
> Notice the difference if you get the largest _absolute_ difference
> between readings.
>
> ```python!
> numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
> ```
>
> ```python!
> array([ 12., 14., 11., 13., 11., 13., 10., 12., 10., 10., 10.,
>         12., 13., 10., 11., 10., 12., 13., 9., 10., 13., 9.,
>         12., 9., 12., 11., 10., 13., 9., 13., 11., 11., 8.,
>         11., 12., 13., 9., 10., 13., 11., 11., 13., 11., 13.,
>         13., 10., 9., 10., 10., 9., 9., 13., 10., 9., 10.,
>         11., 13., 10., 10., 12.])
> ```
> :::

### 3. Visualizing Tabular Data

```python!
import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()

ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()

max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()

min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()

import numpy
import matplotlib.pyplot

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))

fig.tight_layout()

matplotlib.pyplot.savefig('inflammation.png')
matplotlib.pyplot.show()

import numpy as np
np.max(data)
import matplotlib.pyplot as plt
plt.imshow(data)
plt.show()
```

:::success
:pencil: **Plot Scaling**

Why do all of our plots stop just short of the upper end of our graph?

> :::spoiler :eyes: ***Solution***
> Because matplotlib normally sets x and y axes limits to the min and max of our data
> (depending on data range)
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ```python!
> axes3.set_ylim(0,6)
> ```

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

> :::spoiler :eyes: ***Solution***
> ```python!
> # One method
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
> axes3.set_ylim(0,6)
> ```

> :::spoiler :eyes: ***Solution***
> ```python!
> # A more automated approach
> min_data = numpy.min(data, axis=0)
> axes3.set_ylabel('min')
> axes3.plot(min_data)
> axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> ```

:::

:::success
:pencil: **Drawing Straight Lines**

In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

> :::spoiler :eyes: ***Solution***
> Because matplotlib interpolates (draws a straight line) between the points.
> One way to avoid this is to use the Matplotlib `drawstyle` option:
>
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
> ![Three line graphs, with step lines connecting the points, showing the daily average, maximum and minimum inflammation over a 40-day period.](../fig/inflammation-01-line-styles.svg)

:::

:::success
:pencil: **Make Your Own Plot**

Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **Moving Plots Around**

Modify the program to display the three plots on top of one another instead of side by side.

> :::spoiler :eyes: ***Solution***
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> # change figsize (swap width and height)
> fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
>
> # change add_subplot (swap first two parameters)
> axes1 = fig.add_subplot(3, 1, 1)
> axes2 = fig.add_subplot(3, 1, 2)
> axes3 = fig.add_subplot(3, 1, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 4. Storing Multiple Values in Lists

```python!
odds = [1, 3, 5, 7]
print('odds are:', odds)
print('first element:', odds[0])
print('last element:', odds[3])
print('"-1" element:', odds[-1])

names = ['Curie', 'Darwing', 'Turing']  # Typo in Darwin's name
print('names is originally: ', names)
names[1] = 'Darwin'  # Correct the name
print('final value of names', names)

name = 'Darwin'
name[0] = 'd'  # TypeError: strings are immutable

salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
my_salsa = salsa  # my_salsa refers to the same list as salsa
salsa[0] = 'hot peppers'
print('Ingredients in my salsa', my_salsa)

salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
my_salsa = list(salsa)  # my_salsa is now an independent copy
salsa[0] = 'hot peppers'
print('Ingredients in my salsa', my_salsa)

x = [['pepper', 'zucchini', 'onion'],
     ['cabbage', 'lettuce', 'garlic'],
     ['apple', 'pear', 'banana']]
print([x[0]])
print(x[0])
print(x[0][0])

sample_age = [10, 12.5, 'unknown']
print(sample_age)

odds.append(11)
print('odds after append: ', odds)
removed_element = odds.pop(0)
print('odds after removing first element: ', odds)
print('removed element: ', removed_element)
odds.reverse()
print('odds after reverse', odds)

odds = [3, 5, 7]
primes = odds  # primes and odds are two names for the same list
primes.append(2)
print('primes: ', primes)
print('odds: ', odds)
primes.insert(2, 13)
print(primes)

binomial_name = 'Drosophila melanogaster'
group = binomial_name[0:10]
print('group:', group)
species = binomial_name[11:23]
print('species:', species)
chromosomes = ['X', 'Y', '2', '3', '4']
autosomes = chromosomes[2:5]
print('autosomes:', autosomes)
last = chromosomes[-1]
print('last:', last)
```

:::success
:pencil: **Slicing From the End**

Use slicing to access only the last four characters of a string or entries of a list.

```python!
string_for_slicing = 'Observation date: 02-Feb-2013'
list_for_slicing = [['fluorine', 'F'],
                    ['chlorine', 'Cl'],
                    ['bromine', 'Br'],
                    ['iodine', 'I'],
                    ['astatine', 'At']]
```

```
'2013'
[['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
```

Would your solution work regardless of whether you knew beforehand the length of the string or list (e.g. if you wanted to apply the solution to a set of lists of different lengths)? If not, try to change your approach to make it more robust.

Hint: Remember that indices can be negative as well as positive.

> :::spoiler :eyes: ***Solution***
> Use negative indices to count elements from the end of a container (such as list or string):
>
> ```python!
> string_for_slicing[-4:]
> list_for_slicing[-4:]
> ```

:::

:::success
:pencil: **Overloading**

`+` usually means addition, but when used on strings or lists, it means "concatenate". Given that, what do you think the multiplication operator `*` does on lists? In particular, what will be the output of the following code?

```python!
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats)
```

1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`

The technical term for this is *operator overloading*: a single operator, like `+` or `*`, can do different things depending on what it's applied to.
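
As a small illustration of operator overloading, the sketch below applies `+` to three different types. The variable names are made up for this example and are not part of the lesson; it only covers `+`, so the question about `*` is left open.

```python
# `+` on numbers performs arithmetic addition.
number_sum = 2 + 3

# `+` on strings joins them end to end.
joined_name = 'Grace' + ' ' + 'Hopper'

# `+` on lists builds a new list holding the items of both operands.
first_odds = [1, 3]
more_odds = [5, 7]
all_odds = first_odds + more_odds

print(number_sum)
print(joined_name)
print(all_odds)
print(first_odds)  # the original list is left unchanged
```

The same symbol triggers a different operation in each case because Python picks the behaviour from the type of the values involved.
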

> :::spoiler :eyes: ***Solution***
>
> The multiplication operator `*` used on a list replicates elements of the list and concatenates
> them together:
>
> ```
> [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
> ```
>
> It's equivalent to:
>
> ```python!
> counts + counts
> ```

:::

### 5. Repeating Actions with Loops

```python!
odds = [1, 3, 5, 7]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])

odds = [1, 3, 5]
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])  # Error!

odds = [1, 3, 5, 7]
for num in odds:
    print(num)

odds = [1, 3, 5, 7]
for banana in odds:
    print(banana)

length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
    length = length + 1
print('There are', length, 'names in the list.')

name = 'Rosalind'
for name in ['Curie', 'Darwin', 'Turing']:
    print(name)
print('after the loop, name is', name)

print(len([1, 3, 5, 7]))
```

:::success
:pencil: **From 1 to N**

Python has a built-in function called `range` that generates a sequence of numbers. `range` can accept 1, 2, or 3 parameters.

* If one parameter is given, `range` generates a sequence of that length, starting at zero and incrementing by 1. For example, `range(3)` produces the numbers `0, 1, 2`.
* If two parameters are given, `range` starts at the first and ends just before the second, incrementing by one. For example, `range(2, 5)` produces `2, 3, 4`.
* If `range` is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.

Using `range`, write a loop that prints the first 3 natural numbers:

```python!
1
2
3
```

> :::spoiler :eyes: ***Solution***
> ```python!
> for number in range(1, 4):
>     print(number)
> ```

:::

:::success
:pencil: **Understanding the loops**

Given the following loop:

```python!
word = 'oxygen'
for char in word:
    print(char)
```

How many times is the body of the loop executed?

* 3 times
* 4 times
* 5 times
* 6 times

> :::spoiler :eyes: ***Solution***
>
> The body of the loop is executed 6 times.
> :::

### 6. Analyzing Data from Multiple Files

```python!
import glob
print(glob.glob('inflammation*.csv'))
filenames = glob.glob('inflammation*.csv')
print(filenames)
filenames = sorted(glob.glob('inflammation*.csv'))
print(filenames)

import glob
import numpy
import matplotlib.pyplot

filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]
for filename in filenames:
    print(filename)

    data = numpy.loadtxt(fname=filename, delimiter=',')

    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))

    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))

    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))

    fig.tight_layout()
    matplotlib.pyplot.show()
```

:::success
:pencil: **Plotting Differences**

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, correspondingly), i.e., the difference between the leftmost plots of the first two figures.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = sorted(glob.glob('inflammation*.csv'))
>
> data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
> data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> matplotlib.pyplot.ylabel('Difference in average')
> matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
>
> fig.tight_layout()
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **Generate Composite Statistics**

Use each of the files once to generate a dataset containing values averaged over all patients:

```python!
filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60,40))
for filename in filenames:
    # sum each new file's data into composite_data as it's read
    #
# and then divide the composite_data by number of samples
composite_data = composite_data / len(filenames)
```

Then use pyplot to generate average, max, and min for all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = glob.glob('inflammation*.csv')
> composite_data = numpy.zeros((60,40))
>
> for filename in filenames:
>     data = numpy.loadtxt(fname = filename, delimiter=',')
>     composite_data = composite_data + data
>
> composite_data = composite_data / len(filenames)
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(composite_data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(composite_data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(composite_data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 7. Making Choices

```python!
num = 17
if num > 100:
    print('greater than')
else:
    print('not greater than')
print('done')

num = 53
print('before conditional')
if num > 100:
    print(num, 'is greater than 100')
print('after conditional')

num = -3
if num > 0:
    print(num, 'is positive')
elif num == 0:
    print(num, 'is zero')
else:
    print(num, 'is negative')

if (1 > 0) and (-1 >= 0):
    print('both parts are true')
else:
    print('at least one part is false')

if (1 < 0) or (1 >= 0):
    print('at least one test is true')

import numpy
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
max_inflammation_0 = numpy.max(data, axis=0)[0]
max_inflammation_20 = numpy.max(data, axis=0)[20]

if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
elif numpy.sum(numpy.min(data, axis=0)) == 0:
    print('Minima add up to zero!')
else:
    print('Seems OK!')
```

:::success
:pencil: **How Many Paths?**

Consider this code:

```python!
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```

Which of the following would be printed if you were to run this code? Why did you pick this answer?

1. A
2. B
3. C
4. B and C

> :::spoiler :eyes: ***Solution***
> C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
> but `4 < 5` is true.

:::

:::success
:pencil: **What Is Truth?**

`True` and `False` booleans are not the only values in Python that are true and false. In fact, *any* value can be used in an `if` or `elif`. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

```python!
if '':
    print('empty string is true')
if 'word':
    print('word is true')
if []:
    print('empty list is true')
if [1, 2, 3]:
    print('non-empty list is true')
if 0:
    print('zero is true')
if 1:
    print('one is true')
```

:::

### 8. Creating Functions

```python!
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

print(fahr_to_celsius(32))
print('freezing point of water:', fahr_to_celsius(32), 'C')
print('boiling point of water:', fahr_to_celsius(212), 'C')

def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))

def fahr_to_kelvin(temp_f):
    temp_c = fahr_to_celsius(temp_f)
    temp_k = celsius_to_kelvin(temp_c)
    return temp_k

print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
temp_kelvin = fahr_to_kelvin(212.0)
print(temp_kelvin)

def print_temperatures():
    # temp_fahr and temp_kelvin are read from the global scope
    print('temperature in Fahrenheit was:', temp_fahr)
    print('temperature in Kelvin was:', temp_kelvin)

temp_fahr = 212.0
temp_kelvin = fahr_to_kelvin(temp_fahr)
print_temperatures()

def visualize(filename):

    data = numpy.loadtxt(fname=filename, delimiter=',')

    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))

    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))

    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))

    fig.tight_layout()
    matplotlib.pyplot.show()

def detect_problems(filename):

    data = numpy.loadtxt(fname=filename, delimiter=',')

    max_inflammation_0 = numpy.max(data, axis=0)[0]
    max_inflammation_20 = numpy.max(data, axis=0)[20]

    if max_inflammation_0 == 0 and max_inflammation_20 == 20:
        print('Suspicious looking maxima!')
    elif numpy.sum(numpy.min(data, axis=0)) == 0:
        print('Minima add up to zero!')
    else:
        print('Seems OK!')

import numpy
import glob
import matplotlib.pyplot

filenames = sorted(glob.glob('inflammation*.csv'))

for filename in filenames[:3]:
    print(filename)
    visualize(filename)
    detect_problems(filename)

def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

z = numpy.zeros((2,2))
print(offset_mean(z, 3))

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
print(offset_mean(data, 0))

print('original min, mean, and max are:', numpy.min(data), numpy.mean(data), numpy.max(data))
offset_data = offset_mean(data, 0)
print('min, mean, and max of offset data are:',
      numpy.min(offset_data),
      numpy.mean(offset_data),
      numpy.max(offset_data))

print('standard deviation before and after', numpy.std(data), numpy.std(offset_data))
print('difference standard deviation before and after',
      numpy.std(data) - numpy.std(offset_data))

# offset_mean(data, target_mean_value):
# return a new array containing the original data with its mean offset to match the desired value
def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
    with its mean offset to match the desired value."""
    return (data - numpy.mean(data)) + target_mean_value

help(offset_mean)

def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
    with its mean offset to match the desired value.

    Examples
    --------
    >>> offset_mean([1,2,3],0)
    array([-1.,  0.,  1.])
    """
    return (data - numpy.mean(data)) + target_mean_value

help(offset_mean)

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
data = numpy.loadtxt('inflammation-01.csv', delimiter=',')
data = numpy.loadtxt('inflammation-01.csv', ',')  # fails: the second positional parameter is dtype, not delimiter
help(numpy.loadtxt)

def offset_mean(data, target_mean_value=0.0):
    """Return a new array containing the original data
    with its mean offset to match the desired value.

    Examples
    --------
    >>> offset_mean([1,2,3],0)
    array([-1.,  0.,  1.])
    """
    return (data - numpy.mean(data)) + target_mean_value

test_data = numpy.zeros((2,2))
print(offset_mean(test_data, 3))

more_data = 5 + numpy.zeros((2,2))
print('data before mean offset:')
print(more_data)
print('offset data:')
print(offset_mean(more_data))

def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display()
print('one parameter:')
display(55)
print('two parameters:')
display(55, 66)
print('only setting the value of c')
display(c=77)

help(numpy.loadtxt)
```

:::success
:pencil: **Combining Strings**

"Adding" two strings produces their concatenation: `'a' + 'b'` is `'ab'`. Write a function called `fence` that takes two parameters called `original` and `wrapper` and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:

```python!
print(fence('name', '*'))
```

```
*name*
```

> :::spoiler :eyes: ***Solution***
> ```python!
> def fence(original, wrapper):
>     return wrapper + original + wrapper
> ```

:::

:::success
:pencil: **Return versus print**

Note that `return` and `print` are not interchangeable. `print` is a Python function that *prints* data to the screen. It enables us, the *users*, to see the data. The `return` statement, on the other hand, makes data visible to the program. Let's have a look at the following function:

```python!
def add(a, b):
    print(a + b)
```

**Question**: What will we see if we execute the following commands?

```python!
A = add(7, 3)
print(A)
```

> :::spoiler :eyes: ***Solution***
> Python will first execute the function `add` with `a = 7` and `b = 3`,
> and, therefore, print `10`. However, because function `add` does not have a
> line that starts with `return` (no `return` "statement"), it will, by default, return
> nothing which, in Python world, is called `None`. Therefore, `A` will be assigned to `None`
> and the last line (`print(A)`) will print `None`.
> As a result, we will see:
> ```
> 10
> None
> ```

:::

:::success
:pencil: **Selecting Characters From Strings**

If the variable `s` refers to a string, then `s[0]` is the string's first character and `s[-1]` is its last. Write a function called `outer` that returns a string made up of just the first and last characters of its input. A call to your function should look like this:

```python!
print(outer('helium'))
```

```
hm
```

> :::spoiler :eyes: ***Solution***
> ```python!
> def outer(input_string):
>     return input_string[0] + input_string[-1]
> ```

:::

### 9. Errors and Exceptions

```python!
# This code has an intentional error
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])  # IndexError: the list only has indices 0 to 2

favorite_ice_cream()

def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[2])

favorite_ice_cream()

# SyntaxError: the colon after the function signature is missing
def some_function()
    msg = 'hello, world!'
    print(msg)
    return msg

def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg

def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg

def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg

print(a)  # NameError: a is not defined

for number in range(10):
    count = count + number  # NameError: count has not been initialised
print('The count is', count)

count = 0
for number in range(10):
    count = count + number
print('The count is', count)

count = 0
for number in range(10):
    Count = Count + number  # NameError: names are case-sensitive
print('The count is', Count)

letters = ['a', 'b', 'c']
print('letter #1 is', letters[0])
print('letter #2 is', letters[1])
print('letter #3 is', letters[2])
print('letter #4 is', letters[3])  # IndexError

file_handle = open('myfile.txt', 'r')  # FileNotFoundError if the file does not exist
file_handle = open('myfile.txt', 'w')
file_handle.read()  # fails: the file was opened for writing only
```

:::success
:pencil: **Reading Error Messages**

Read the Python code and the resulting traceback below, and answer the following questions:

1. How many levels does the traceback have?
2. What is the function name where the error occurred?
3. On which line number in this function did the error occur?
4. What is the type of error?
5. What is the error message?

```python!
# This code has an intentional error. Do not type it directly;
# use it for reference to understand the error message below.
def print_message(day):
    messages = {
        'monday': 'Hello, world!',
        'tuesday': 'Today is Tuesday!',
        'wednesday': 'It is the middle of the week.',
        'thursday': 'Today is Donnerstag in German!',
        'friday': 'Last day of the week!',
        'saturday': 'Hooray for the weekend!',
        'sunday': 'Aw, the weekend is almost over.'
    }
    print(messages[day])

def print_friday_message():
    print_message('Friday')

print_friday_message()
```

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-1-4be1945adbe2> in <module>()
     14     print_message('Friday')
     15
---> 16 print_friday_message()

<ipython-input-1-4be1945adbe2> in print_friday_message()
     12
     13 def print_friday_message():
---> 14     print_message('Friday')
     15
     16 print_friday_message()

<ipython-input-1-4be1945adbe2> in print_message(day)
      9         'sunday': 'Aw, the weekend is almost over.'
     10     }
---> 11     print(messages[day])
     12
     13 def print_friday_message():

KeyError: 'Friday'
```

> :::spoiler :eyes: ***Solution***
> 1. 3 levels
> 2. `print_message`
> 3. 11
> 4. `KeyError`
> 5. There isn't really a message; you're supposed
>    to infer that `Friday` is not a key in `messages`.

:::

### 10. Defensive Programming

```python!
numbers = [1.5, 2.3, 0.7, -0.001, 4.4]
total = 0.0
for num in numbers:
    assert num > 0.0, 'Data should only contain positive values'
    total = total + num
print('total is', total)

def normalize_rectangle(rect):
    """Normalizes a rectangle so that it is at the origin and 1.0 units long on its longest axis.
    Input should be of the format (x0, y0, x1, y1).
    (x0, y0) and (x1, y1) define the lower left and upper right corners of the rectangle, respectively."""
    assert len(rect) == 4, 'Rectangles must contain 4 coordinates'
    x0, y0, x1, y1 = rect
    assert x0 < x1, 'Invalid X coordinates'
    assert y0 < y1, 'Invalid Y coordinates'

    dx = x1 - x0
    dy = y1 - y0
    if dx > dy:
        # NOTE: this branch should divide dy by dx; as written, a wide rectangle
        # such as (0.0, 0.0, 5.0, 1.0) trips the upper_y post-condition below.
        scaled = float(dx) / dy
        upper_x, upper_y = 1.0, scaled
    else:
        scaled = float(dx) / dy
        upper_x, upper_y = scaled, 1.0

    assert 0 < upper_x <= 1.0, 'Calculated upper X coordinate invalid'
    assert 0 < upper_y <= 1.0, 'Calculated upper Y coordinate invalid'

    return (0, 0, upper_x, upper_y)

print(normalize_rectangle( (0.0, 1.0, 2.0) )) # missing the fourth coordinate
print(normalize_rectangle( (4.0, 2.0, 1.0, 5.0) )) # X axis inverted
print(normalize_rectangle( (0.0, 0.0, 1.0, 5.0) ))
```

<!---
:::success
:pencil: **Pre- and Post-Conditions**

Suppose you are writing a function called `average` that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor's: can you think of a function that will pass your tests but not theirs, or vice versa?

> :::spoiler :eyes: ***Solution***
> ```python!
> # a possible pre-condition:
> assert len(input_list) > 0, 'List length must be non-zero'
> # a possible post-condition:
> assert numpy.min(input_list) <= average <= numpy.max(input_list), \
>     'Average should be between min and max of input values (inclusive)'
> ```
:::

<!---
:::success
:pencil: **Testing Assertions**

Given a sequence of car counts, the function `get_total_cars` returns the total number of cars.

```python!
get_total_cars([1, 2, 3, 4])
```

```
10
```

```python!
get_total_cars(['a', 'b', 'c'])
```

```
ValueError: invalid literal for int() with base 10: 'a'
```

Explain in words what the assertions in this function check, and for each one, give an example of input that will make that assertion fail.

```python!
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total
```

> :::spoiler :eyes: ***Solution***
> * The first assertion checks that the input sequence `values` is not empty.
>   An empty sequence such as `[]` will make it fail.
> * The second assertion checks that each value in the list can be turned into an integer.
>   Input such as `[1, 2, 'c', 3]` will make it fail (with a `ValueError` raised by `int`, as in the example above).
> * The third assertion checks that the total of the list is greater than 0.
>   Input such as `[-10, 2, 3]` will make it fail.
:::
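
The checks described in the solution can be tried out with a minimal sketch like the one below. It reproduces the exercise's function under the name `get_total_cars`; `describe_failure` is a hypothetical helper added only for illustration, and it simply reports which exception type each input raises. Two details are worth noticing: non-numeric input raises a `ValueError` from `int` before the `assert` itself is evaluated, and an element equal to `0` fails the second assertion because `int(0)` is falsy.

```python
def get_total_cars(values):
    assert len(values) > 0
    for element in values:
        assert int(element)
    values = [int(element) for element in values]
    total = sum(values)
    assert total > 0
    return total


def describe_failure(values):
    """Hypothetical helper: return the name of the exception raised, or None."""
    try:
        get_total_cars(values)
    except (AssertionError, ValueError) as error:
        return type(error).__name__
    return None


print(describe_failure([]))              # first assertion: empty input
print(describe_failure([1, 2, 'c', 3]))  # int('c') raises before the assert is evaluated
print(describe_failure([1, 0, 3]))       # second assertion: int(0) is falsy
print(describe_failure([-10, 2, 3]))     # third assertion: the total is not positive
print(describe_failure([1, 2, 3, 4]))    # valid input, nothing is raised
```

Catching `AssertionError` like this is only meant for exploring the behaviour; in real code an assertion failure usually signals a bug that should stop the program.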