SWC, NWO-I 2024, Day 2

# NWO-I Software Carpentry, 17 October 2024, Day 2

:::info
:information_source: On this page you will find notes for the second day of the NWO-I Software Carpentry workshop organized on October 17.
:::

## Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the [Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html). This document also outlines how to report an incident if needed.

## :timer_clock: Schedule

17 October 2024

|      | **Programming with Python** |
|------|------|
| 09:30 | Python fundamentals & analyzing tabular data |
| 10:30 | *Morning break* |
| 10:40 | Visualizing tabular data |
| 12:30 | *Lunch break* |
| 13:15 | Lists & loops |
| 15:30 | *Afternoon break* |
| 15:45 | Conditionals, functions & best practices |
| 17:30 | *END* |

## Programming with Python

### :link: Links

* Setup page: https://swcarpentry.github.io/python-novice-inflammation/index.html
* Lesson material: https://swcarpentry.github.io/python-novice-inflammation/
* Reference page: https://swcarpentry.github.io/python-novice-inflammation/reference.html

### 1. Python Fundamentals

:::success
:pencil: **1.1 Check Your Understanding**

What values do the variables `mass` and `age` have after each of the following statements? Test your answer by executing the lines.

```python!
mass = 47.5
age = 122
mass = mass * 2.0
age = age - 20
```

> :::spoiler :eyes: ***Solution***
> ```
> `mass` holds a value of 47.5, `age` does not exist
> `mass` still holds a value of 47.5, `age` holds a value of 122
> `mass` now has a value of 95.0, `age`'s value is still 122
> `mass` still has a value of 95.0, `age` now holds 102
> ```

:::

:::success
:pencil: **1.2 Sorting Out References**

Python allows you to assign multiple values to multiple variables in one line by separating the variables and values with commas. What does the following program print out?

```python!
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
```

> :::spoiler :eyes: ***Solution***
> ```
> Hopper Grace
> ```

:::

:::success
:pencil: **1.3 Seeing Data Types**

What are the data types of the following variables?

```python!
planet = 'Earth'
apples = 5
distance = 10.5
```

> :::spoiler :eyes: ***Solution***
> ```python!
> print(type(planet))
> print(type(apples))
> print(type(distance))
> ```
>
> ```
> <class 'str'>
> <class 'int'>
> <class 'float'>
> ```

:::

### 2. Analyzing Patient Data

:::success
:pencil: **2.1 Slicing Strings**

A section of an array is called a *slice*. We can take slices of character strings as well:

```python!
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
```

```
first three characters: oxy
last three characters: gen
```

What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

> :::spoiler :eyes: ***Solution***
> ```
> oxyg
> en
> oxygen
> ```

What is `element[-1]`? What is `element[-2]`?

> :::spoiler :eyes: ***Solution***
> ```
> n
> e
> ```

Given those answers, explain what `element[1:-1]` does.

> :::spoiler :eyes: ***Solution***
> Creates a substring from index 1 up to (not including) the final index,
> effectively removing the first and last letters from 'oxygen'

How can we rewrite the slice for getting the last three characters of `element`, so that it works even if we assign a different string to `element`? Test your solution with the following strings: `carpentry`, `clone`, `hi`.

> :::spoiler :eyes: ***Solution***
> ```python!
> element = 'oxygen'
> print('last three characters:', element[-3:])
> element = 'carpentry'
> print('last three characters:', element[-3:])
> element = 'clone'
> print('last three characters:', element[-3:])
> element = 'hi'
> print('last three characters:', element[-3:])
> ```
> ```
> last three characters: gen
> last three characters: try
> last three characters: one
> last three characters: hi
> ```

:::

:::success
:pencil: **2.2 Thin Slices**

The expression `element[3:3]` produces an *empty string*, i.e., a string that contains no characters. If `data` holds our array of patient data, what does `data[3:3, 4:4]` produce? What about `data[3:3, :]`?

> :::spoiler :eyes: ***Solution***
> ```
> array([], shape=(0, 0), dtype=float64)
> array([], shape=(0, 40), dtype=float64)
> ```

:::

:::success
:pencil: **2.3 Stacking Arrays**

Arrays can be concatenated and stacked on top of one another, using NumPy's `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.

```python!
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)
```

```
A =
[[1 2 3]
 [4 5 6]
 [7 8 9]]
B =
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]
C =
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]
```

Write some additional code that slices the first and last columns of `A`, and stacks them into a 3x2 array. Make sure to `print` the results to verify your solution.

> :::spoiler :eyes: ***Solution***
>
> A 'gotcha' with array indexing is that singleton dimensions
> are dropped by default. That means `A[:, 0]` is a one dimensional
> array, which won't stack as desired. To preserve singleton dimensions,
> the index itself can be a slice or array. For example, `A[:, :1]` returns
> a two dimensional array with one singleton dimension (i.e. a column
> vector).
>
> ```python!
> D = numpy.hstack((A[:, :1], A[:, -1:]))
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```
>
> :::spoiler :eyes: ***Solution***
>
> An alternative way to achieve the same result is to use NumPy's
> `delete` function to remove the second column of `A`.
>
> ```python!
> D = numpy.delete(A, 1, 1)
> print('D = ')
> print(D)
> ```
>
> ```
> D =
> [[1 3]
>  [4 6]
>  [7 9]]
> ```

:::

:::success
:pencil: **2.4 Change In Inflammation**

The patient data is _longitudinal_ in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let's find out how to calculate changes in the data contained in an array with NumPy.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let's use it to examine the changes each day across the first week of patient 3 from our inflammation dataset.

```python!
patient3_week1 = data[3, :7]
print(patient3_week1)
```

```
[0. 0. 2. 0. 4. 2. 2.]
```

Calling `numpy.diff(patient3_week1)` would do the following calculations

```python!
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```

and return the 6 difference values in a new array.

```python!
numpy.diff(patient3_week1)
```

```
array([ 0.,  2., -2.,  4., -2.,  0.])
```

Note that the array of differences is shorter by one element (length 6).

When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `numpy.diff` to our 2D inflammation array `data`, which axis would we specify?

> :::spoiler :eyes: ***Solution***
> Since the row axis (0) is patients, it does not make sense to get the
> difference between two arbitrary patients. The column axis (1) is in
> days, so the difference is the change in inflammation -- a meaningful
> concept.
>
> ```python!
> numpy.diff(data, axis=1)
> ```

If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the shape of the array be after you run the `diff()` function and why?

> :::spoiler :eyes: ***Solution***
> The shape will be `(60, 39)` because there is one fewer difference between
> columns than there are columns in the data.

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

> :::spoiler :eyes: ***Solution***
> By using the `numpy.max()` function after you apply the `numpy.diff()`
> function, you will get the largest difference between days.
>
> ```python!
> numpy.max(numpy.diff(data, axis=1), axis=1)
> ```
>
> ```python!
> array([  7.,  12.,  11.,  10.,  11.,  13.,  10.,   8.,  10.,  10.,   7.,
>          7.,  13.,   7.,  10.,  10.,   8.,  10.,   9.,  10.,  13.,   7.,
>         12.,   9.,  12.,  11.,  10.,  10.,   7.,  10.,  11.,  10.,   8.,
>         11.,  12.,  10.,   9.,  10.,  13.,  10.,   7.,   7.,  10.,  13.,
>         12.,   8.,   8.,  10.,  10.,   9.,   8.,  13.,  10.,   7.,  10.,
>          8.,  12.,  10.,   7.,  12.])
> ```
>
> If inflammation values *decrease* along an axis, then the difference from
> one element to the next will be negative. If
> you are interested in the **magnitude** of the change and not the
> direction, the `numpy.absolute()` function will provide that.
>
> Notice the difference if you get the largest _absolute_ difference
> between readings.
>
> ```python!
> numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
> ```
>
> ```python!
> array([ 12.,  14.,  11.,  13.,  11.,  13.,  10.,  12.,  10.,  10.,  10.,
>         12.,  13.,  10.,  11.,  10.,  12.,  13.,   9.,  10.,  13.,   9.,
>         12.,   9.,  12.,  11.,  10.,  13.,   9.,  13.,  11.,  11.,   8.,
>         11.,  12.,  13.,   9.,  10.,  13.,  11.,  11.,  13.,  11.,  13.,
>         13.,  10.,   9.,  10.,  10.,   9.,   9.,  13.,  10.,   9.,  10.,
>         11.,  13.,  10.,  10.,  12.])
> ```

:::

### 3. Visualizing Tabular Data

:::success
:pencil: **3.1 Plot Scaling**

Why do all of our plots stop just short of the upper end of our graph?
> :::spoiler :eyes: ***Solution***
> Because matplotlib normally sets x and y axes limits to the min and max of our data
> (depending on data range)
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ```python!
> axes3.set_ylim(0, 6)
> ```

Update your plotting code to automatically set a more appropriate scale. (Hint: you can make use of the `max` and `min` methods to help.)

> :::spoiler :eyes: ***Solution***
> ```python!
> # One method
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
> axes3.set_ylim(0, 6)
> ```
>
> :::spoiler :eyes: ***Solution***
> ```python!
> # A more automated approach
> min_data = numpy.min(data, axis=0)
> axes3.set_ylabel('min')
> axes3.plot(min_data)
> axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> ```

:::

:::success
:pencil: **3.2 Drawing Straight Lines**

In the center and right subplots above, we expect all lines to look like step functions because non-integer values are not realistic for the minimum and maximum values. However, you can see that the lines are not always vertical or horizontal, and in particular the step function in the subplot on the right looks slanted. Why is this?

> :::spoiler :eyes: ***Solution***
> Because matplotlib interpolates (draws a straight line) between the points.
> One way to avoid this is to use the Matplotlib `drawstyle` option:
>
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```
> ![Three line graphs, with step lines connecting the points, showing the daily average, maximum and minimum inflammation over a 40-day period.](../fig/inflammation-01-line-styles.svg)

:::

:::success
:pencil: **3.3 Make Your Own Plot**

Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **3.4 Moving Plots Around**

Modify the program to display the three plots on top of one another instead of side by side.

> :::spoiler :eyes: ***Solution***
> ```python!
> import numpy
> import matplotlib.pyplot
>
> data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
>
> # change figsize (swap width and height)
> fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
>
> # change add_subplot (swap first two parameters)
> axes1 = fig.add_subplot(3, 1, 1)
> axes2 = fig.add_subplot(3, 1, 2)
> axes3 = fig.add_subplot(3, 1, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 4. Storing Multiple Values in Lists

:::success
:pencil: **4.1 Slicing From the End**

Use slicing to access only the last four characters of a string or entries of a list.

```python!
string_for_slicing = 'Observation date: 02-Feb-2013'
list_for_slicing = [['fluorine', 'F'],
                    ['chlorine', 'Cl'],
                    ['bromine', 'Br'],
                    ['iodine', 'I'],
                    ['astatine', 'At']]
```

```
'2013'
[['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
```

Would your solution work regardless of whether you knew beforehand the length of the string or list (e.g. if you wanted to apply the solution to a set of lists of different lengths)? If not, try to change your approach to make it more robust.

Hint: Remember that indices can be negative as well as positive.

> :::spoiler :eyes: ***Solution***
> Use negative indices to count elements from the end of a container (such as list or string):
>
> ```python!
> string_for_slicing[-4:]
> list_for_slicing[-4:]
> ```

:::

:::success
:pencil: **4.2 Overloading**

`+` usually means addition, but when used on strings or lists, it means "concatenate". Given that, what do you think the multiplication operator `*` does on lists? In particular, what will be the output of the following code?

```python!
counts = [2, 4, 6, 8, 10]
repeats = counts * 2
print(repeats)
```

1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
2. `[4, 8, 12, 16, 20]`
3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`

The technical term for this is *operator overloading*: a single operator, like `+` or `*`, can do different things depending on what it's applied to.

> :::spoiler :eyes: ***Solution***
>
> The multiplication operator `*` used on a list replicates elements of the list and concatenates
> them together:
>
> ```
> [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
> ```
>
> It's equivalent to:
>
> ```python!
> counts + counts
> ```

:::

### 5. Repeating Actions with Loops

:::success
:pencil: **5.1 From 1 to N**

Python has a built-in function called `range` that generates a sequence of numbers. `range` can accept 1, 2, or 3 parameters.

* If one parameter is given, `range` generates a sequence of that length, starting at zero and incrementing by 1. For example, `range(3)` produces the numbers `0, 1, 2`.
* If two parameters are given, `range` starts at the first and ends just before the second, incrementing by one. For example, `range(2, 5)` produces `2, 3, 4`.
* If `range` is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.

Using `range`, write a loop that prints the first 3 natural numbers:

```python!
1
2
3
```

> :::spoiler :eyes: ***Solution***
> ```python!
> for number in range(1, 4):
>     print(number)
> ```

:::

:::success
:pencil: **5.2 Understanding the loops**

Given the following loop:

```python!
word = 'oxygen'
for char in word:
    print(char)
```

How many times is the body of the loop executed?

* 3 times
* 4 times
* 5 times
* 6 times

> :::spoiler :eyes: ***Solution***
>
> The body of the loop is executed 6 times.
>

:::

:::success
:pencil: **5.3 Computing Powers With Loops**

Exponentiation is built into Python:

```python!
print(5 ** 3)
```

```
125
```

Write a loop that calculates the same result as `5 ** 3` using multiplication (and without exponentiation).

> :::spoiler :eyes: ***Solution***
> ```python!
> result = 1
> for number in range(0, 3):
>     result = result * 5
> print(result)
> ```

:::

:::success
:pencil: **5.4 Summing a list**

Write a loop that calculates the sum of elements in a list by adding each element and printing the final value, so `[124, 402, 36]` prints 562.

> :::spoiler :eyes: ***Solution***
> ```python!
> numbers = [124, 402, 36]
> summed = 0
> for num in numbers:
>     summed = summed + num
> print(summed)
> ```

:::

:::success
:pencil: **5.5 Computing the Value of a Polynomial**

The built-in function `enumerate` takes a sequence (e.g. a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,...) and the value from the original sequence:

```python!
for idx, val in enumerate(a_list):
    # Do something using idx and val
```

The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.

Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.

```python!
x = 5
coefs = [2, 4, 3]
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)
```

```
97
```

Write a loop using `enumerate(coefs)` which computes the value `y` of any polynomial, given `x` and `coefs`.

> :::spoiler :eyes: ***Solution***
> ```python!
> y = 0
> for idx, coef in enumerate(coefs):
>     y = y + coef * x**idx
> ```

:::

### 6. Analyzing Data from Multiple Files

:::success
:pencil: **6.1 Plotting Differences**

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively), i.e., the difference between the leftmost plots of the first two figures.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = sorted(glob.glob('inflammation*.csv'))
>
> data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
> data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> matplotlib.pyplot.ylabel('Difference in average')
> matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
>
> fig.tight_layout()
> matplotlib.pyplot.show()
> ```

:::

:::success
:pencil: **6.2 Generate Composite Statistics**

Use each of the files once to generate a dataset containing values averaged over all patients:

```python!
filenames = glob.glob('inflammation*.csv')
composite_data = numpy.zeros((60,40))
for filename in filenames:
    # sum each new file's data into composite_data as it's read
    #
# and then divide the composite_data by number of samples
composite_data = composite_data / len(filenames)
```

Then use pyplot to generate average, max, and min for all patients.

> :::spoiler :eyes: ***Solution***
> ```python!
> import glob
> import numpy
> import matplotlib.pyplot
>
> filenames = glob.glob('inflammation*.csv')
> composite_data = numpy.zeros((60,40))
>
> for filename in filenames:
>     data = numpy.loadtxt(fname=filename, delimiter=',')
>     composite_data = composite_data + data
>
> composite_data = composite_data / len(filenames)
>
> fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
>
> axes1 = fig.add_subplot(1, 3, 1)
> axes2 = fig.add_subplot(1, 3, 2)
> axes3 = fig.add_subplot(1, 3, 3)
>
> axes1.set_ylabel('average')
> axes1.plot(numpy.mean(composite_data, axis=0))
>
> axes2.set_ylabel('max')
> axes2.plot(numpy.max(composite_data, axis=0))
>
> axes3.set_ylabel('min')
> axes3.plot(numpy.min(composite_data, axis=0))
>
> fig.tight_layout()
>
> matplotlib.pyplot.show()
> ```

:::

### 7. Making Choices

:::success
:pencil: **7.1 How Many Paths?**

Consider this code:

```python!
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```

Which of the following would be printed if you were to run this code? Why did you pick this answer?

1. A
2. B
3. C
4. B and C

> :::spoiler :eyes: ***Solution***
> C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
> but `4 < 5` is true.

:::

:::success
:pencil: **7.2 What Is Truth?**

`True` and `False` booleans are not the only values in Python that are true and false. In fact, *any* value can be used in an `if` or `elif`. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

```python!
if '':
    print('empty string is true')
if 'word':
    print('word is true')
if []:
    print('empty list is true')
if [1, 2, 3]:
    print('non-empty list is true')
if 0:
    print('zero is true')
if 1:
    print('one is true')
```

:::

:::success
:pencil: **7.3 That's Not Not What I Meant**

Sometimes it is useful to check whether some condition is not true. The Boolean operator `not` can do this explicitly. After reading and running the code below, write some `if` statements that use `not` to test the rule that you formulated in the previous challenge.

```python!
if not '':
    print('empty string is not true')
if not 'word':
    print('word is not true')
if not not True:
    print('not not True is true')
```

:::

:::success
:pencil: **7.4 Close Enough**

Write some conditions that print `True` if the variable `a` is within 10% of the variable `b` and `False` otherwise. Compare your implementation with your partner's: do you get the same answer for all possible pairs of numbers?

> :::spoiler :eyes: ***Solution***
> There is a [built-in function `abs`](https://docs.python.org/3/library/functions.html#abs) that returns the absolute value of
> a number:
> ```python!
> print(abs(-12))
> ```
> ```
> 12
> ```
>
> :::spoiler :eyes: ***Solution***
> ```python!
> a = 5
> b = 5.1
>
> if abs(a - b) <= 0.1 * abs(b):
>     print('True')
> else:
>     print('False')
> ```
>
> :::spoiler :eyes: ***Solution***
> ```python!
> print(abs(a - b) <= 0.1 * abs(b))
> ```
>
> This works because the Booleans `True` and `False`
> have string representations which can be printed.

:::

:::success
:pencil: **7.5 In-Place Operators**

Python (and most other languages in the C family) provides [in-place operators](https://swcarpentry.github.io/python-novice-inflammation/reference.html#in-place-operators) that work like this:

```python!
x = 1   # original value
x += 1  # add one to x, assigning result back to x
x *= 3  # multiply x by 3
print(x)
```

```
6
```

Write some code that sums the positive and negative numbers in a list separately, using in-place operators. Do you think the result is more or less readable than writing the same without in-place operators?

> :::spoiler :eyes: ***Solution***
> ```python!
> positive_sum = 0
> negative_sum = 0
> test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
> for num in test_list:
>     if num > 0:
>         positive_sum += num
>     elif num == 0:
>         pass
>     else:
>         negative_sum += num
> print(positive_sum, negative_sum)
> ```
> Here `pass` means "don't do anything".

:::

:::success
:pencil: **7.6 Sorting a List Into Buckets**

In our `data` folder, large data sets are stored in files whose names start with "inflammation-" and small data sets are stored in files whose names start with "small-". We also have some other files that we do not care about at this point. We'd like to break all these files into three lists called `large_files`, `small_files`, and `other_files`, respectively.

Add code to the template below to do this. Note that the string method [`startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith) returns `True` if and only if the string it is called on starts with the string passed as an argument, that is:

```python!
'String'.startswith('Str')
```

```
True
```

But

```python!
'String'.startswith('str')
```

```
False
```

Use the following Python code as your starting point:

```python!
filenames = ['inflammation-01.csv',
             'myscript.py',
             'inflammation-02.csv',
             'small-01.csv',
             'small-02.csv']
large_files = []
small_files = []
other_files = []
```

Your solution should:

1. loop over the names of the files
2. figure out which group each filename belongs in
3. append the filename to that list

In the end the three lists should be:

```python!
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
small_files = ['small-01.csv', 'small-02.csv']
other_files = ['myscript.py']
```

> :::spoiler :eyes: ***Solution***
> ```python!
> for filename in filenames:
>     if filename.startswith('inflammation-'):
>         large_files.append(filename)
>     elif filename.startswith('small-'):
>         small_files.append(filename)
>     else:
>         other_files.append(filename)
>
> print('large_files:', large_files)
> print('small_files:', small_files)
> print('other_files:', other_files)
> ```

:::

### 8. Creating Functions

:::success
:pencil: **8.1 Combining Strings**

"Adding" two strings produces their concatenation: `'a' + 'b'` is `'ab'`. Write a function called `fence` that takes two parameters called `original` and `wrapper` and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:

```python!
print(fence('name', '*'))
```

```
*name*
```

> :::spoiler :eyes: ***Solution***
> ```python!
> def fence(original, wrapper):
>     return wrapper + original + wrapper
> ```

:::

:::success
:pencil: **8.2 Return versus print**

Note that `return` and `print` are not interchangeable. `print` is a Python function that *prints* data to the screen. It enables us, the *users*, to see the data. The `return` statement, on the other hand, makes data visible to the program. Let's have a look at the following function:

```python!
def add(a, b):
    print(a + b)
```

**Question**: What will we see if we execute the following commands?
```python!
A = add(7, 3)
print(A)
```

> :::spoiler :eyes: ***Solution***
> Python will first execute the function `add` with `a = 7` and `b = 3`,
> and, therefore, print `10`. However, because function `add` does not have a
> line that starts with `return` (no `return` "statement"), it will, by default, return
> nothing which, in Python world, is called `None`. Therefore, `A` will be assigned `None`
> and the last line (`print(A)`) will print `None`. As a result, we will see:
> ```
> 10
> None
> ```

:::

:::success
:pencil: **8.3 Selecting Characters From Strings**

If the variable `s` refers to a string, then `s[0]` is the string's first character and `s[-1]` is its last. Write a function called `outer` that returns a string made up of just the first and last characters of its input. A call to your function should look like this:

```python!
print(outer('helium'))
```

```
hm
```

> :::spoiler :eyes: ***Solution***
> ```python!
> def outer(input_string):
>     return input_string[0] + input_string[-1]
> ```

:::

:::success
:pencil: **8.4 Rescaling an Array**

Write a function `rescale` that takes an array as input and returns a corresponding array of values scaled to lie in the range 0.0 to 1.0. (Hint: If `L` and `H` are the lowest and highest values in the original array, then the replacement for a value `v` should be `(v-L) / (H-L)`.)

> :::spoiler :eyes: ***Solution***
> ```python!
> def rescale(input_array):
>     L = numpy.min(input_array)
>     H = numpy.max(input_array)
>     output_array = (input_array - L) / (H - L)
>     return output_array
> ```

:::

:::success
:pencil: **8.5 Variables Inside and Outside Functions**

What does the following piece of code display when run, and why?

```python!
f = 0
k = 0

def f2k(f):
    k = ((f - 32) * (5.0 / 9.0)) + 273.15
    return k

print(f2k(8))
print(f2k(41))
print(f2k(32))

print(k)
```

> :::spoiler :eyes: ***Solution***
>
> ```
> 259.81666666666666
> 278.15
> 273.15
> 0
> ```
> `k` is 0 because the `k` inside the function `f2k` doesn't know
> about the `k` defined outside the function. When the `f2k` function is called,
> it creates a [local variable](https://swcarpentry.github.io/python-novice-inflammation/reference.html#local-variable)
> `k`. The function returns the value of its local `k`
> but does not alter the `k` defined outside the function.
> Therefore the original value of `k` remains unchanged.
> Beware that a local `k` is created because the statements inside `f2k`
> *assign* a new value to it. If `k` were only *read*, the function would simply retrieve the
> global `k` value.

:::

:::success
:pencil: **8.6 Mixing Default and Non-Default Parameters**

Given the following code:

```python!
def numbers(one, two=2, three, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

what do you expect will be printed? What is actually printed? What rule do you think Python is following?

1. `1234`
2. `one2three4`
3. `1239`
4. `SyntaxError`

Given that, what does the following piece of code display when run?

```python!
def func(a, b=3, c=6):
    print('a: ', a, 'b: ', b, 'c:', c)

func(-1, 2)
```

1. `a: b: 3 c: 6`
2. `a: -1 b: 3 c: 6`
3. `a: -1 b: 2 c: 6`
4. `a: b: -1 c: 2`

> :::spoiler :eyes: ***Solution***
> Attempting to define the `numbers` function results in `4. SyntaxError`.
> The defined parameters `two` and `four` are given default values. Because
> `one` and `three` are not given default values, they are required to be
> included as arguments when the function is called and must be placed
> before any parameters that have default values in the function definition.
>
> The given call to `func` displays `a: -1 b: 2 c: 6`.
-1 is assigned to > the first parameter `a`, 2 is assigned to the next parameter `b`, and `c` is > not passed a value, so it uses its default value 6. ::: ### 9. Errors and Exceptions :::success :pencil: **9.1 Reading Error Messages** Read the Python code and the resulting traceback below, and answer the following questions: 1. How many levels does the traceback have? 2. What is the function name where the error occurred? 3. On which line number in this function did the error occur? 4. What is the type of error? 5. What is the error message? ```python! # This code has an intentional error. Do not type it directly; # use it for reference to understand the error message below. def print_message(day): messages = { 'monday': 'Hello, world!', 'tuesday': 'Today is Tuesday!', 'wednesday': 'It is the middle of the week.', 'thursday': 'Today is Donnerstag in German!', 'friday': 'Last day of the week!', 'saturday': 'Hooray for the weekend!', 'sunday': 'Aw, the weekend is almost over.' } print(messages[day]) def print_friday_message(): print_message('Friday') print_friday_message() ``` ``` --------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-1-4be1945adbe2> in <module>() 14 print_message('Friday') 15 ---> 16 print_friday_message() <ipython-input-1-4be1945adbe2> in print_friday_message() 12 13 def print_friday_message(): ---> 14 print_message('Friday') 15 16 print_friday_message() <ipython-input-1-4be1945adbe2> in print_message(day) 9 'sunday': 'Aw, the weekend is almost over.' 10 } ---> 11 print(messages[day]) 12 13 def print_friday_message(): KeyError: 'Friday' ``` > :::spoiler :eyes: ***Solution*** > 1. 3 levels > 2. `print_message` > 3. 11 > 4. `KeyError` > 5. There isn't really a message; you're supposed > to infer that `Friday` is not a key in `messages`. ::: :::success :pencil: **9.2 Identifying Syntax Errors** 1. 
Read the code below, and (without running it) try to identify what the errors are. 2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`? 3. Fix the error. 4. Repeat steps 2 and 3, until you have fixed all the errors. ```python! def another_function print('Syntax errors are annoying.') print('But at least Python tells us about them!') print('So they are usually not too hard to fix.') ``` > :::spoiler :eyes: ***Solution*** > `SyntaxError` for missing `():` at end of first line, > :::spoiler :eyes: ***Solution*** > > ```python! > def another_function(): > print('Syntax errors are annoying.') > print('But at least Python tells us about them!') > print('So they are usually not too hard to fix.') > ``` ::: :::success :pencil: **9.3 Identifying Variable Name Errors** 1. Read the code below, and (without running it) try to identify what the errors are. 2. Run the code, and read the error message. What type of `NameError` do you think this is? In other words, is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not? 3. Fix the error. 4. Repeat steps 2 and 3, until you have fixed all the errors. ```python! for number in range(10): # use a if the number is a multiple of 3, otherwise use b if (Number % 3) == 0: message = message + a else: message = message + 'b' print(message) ``` > :::spoiler :eyes: ***Solution*** > 3 `NameError`s for `number` being misspelled, for `message` not defined, > and for `a` not being in quotes. > > Fixed version: > > ```python! > message = '' > for number in range(10): > # use a if the number is a multiple of 3, otherwise use b > if (number % 3) == 0: > message = message + 'a' > else: > message = message + 'b' > print(message) > ``` ::: :::success :pencil: **9.4 Identifying Index Errors** 1. Read the code below, and (without running it) try to identify what the errors are. 2. Run the code, and read the error message. What type of error is it? 3. 
Fix the error. ```python! seasons = ['Spring', 'Summer', 'Fall', 'Winter'] print('My favorite season is ', seasons[4]) ``` > :::spoiler :eyes: ***Solution*** > `IndexError`; the last entry is `seasons[3]`, so `seasons[4]` is out of range. > A fixed version is: > > ```python! > seasons = ['Spring', 'Summer', 'Fall', 'Winter'] > print('My favorite season is ', seasons[-1]) > ``` ::: ### 10. Defensive Programming :::success :pencil: **10.1 Pre- and Post-Conditions** Suppose you are writing a function called `average` that calculates the average of the numbers in a list. What pre-conditions and post-conditions would you write for it? Compare your answer to your neighbor's: can you think of a function that will pass your tests but not theirs, or vice versa? > :::spoiler :eyes: ***Solution*** > ```python! > # a possible pre-condition: > assert len(input_list) > 0, 'List length must be non-zero' > # a possible post-condition: > assert numpy.min(input_list) <= average <= numpy.max(input_list), \ > 'Average should be between min and max of input values (inclusive)' > ``` ::: :::success :pencil: **10.2 Testing Assertions** Given a sequence of car counts, the function `get_total_cars` returns the total number of cars. ```python! get_total_cars([1, 2, 3, 4]) ``` ``` 10 ``` ```python! get_total_cars(['a', 'b', 'c']) ``` ``` ValueError: invalid literal for int() with base 10: 'a' ``` Explain in words what the assertions in this function check, and for each one, give an example of input that will make that assertion fail. ```python! def get_total_cars(values): assert len(values) > 0 for element in values: assert int(element) values = [int(element) for element in values] total = sum(values) assert total > 0 return total ``` > :::spoiler :eyes: ***Solution*** > * The first assertion checks that the input sequence `values` is not empty. > An empty sequence such as `[]` will make it fail. > * The second assertion checks that each value in the list can be turned into an integer. 
> Input such as `[1, 2, 'c', 3]` will make it fail (strictly, `int('c')` raises a `ValueError` before the `assert` is evaluated; a `0` element makes the assertion itself fail). > * The third assertion checks that the total of the list is greater than 0. > Input such as `[-10, 2, 3]` will make it fail. ::: ### 11. Debugging :::success :pencil: **11.1 Debug With a Neighbor** Take a function that you have written today, and introduce a tricky bug. Your function should still run, but will give the wrong output. Switch seats with your neighbor and attempt to find the bug that they introduced into their function. Which of the principles discussed above did you find helpful? ::: :::success :pencil: **11.2 Not Supposed to be the Same** You are assisting a researcher with Python code that computes the Body Mass Index (BMI) of patients. The researcher is concerned because all patients seemingly have unusual and identical BMIs, despite having different physiques. BMI is calculated as **weight in kilograms** divided by the square of **height in metres**. Use the debugging principles from this episode to locate the problems with the code. What suggestions would you give the researcher for ensuring any later changes they make work correctly? ```python! patients = [[70, 1.8], [80, 1.9], [150, 1.7]] def calculate_bmi(weight, height): return weight / (height ** 2) for patient in patients: weight, height = patients[0] bmi = calculate_bmi(height, weight) print("Patient's BMI is:", bmi) ``` ``` Patient's BMI is: 0.000367 Patient's BMI is: 0.000367 Patient's BMI is: 0.000367 ``` > :::spoiler :eyes: ***Solution*** > * The loop variable `patient` is never used: `weight` and `height` are always > taken from `patients[0]`, the first patient, on every iteration of the loop. > > * The height and weight arguments are reversed in the call to > `calculate_bmi(...)`. With both problems fixed, the BMIs are 21.604938, 22.160665 and 51.903114. ::: ### 12. Command-Line Programs :::success :pencil: **12.1 Arithmetic on the Command Line** Write a command-line program that does addition and subtraction: ```bash! 
$ python arith.py add 1 2 ``` ``` 3 ``` ```bash! $ python arith.py subtract 3 4 ``` ``` -1 ``` > :::spoiler :eyes: ***Solution*** > ```python! > import sys > > def main(): > assert len(sys.argv) == 4, 'Need exactly 3 arguments' > > operator = sys.argv[1] > assert operator in ['add', 'subtract', 'multiply', 'divide'], \ > 'Operator is not one of add, subtract, multiply, or divide: bailing out' > try: > operand1, operand2 = float(sys.argv[2]), float(sys.argv[3]) > except ValueError: > print('cannot convert input to a number: bailing out') > return > > do_arithmetic(operand1, operator, operand2) > > def do_arithmetic(operand1, operator, operand2): > > if operator == 'add': > value = operand1 + operand2 > elif operator == 'subtract': > value = operand1 - operand2 > elif operator == 'multiply': > value = operand1 * operand2 > elif operator == 'divide': > value = operand1 / operand2 > print(value) > > main() > ``` ::: :::success :pencil: **12.2 Finding Particular Files** Using the `glob` module introduced earlier, write a simple version of `ls` that shows files in the current directory with a particular suffix. A call to this script should look like this: ```bash! $ python my_ls.py py ``` ``` left.py right.py zero.py ``` > :::spoiler :eyes: ***Solution*** > ```python! > import sys > import glob > > def main(): > """prints names of all files with sys.argv as suffix""" > assert len(sys.argv) >= 2, 'Argument list cannot be empty' > suffix = sys.argv[1] # NB: behaviour is not as you'd expect if sys.argv[1] is * > glob_input = '*.' + suffix # construct the input > glob_output = sorted(glob.glob(glob_input)) # call the glob function > for item in glob_output: # print the output > print(item) > return > > main() > ``` ::: :::success :pencil: **12.3 Changing Flags** Rewrite `readings.py` so that it uses `-n`, `-m`, and `-x` instead of `--min`, `--mean`, and `--max` respectively. Is the code easier to read? Is the program easier to understand? 
> :::spoiler :eyes: ***Solution*** > ```python! > # this is code/readings_07.py > import sys > import numpy > > def main(): > script = sys.argv[0] > action = sys.argv[1] > filenames = sys.argv[2:] > assert action in ['-n', '-m', '-x'], \ > 'Action is not one of -n, -m, or -x: ' + action > if len(filenames) == 0: > process(sys.stdin, action) > else: > for filename in filenames: > process(filename, action) > > def process(filename, action): > data = numpy.loadtxt(filename, delimiter=',') > > if action == '-n': > values = numpy.min(data, axis=1) > elif action == '-m': > values = numpy.mean(data, axis=1) > elif action == '-x': > values = numpy.max(data, axis=1) > > for val in values: > print(val) > > main() > ``` ::: :::success :pencil: **12.4 Adding a Help Message** Separately, modify `readings.py` so that if no parameters are given (i.e., no action is specified and no filenames are given), it prints a message explaining how it should be used. > :::spoiler :eyes: ***Solution*** > ```python! 
> # this is code/readings_08.py > import sys > import numpy > > def main(): > script = sys.argv[0] > if len(sys.argv) == 1: # no arguments, so print help message > print("""Usage: python readings_08.py action filenames > action must be one of --min --mean --max > if filenames is blank, input is taken from stdin; > otherwise, each filename in the list of arguments is processed in turn""") > return > > action = sys.argv[1] > filenames = sys.argv[2:] > assert action in ['--min', '--mean', '--max'], \ > 'Action is not one of --min, --mean, or --max: ' + action > if len(filenames) == 0: > process(sys.stdin, action) > else: > for filename in filenames: > process(filename, action) > > def process(filename, action): > data = numpy.loadtxt(filename, delimiter=',') > > if action == '--min': > values = numpy.min(data, axis=1) > elif action == '--mean': > values = numpy.mean(data, axis=1) > elif action == '--max': > values = numpy.max(data, axis=1) > > for val in values: > print(val) > > main() > ``` ::: :::success :pencil: **12.5 Adding a Default Action** Separately, modify `readings.py` so that if no action is given it displays the means of the data. > :::spoiler :eyes: ***Solution*** > ```python! 
> # this is code/readings_09.py > import sys > import numpy > > def main(): > script = sys.argv[0] > action = sys.argv[1] > if action not in ['--min', '--mean', '--max']: # if no action given > action = '--mean' # set a default action, that being mean > filenames = sys.argv[1:] # start the filenames one place earlier in the argv list > else: > filenames = sys.argv[2:] > > if len(filenames) == 0: > process(sys.stdin, action) > else: > for filename in filenames: > process(filename, action) > > def process(filename, action): > data = numpy.loadtxt(filename, delimiter=',') > > if action == '--min': > values = numpy.min(data, axis=1) > elif action == '--mean': > values = numpy.mean(data, axis=1) > elif action == '--max': > values = numpy.max(data, axis=1) > > for val in values: > print(val) > > main() > ``` ::: :::success :pencil: **12.6 A File-Checker** Write a program called `check.py` that takes the names of one or more inflammation data files as arguments and checks that all the files have the same number of rows and columns. What is the best way to test your program? > :::spoiler :eyes: ***Solution*** > ```python! 
> import sys > import numpy > > def main(): > script = sys.argv[0] > filenames = sys.argv[1:] > if len(filenames) <=1: #nothing to check > print('Only 1 file specified on input') > else: > nrow0, ncol0 = row_col_count(filenames[0]) > print('First file %s: %d rows and %d columns' % (filenames[0], nrow0, ncol0)) > for filename in filenames[1:]: > nrow, ncol = row_col_count(filename) > if nrow != nrow0 or ncol != ncol0: > print('File %s does not check: %d rows and %d columns' % (filename, nrow, ncol)) > else: > print('File %s checks' % filename) > return > > def row_col_count(filename): > try: > nrow, ncol = numpy.loadtxt(filename, delimiter=',').shape > except ValueError: > # 'ValueError' error is raised when numpy encounters lines that > # have different number of data elements in them than the rest of the lines, > # or when lines have non-numeric elements > nrow, ncol = (0, 0) > return nrow, ncol > > main() > ``` ::: :::success :pencil: **12.7 Counting Lines** Write a program called `line_count.py` that works like the Unix `wc` command: * If no filenames are given, it reports the number of lines in standard input. * If one or more filenames are given, it reports the number of lines in each, followed by the total number of lines. > :::spoiler :eyes: ***Solution*** > ```python! 
> import sys > > def main(): > """print each input filename and the number of lines in it, > and print the sum of the number of lines""" > filenames = sys.argv[1:] > sum_nlines = 0 # initialize counting variable > > if len(filenames) == 0: # no filenames, just stdin > sum_nlines = count_file_like(sys.stdin) > print('stdin: %d' % sum_nlines) > else: > for filename in filenames: > nlines = count_file(filename) > print('%s %d' % (filename, nlines)) > sum_nlines += nlines > print('total: %d' % sum_nlines) > > def count_file(filename): > """count the number of lines in a file""" > f = open(filename, 'r') > nlines = len(f.readlines()) > f.close() > return nlines > > def count_file_like(file_like): > """count the number of lines in a file-like object (e.g. stdin)""" > n = 0 > for line in file_like: > n = n + 1 > return n > > main() > ``` ::: :::success :pencil: **12.8 Generate an Error Message** Write a program called `check_arguments.py` that prints a usage message and then exits the program if no arguments are provided. (Hint: You can use `sys.exit()` to exit the program.) ```bash! $ python check_arguments.py ``` ``` usage: python check_arguments.py filename.txt ``` ```bash! $ python check_arguments.py filename.txt ``` ``` Thanks for specifying arguments! ``` :::
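
These notes record no solution for exercise 12.8, so here is a minimal sketch of one possible answer. The helper name `build_message`, the `USAGE` constant, and the exit status `1` are illustrative assumptions rather than part of the lesson; the usage string uses the script's actual name, `check_arguments.py`. The argument list is passed in as a parameter so that the logic can be tried without touching the real command line.

```python
import sys

# Hypothetical constant holding the usage text for the exercise
USAGE = 'usage: python check_arguments.py filename.txt'


def build_message(argv):
    """Return (message, should_exit) for a sys.argv-style list."""
    # argv[0] is always the script name, so fewer than 2 entries
    # means that no arguments were provided
    if len(argv) < 2:
        return USAGE, True
    return 'Thanks for specifying arguments!', False


def main(argv):
    """Print the message and stop the program if no arguments were given."""
    message, should_exit = build_message(argv)
    print(message)
    if should_exit:
        # assumption: a non-zero status signals incorrect usage
        sys.exit(1)
```

Saved as `check_arguments.py`, the script would end with a call to `main(sys.argv)`. Keeping the decision in `build_message` separate from the printing and exiting makes it easy to check both cases by hand, for example `build_message(['check_arguments.py'])`.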