---
title: 'Studio 4: FizzBuzz'
label: 'studio'
layout: 'post'
geometry: margin=2cm
tags: studio
---

# CS 100: Studio 4

### Programming Practice

##### October 5, 2022

### Instructions

During today's studio, you will practice some programming fundamentals.

**Please write all of your code in an R** ***script*** (not in an R Markdown file, which is what you usually use).

Upon completion of all tasks, a TA will give you credit for today's studio.

### Objectives

By the end of this studio, you will be able to:

* program with conditionals (if statements)
* program with for loops
* program with functions

### Part 1: Programming Review in R

The first task in this studio is designed to help you synthesize the programming concepts (if statements, for loops, and functions) you’ve been learning about recently. Each of these concepts is described in this write-up, but you are free to skip ahead directly to the task if you feel confident in your knowledge of these concepts.

##### If Statements

If statements are used to direct the flow of a program based on whether or not various conditions hold. In computer science parlance, these conditions are called **predicates**.

Below are some examples of predicates. Think about whether each one is true or false. If you are unsure, ask R.

~~~{r}
TRUE

# == tests equality
1 == 0

# != tests inequality
1 != 0

100 > 1000

x <- "erin"
y <- "anna"
x < y
~~~

You can combine predicates using `&&` (and) and `||` (or). Predicates connected by `&&` are true if both sub-predicates are true, and predicates connected by `||` are true if either (or both) sub-predicates are true. You can also negate a predicate using `!` (bang) so that true predicates become false, and vice versa.

Here are some more examples. Again, think about whether each one is true or false, asking R for help as necessary:

~~~{r}
! FALSE
TRUE && FALSE
TRUE || FALSE
3 * 4 == 12
! (3 * 4 != 12)
5 == 5 && 7 > 6
5 == 5 && ! (7 > 6)
! (5 == 5 && 7 > 6)
(1 > 2 || 4 > 3) || (10 >= 1 && 0 != 1)
~~~

Now that you understand predicates, you can use them as conditions in if statements:

~~~{r}
if (TRUE) {
  print("This will print!")
}

if (5 > 10) {
  print("This won’t print!")
}
~~~

If you want to check a chain of predicates, you can make use of the `else` and `else if` statements.

~~~{r}
x <- 5

# The %% operator returns the remainder when one number is divided by another.
# It’s called the mod (short for modulo) operator!
if (x %% 3 == 0) {
  print("This will print if x is a multiple of 3")
} else {
  print("This will print in all other cases")
}

if (x == 5) {
  print("x equals 5")
} else if (x == 6) {
  print("x equals 6")
} else if (x == 7) {
  print("x equals 7")
} else if (x >= 8) {
  print("x is greater than or equal to 8")
} else {
  print("x is less than 5 (or x is a non-integer below 8)")
}
~~~

##### For Loops

Often in programming, you will find yourself wanting to perform an operation more than once. Iterating over blocks of code, rather than repeating yourself, is not only convenient; it also helps prevent bugs. A for loop is a handy iterator. Below is an example:

~~~{r}
for (i in 1:10) {
  print("Hello")
}
~~~

This `for` loop will print "Hello" 10 times. Here is how it works: the variable `i` takes on a new value each time the loop body is entered. The first time its value is 1, next it is 2, then 3, and so on, all the way up to 10, since the range of values specified is `1:10`.

Since `i` is a variable, the code inside a `for` loop can refer to `i` and take advantage of the fact that its value is updated with each iteration.
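To see the loop variable at work, here is a small sketch that multiplies together the successive values of `i`, computing 1 * 2 * 3 * 4 * 5. The variable name `product` is just an illustrative choice, not something R requires.

~~~{r}
# Start from 1, since multiplying by 1 leaves a number unchanged
product <- 1
for (i in 1:5) {
  # On each pass, i holds the next value of 1:5
  product <- product * i
}
print(product)  # prints [1] 120
~~~

R's built-in `prod` function (e.g., `prod(1:5)`) gives the same answer; the loop is shown only to illustrate how `i` changes from one iteration to the next.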
Here is another example, in which the numbers 10 through 20 (inclusive) are printed:

~~~{r}
for (i in 10:20) {
  print(i)
}
~~~

It is also possible to do the reverse, namely print 20 through 10 instead:

~~~{r}
for (i in 20:10) {
  print(i)
}
~~~

Note, however, that because the body of the following loop never uses `i`, the order of the values does not matter: like our first example, it simply prints "Hello" over and over (11 times here, since `20:10` contains 11 values).

~~~{r}
for (i in 20:10) {
  print("Hello")
}
~~~

**Programming Tip:** Pick particular variable names (`i` is a popular one, and so is `j`) to use as counters in your loops, and *do not use these variables elsewhere in your programs*. That way, it will be easy to keep track of your counters’ values, and you won’t get confused by their values changing unexpectedly.

##### Functions

We’ve talked a lot about using built-in functions, but a major part of programming is writing your own functions! The most readable (and hence, least bug-prone) code usually consists of lots and lots of small functions pieced together into one large program.

This is what a typical function looks like:

~~~{r}
name_of_function <- function(argument1, argument2) {
  # statements: the work of the function goes here, for example
  the_functions_value <- argument1 * argument2
  # the value of the last line is returned to the caller
  the_functions_value
}
~~~

Observe that it has a name (i.e., `name_of_function`) and a body. The body is the code enclosed within the curly brackets `{` and `}`. Most functions take at least one argument as input, although arguments are not required; this example takes two. Finally, a function usually returns a value to its caller, although it doesn’t have to. (Programming languages are very flexible!) In R, the value of the last expression a function evaluates (usually its last line) is automatically returned to the function’s caller, regardless of whether the caller has any use for that value.
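As a quick illustration of this rule, here is a sketch of a hypothetical function, `describe_sign`, that combines the if statements from earlier with this automatic return: whichever branch runs, the value of the last expression it evaluates becomes the function's value.

~~~{r}
# Hypothetical example: classify a single number by its sign.
# Whichever branch runs, its string is the last expression evaluated,
# so R returns it to the caller automatically.
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe_sign(3)   # returns "positive"
describe_sign(-2)  # returns "negative"
describe_sign(0)   # returns "zero"
~~~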
Note that you can also return values explicitly using `return()`, as follows:

~~~{r}
name_of_function <- function(argument1, argument2) {
  # statements: the work of the function goes here, for example
  the_functions_value <- argument1 * argument2
  # return() hands the value back to the caller explicitly
  return(the_functions_value)
}
~~~

Here’s an example of an extremely simple function that adds one to its argument, which is hopefully a number:

~~~{r}
add_one <- function(num) {
  added_one <- num + 1
  added_one
}
~~~

As you can see, this function creates an object, `added_one`, which is then returned as the function's value. For example, the value of `add_one(5)` is 6.

Here’s another, cleaner way we could have written the `add_one` function:

~~~{r}
add_one <- function(num) {
  num + 1
}
~~~

Let’s look at another example. This function takes two arguments as input and returns their sum:

~~~{r}
add_two_nums <- function(x, y) {
  x + y
}
~~~

For example, `add_two_nums(1, 2)` evaluates to 3.

Here’s a variant of the above, where we assign a default value of `1` to the second argument.

~~~{r}
add_two_nums <- function(x, y = 1) {
  x + y
}
~~~

Now `add_two_nums` behaves just like `add_one` by default, adding one to its first argument when the second is omitted, and summing both its arguments when two values are given.

More details about functions in R can be found [here](http://adv-r.had.co.nz/Functions.html).

##### Task

Now that you’ve learned about if statements, for loops, and functions, write a function `sum.evens` that takes as input a number `n` and returns the sum of all even numbers between 1 and `n`, inclusive.

*Hint:* Use the mod operator, `%%` in R, to test whether a number is even or odd.

*More detailed hint:* The mod operation gives the remainder when some number `x` is divided by some number `y`. For example, `7 %% 3`, read as “7 mod 3”, gives 1, because 7 divided by 3 is 2 with a remainder of 1. Solve the following examples by hand, and then run them in R to verify your understanding.
~~~{r}
8 %% 6
7 %% 4
10 %% 5
3 %% 3
~~~

### Part 2: FizzBuzz: A Programming Puzzle

The FizzBuzz problem is a short programming task, often asked during software engineering interviews. Here is one variant:

Write a function called `fizzbuzz` that takes a number as input, and prints “Fizz!” if the number is a multiple of 3, “Buzz!” if the number is a multiple of 5, and “FIZZ BUZZ!!!” if the number is a multiple of both 3 and 5. If the number is neither a multiple of 3 nor a multiple of 5, it should just print a sad face.

Here are some sample inputs and their corresponding outputs:

~~~
2: :(
3: Fizz!
5: Buzz!
6: Fizz!
15: FIZZ BUZZ!!!
19: :(
~~~

##### Task

Write a function `fizzbuzz` that takes as input a number `n` and prints out the number as well as `Fizz!`, `Buzz!`, `FIZZ BUZZ!!!`, or `:(`, as required. Note that, for a multiple of both 3 and 5, printing both `Fizz!` and `Buzz!` instead of `FIZZ BUZZ!!!` is incorrect.

**Hint**: You may find `if` / `else if` statements helpful!

**Another hint:** Feel free to use the `cat` function, which prints a combination of variables and strings, with spaces in between them. For example,

~~~{r}
hello <- "Hello"
world <- "world"
year <- 2022
cat(hello, world, year, "!")
~~~

This code will display `Hello world 2022 !`. Note that `cat` does not end its output with a newline; add `"\n"` as a final argument if you want each call's output on its own line.

Once you have a working version of `fizzbuzz`, write a `for` loop that calls your `fizzbuzz` function on a vector comprising a few of your favorite numbers.

*Hint:* You can create a vector of your favorite numbers using the `c` function.

**N.B.** The remainder of this section is optional. Skip over it if you are short on time.

Next, write another `for` loop that calls your `fizzbuzz` function on all numbers between 1 and 20, and again on only the even numbers between 1 and 20. Do you see any opportunities for abstraction? Discuss this possibility with your partner before proceeding.

Write a function that takes a vector as input, and calls your `fizzbuzz` function on all the entries in that vector.
Then run this new function on all three of the aforementioned vectors: your favorite numbers, all numbers between 1 and 20, and only the even numbers between 1 and 20.

### Part 3: Measures of Dispersion

In addition to measures of central tendency (mean, median, and mode), measures of variability are also helpful when trying to make sense of data. Let's take a look at some of the common measures of variability! Standard deviation is one such measure.

Given a set of $n$ data points $X = \{ x_i \mid i = 1, \ldots, n \}$, the formula for **variance** is

$$\sigma^2_X = \frac{1}{n} \sum_{i=1}^n (x_i - \mu_X)^2,$$

where $\mu_X = \frac{1}{n} \sum_{i=1}^n x_i$ is the mean. The **standard deviation** is then

$$\sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \mu_X)^2}.$$

Looking at this formula, we see that what is computed first is the average of the squared differences between each data point and the mean. This quantity is called the (population) variance. Then, after taking the square root, we arrive at the (population) standard deviation. The standard deviation can be easier to interpret than the variance, since it is expressed in the same units as the data (e.g., inches rather than square inches).

Hopefully, by now you have computed these quantities for a few variables in a spreadsheet. Just to remind you, here are the necessary steps:

1. Compute the average of your measurements.
2. Calculate the difference between each measurement and this average.
3. Square these differences.
4. Compute the average of these squared differences, by dividing their sum by the number of measurements.
5. At this point, you have calculated the variance. If you are interested in the standard deviation, rather than the variance, take its square root.

Your present goal is to write a program that follows these steps.

##### Task

Load up one of R's built-in data sets, such as `women`, `iris`, or `rivers`.
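If you have not worked with R's built-in data sets before, here is a minimal sketch using `women` (15 rows, with columns `height` and `weight`): `head` shows the first few rows, and `$` pulls a single column out as a plain numeric vector, the kind of vector this task asks you to loop over. The name `heights` is just an illustrative choice.

~~~{r}
# women ships with R (datasets package); data() copies it into your workspace
data(women)
head(women)      # first six rows: height (in) and weight (lbs)
nrow(women)      # number of measurements: 15

# Extract one column as a numeric vector
heights <- women$height
length(heights)  # also 15
~~~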
Pick a variable (i.e., column) of interest, such as women's heights, and write a program that computes the variance and standard deviation of that variable. You should complete this task by writing a `for` loop that iterates over a vector of measurements to complete step 2, not by calling the built-in functions `var` and `sd`. When you are done, however, you can use the built-in functions to check your work. *You may find that you have to divide by one less than the number of measurements for your results to match those of the built-in functions.*

**N.B.** The formulas above are for the *population* variance and standard deviation, but the `var` and `sd` functions calculate the *sample* variance and standard deviation, respectively, which divide by $n - 1$ rather than $n$. You use the former when you are calculating the spread of an entire population, and the latter when your data are only a representative sample of that population.

**N.B.** R provides extensive support for vector arithmetic, so it is rarely necessary to write `for` or `while` loops that iterate over the values in a vector as you have done here (merely for practice). In fact, given a vector of numbers called `nums`, a single line of code computes its population standard deviation:

~~~{r}
nums <- women$height  # any numeric vector works; here, heights from the women data set
sqrt(mean((nums - mean(nums)) ^ 2))
~~~

The expression `nums - mean(nums)` computes the difference between each value in `nums` and the mean of `nums`, while the expression `(nums - mean(nums)) ^ 2` squares all of the resulting values. The outer two functions take the mean of these squared values, and then the square root, to arrive at the population standard deviation.

**Just for fun:** Use vector arithmetic to compute the covariance $\sigma_{XY}$ and/or correlation $\rho_{XY}$ between two (random) variables $X$ and $Y$, such as women's heights and weights.
Here are the formulas for these two quantities, where $\mu_X$ and $\mu_Y$ denote their means:

$$\sigma_{XY} = \frac{1}{n} \sum_{i=1}^n (x_i - \mu_X) (y_i - \mu_Y),$$

$$\rho_{XY} = \frac{1}{n} \sum_{i=1}^n \frac{(x_i - \mu_X)}{\sigma_X} \frac{(y_i - \mu_Y)}{\sigma_Y}.$$

(R's built-in `cov` divides by $n - 1$, so expect a small discrepancy with your covariance; `cor` should match your correlation, because the $n - 1$ factors cancel.)

### End of Studio

When you are done, please call over a TA to review your work and check you off for this studio.