Random Testing

# Random Testing Today we'll: * talk about sorting arbitrary records, rather than just numbers; * quickly introduce one last sorting algorithm; and * learn how to use a random-number generator as an assistive device to help us test better. The presentation of the new sort, _Quicksort_, is deliberately rushed to showcase the value of random testing. ## Sorting more than numbers Suppose we had a class like: ```python # Immutable, but also auto-generate < @dataclass(frozen=True,order=True) class Record: age: int # exclude "name" from use in <, >, etc. name: str = field(compare=False) ``` The `order=True` parameter tells Python to automatically create ordering functions (`<`, `>`, etc.) that work over instances of this dataclass. The `field` declaration excludes the `name` field from these comparisons. As a result, `Record`s will be ordered entirely by their `age` field. If we wanted to sort lists of `Record`s, how would we need to change our merge sort function from last time? <details> <summary>Think, then Click!</summary> We won't need to make any changes at all. Because the comparison functions are defined for `Record`, our existing code will handle `Record`s just fine. That's the power of polymorphism! Of course, if we tried to sort a list that contained both numbers and records, we'd get an error, since `<` is only defined on pairs of numbers or pairs of records. </details> Homework 5 will ask you to think through some other implications of using records instead of numbers. ## A final sorting algorithm Suppose we liked the overall structure of merge sort, but we didn't like the fact that the actual sorting is done in `merge`. We could create a variant that splits the list in a way that lets us combine sublists with `+`, instead. ```python= def quick_sort(l: list) -> list: if len(l) <= 1: return l[:] # ??? side1_sorted = quick_sort(side1) side2_sorted = quick_sort(side2) return side1_sorted + side2_sorted ``` If we want to avoid a more complicated merge, however, we need to split the list by some more elaborate method than just slicing it in half. How about we pick some arbitrary element in the list and use _that value_ as a dividing line? Concretely, we'll separate out the elements that are less than this element (which we'll call the _pivot_) and those greater than it. ```python def quick_sort(l: list) -> list: if len(l) <= 1: return l[:] pivot = l[0] # many other choices possible smaller = [x for x in l if x < pivot] larger = [x for x in l if x > pivot] smaller_sorted = quick_sort(smaller) larger_sorted = quick_sort(larger) return smaller_sorted + [pivot] + larger_sorted ``` How confident are we that this implementation works? ## Random Testing ### Motivation: Humans Think of a country. What country are you thinking of? <details> <summary>Think, then click!</summary> Chances are, the country you thought of was: - close to home; - large; or - in the news often. I'd bet that the country you thought of was also in existence today. You probably didn't say "the USSR" or "Austria-Hungary". And note that my choices there were all limited by my own historical knowledge. I went and [looked up more](https://en.wikipedia.org/wiki/List_of_former_sovereign_states) after writing that sentence. Even if we only count nations that existed after the U.N. was created, there are many: the Republic of Egypt (1953-1958), the Fourth Brazilian Republic (1946-1964), etc. </details> Why does that impact software testing? <details> <summary>Think, then click!</summary> You can only test what you have loaded into your mental cache. If you aren't thinking of it at the moment, or haven't been thinking of it recently, you likely won't test it. At least, not without very careful and disciplined thought. </details> But that's dangerous. If humans are innately poor at testing (and even those with training aren't always great at it) then is testing just doomed in general? ### Computers as assistive devices When confronted with our limitations, humans create assistive devices. These have included (e.g.) the [pole lathe](https://en.wikipedia.org/wiki/Pole_lathe), the [slide rule](https://en.wikipedia.org/wiki/Slide_rule), and so on. Maybe we can use our computers as an assistive device to help us with testing? What's the hard part of writing a test case? Usually it's coming up with a creative input that exercises something special about the program. What if we let a computer come up with inputs for us? That would work something like this: #### (1) Building random lists We'd write a function that produces random lists (in this case, of integers): ```python def random_list(max_length: int, min_value: int, max_value: int) -> list: length = randint(0, max_length) return [randint(min_value, max_value) for _ in range(length)] ``` #### (2) Running our sort on those random dlists ```python MAX_LENGTH = 100 MIN_VALUE = -100000 MAX_VALUE = 100000 NUM_TRIALS = 100 def test_quicksort(): for i in range(NUM_TRIALS): test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE) quick_sort(test_list) ``` This may look like it's not doing anything. While it's true that there's no `assert` statement yet, we _are_ testing something: that the `quick_sort` function doesn't crash! This sort of output-less testing is called `fuzzing` or `fuzz testing` and it's used a lot in industry. After all, there's nothing that says we need to stop at `100` trials! I could run this millions of times overnight, and gain some confidence that my program doesn't crash (even if it might not produce a correct result overall). #### (3) Dealing with the output question Of course, we'd still like to actually _test the outputs_ on these randomly generated inputs. It wouldn't really be feasible to produce a million inputs automatically, and then manually go and figure out the corresponding outputs. Instead, we should ask: do we have another source of the correct answer? Indeed we do: our own merge sort, or better yet, Python's built in sorting function. ```python MAX_LENGTH = 100 MIN_VALUE = -100000 MAX_VALUE = 100000 NUM_TRIALS = 100 def test_quicksort(): for i in range(NUM_TRIALS): test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE) assert quick_sort(test_list) == merge_sort(test_list)) ``` As it turns out, this procedure quickly finds a bug in our `quick_sort` implementation: if a list has duplicate elements, those duplicates will be deleted! Sadly, we're only retaining one copy of the pivot in the above code. Instead, we'll: ```python def quick_sort(l: list) -> list: if len(l) <= 1: return l[:] pivot = l[0] # many other choices possible smaller = [x for x in l if x < pivot] larger = [x for x in l if x > pivot] same = [x for x in l if x == pivot] smaller_sorted = quick_sort(smaller) larger_sorted = quick_sort(larger) return smaller_sorted + same + larger_sorted ``` ### Perspective It doesn't always make sense to use random testing, but when it does it's a powerful technique: it literally allows you to mine for bugs while you sleep! Hopefully it goes without saying, but there's nothing specific to sorting about this technique: in fact, most of you could use it to help test your final projects, if you wanted! (The key is not to rely on it completely: always use random testing to augment an existing, cleverly-chosen set of manual test cases. And if your random testing finds a new bug, as we did today, just add that input and output to your manual suite: you'll never let that bug sneak by you again!) There's a lot more to this story. Homework 5 asks you to try out random testing, and reflect on how it might break down if we switch from sorting lists of integers to lists of records. We'll have a "fun" lecture after break to discuss this and talk about ways to address the problems _you_ will uncover!