# 13: Leveling Up Testing (2)

###### tags: `Tag(sp22)`

## Logistics

![](https://i.imgur.com/I8RLRlr.png)

* I've asked to be re-added to the "preferred advisor" list for a day or two.
* Submit your project ideas today!

## Sprint 3 Advice

Are you allowed to modify the stencils for Sprints? What do you think?

In prior classes, you might have been given an immutable stencil---you'd better not change it!---and filled in code. That's not how we look at stencils. The stencils we give are a starting point; **you can modify them as you see fit.** That's the 0320 way.

A case in point: if you really want to add a REPL to your backend in Sprint 3, but to do so you'd need to modify the `Main` class we gave you---modify it all you want.

## Is Software Like a Hammer?

Let's start with a short discussion. Some have said that software is like a hammer: it's a morally neutral tool that can be used for either good or evil. You can build a house with a hammer, or you can commit murder.

#### Discussion: What do you think about this?

It's an appealing argument! In some ways, software might indeed be viewed as morally neutral. But, personally, I think that the "hammer" analogy is too glib.

My father was a carpenter, and we had a [table saw](https://en.wikipedia.org/wiki/Table_saw) in our garage. A table saw is a terrifying piece of machinery; it's got a large rotary blade that slices through wood as you push it along the table. It's also a _complex_ piece of machinery: there are various levers and dials to control the height of the blade and other factors, but also numerous safety features. Dad's saw didn't have as many as modern saws do, but there was still an emergency stop button and a blade guard.

My point is: software isn't a hammer. Software is more like a table saw. If you don't think about safety when you design a table saw, people are going to get badly hurt. Yeah, you can still hurt yourself with a hammer (I have, multiple times).
But the greater the degree of potential harm, the more careful thought and added safety features are called for. And don't get lured in by "software is like a hammer" statements. Something can be morally neutral in itself and still warrant careful design for safety (or a decision not to make it at all).

There's a quote from Robert Oppenheimer I like, from the ["personnel hearing"](https://www.osti.gov/opennet/hearing) in which he lost his clearance:

> "When you see something that is technically sweet, you go ahead and do it and argue about what to do about it only after you've had your technical success. That is the way it was with the atomic bomb."

## What's Hardest About Testing?

What do you think is harder to think of: test _inputs_ or the corresponding _outputs_?

<details>
<summary>Think, then click!</summary>

Usually it's the inputs that are harder, because they require clever thinking about the space of inputs. Outputs are often an exercise in running the program in your mind.

</details>
<br/>

Where else might we find test inputs, and what problems might occur with each source?

- someone else's tests? (same human biases, but OK, it could help)
- monitor a real system? (a good idea, but possibly with different biases---overfitting to their use case? It could also be expensive or affect performance.)
- random generation? (possibly a lot of weird values that aren't really reflective of reality; what's the distribution...?)

## Fuzz Testing

Let's build on the "random test generation" idea. Suppose we could randomly generate inputs. What could we do with them?

The simplest, but still incredibly powerful, technique is to probe for crashes. Just keep feeding the system random inputs and, if your generator is good, you're likely to eventually find those bizarre corner cases that lead to surprising exceptions. You don't need to do this in JUnit, or have a `Test` annotation or anything like that. We're just experimenting with the idea of fuzzing.
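Here's a minimal sketch of what such a fuzzing loop might look like. The method under test, `clampPercent`, is a hypothetical stand-in of my own invention (it isn't from the course livecode); in practice you'd call whatever method you're probing:

```java
import java.util.Random;

public class FuzzDemo {
    // Hypothetical method under test: a stand-in for whatever code you'd fuzz.
    static int clampPercent(int value) {
        if (value < 0) return 0;
        if (value > 100) return 100;
        return value;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int crashes = 0;
        for (int trial = 0; trial < 100_000; trial++) {
            // Draw any 32-bit int, including extremes like Integer.MIN_VALUE
            int input = random.nextInt();
            try {
                clampPercent(input);
            } catch (Exception e) {
                // Any exception is "surprising" here; record the offending input
                crashes++;
                System.out.println("Crash on input " + input + ": " + e);
            }
        }
        System.out.println("Done fuzzing; " + crashes + " crashing inputs found");
    }
}
```

Notice there's no expected output anywhere: the only "oracle" is whether the call throws.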
### What's a "surprising exception"?

There's a bit of a caveat here. Sometimes programs are _supposed_ to throw an exception on certain inputs. Maybe the specification says that "for input integers less than zero, the method throws `UnsupportedOperationException`". Then fuzzing is a bit more involved than trying to provoke _any_ exception. Instead, you might only look for exceptions like `NullPointerException` or other signals that indicate a bug, rather than a legitimate specified result.

### Takeaway

This might seem like a ridiculous technique. But, in truth, it's used pretty commonly in industry in settings where crash bugs are expensive, highly impactful, or affect many people. (Think: operating systems, device drivers, and networks, to name a few.)

And, by the way: **not all the inputs need to be random!** You can still have deviously crafted inputs hard-coded; just include them at the beginning, before you start random generation. This is true of everything else we do with random inputs.

## Model-Based Testing

There are a few definitions of this term, but the technique in this section is the one I like and tend to use. For other interpretations, see more advanced software-engineering classes or 1710.

In fuzz testing, we didn't pay much attention to the _output_ of the program. We just checked that it didn't raise an exception. It'd be great if we could use randomly-generated inputs for more than just fuzz testing. But if we've got *randomly*-generated inputs, we can't pair them with *manually*-generated outputs. So now the question is: where can we get "correct" outputs from?

Well, what if we had a known-good baseline implementation we were trying to optimize? Consider building one of those clever, super-optimized implementations of a basic arithmetic function. These matter in a lot of domains (e.g., graphics). Suppose you're testing:

```java=
float fastFloor(float value) {
  // ...
}
```

How long do you think it would take to loop through every possible primitive 32-bit `float` value, and check whether or not `fastFloor` returns the same value as `Math.floor`?

<details>
<summary>Think, then click!</summary>

About 90 seconds. [Several years ago.](https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/)

</details>

Here, we don't even need random inputs! Of course, if the function took 64-bit `double` values, the story would be different, and we'd check random inputs.

### Another Setting

In Sprint 0, you implemented a basic `Collection`-based nearest-neighbor search. In Sprint 1, some of you implemented the same algorithm using a kd-tree. You could do random, model-based testing here, too! Anytime we have a slow but reliable source of correct outputs, we can compare its results against the new, possibly unreliable, program under test. Maybe you'd write a big `for` loop containing code that looks something like this:

```java
List<Star> constellation = generateRandomConstellation(5);
ListNaiveStarSearch search1 = new ListNaiveStarSearch(constellation);
KDTreeStarSearch search2 = new KDTreeStarSearch(constellation);
Star target = generateRandomStar();
List<Star> results1 = search1.getNaiveNearestNeighbors(NUM_NEAREST, target);
List<Star> results2 = search2.getNearestNeighbors(NUM_NEAREST, target);
assert results1.equals(results2);
```

This code takes a randomly-generated coordinate set and goal, invokes the old and new implementations, and compares their results. If the new implementation doesn't agree, maybe we've found a bug!

### Any Problems?

Do you expect anything to go wrong?

<details><summary>Think, then click!</summary>

The issue is that the nearest-neighbor problem may have _many different correct answers_. Consider an input space with four coordinates: $(-1,1)$, $(1,-1)$, $(1,1)$, and $(-1,-1)$. If we're looking for the 2 nearest neighbors to $(0,0)$, which will be returned?
All four are equally distant, so there are $\binom{4}{2} = 6$ equally-correct solutions. If Implementation 1 happens to work slightly differently from Implementation 2, it's very likely you'll see direct comparisons of their output fail.

So, instead of checking for exactly-equal elements in the lists returned, let's make sure their _distances from the goal_ are the same!

</details>
<br/>

**Something very important has just happened. Pay attention!** We have stopped talking about testing against _specific outputs_, and have started testing more abstract _properties_ of outputs. We don't want to know whether the output lists are exactly the same, but we want to know whether they match in terms of nearness to the input goal.

## Property-Based Testing

Is there any reason that our properties have to be written in terms of another implementation? No! What makes a nearest-neighbors output _correct_? If we can write this as a method, we can apply the same idea, without needing a naive implementation to check against.

Before you click through below, we have an exercise today. Clone and pull the [0320 livecode repository](https://github.com/tnelson/cs32-livecode), if you haven't already. To clone it, run:

```
git clone git@github.com:tnelson/cs32-livecode.git
```

Then open the `S22` folder as a project in IntelliJ, and look at the livecode source folder for today (`edu.brown.cs32.livecode.mar17`). If you can't open it, go ahead and just read the code.

### Getting the repo running

You should be able to use the same approach we give in the [setup guide](https://docs.google.com/document/d/1-O4c5KA2FgHeO9q6tfjp5zaHb9MLQbJfn3wMIhDlYGk/edit#heading=h.qsxbrbh41kar): opening the `pom.xml` file "As Project" in IntelliJ.

#### Maven Configuration

Expect this file to grow over time, as I add new dependencies and plugins. I like to add new dependencies by inserting the appropriate XML into `pom.xml` and then selecting "Reload All Maven Projects" in the Maven panel.
If you don't have this panel, or can't find the button, here are screenshots (macOS 11.6, IntelliJ 2021.2.2 Community Edition):

![Viewing the Maven Panel](https://i.imgur.com/QUsoe8F.png)
*Viewing the Maven Panel*

![Reloading Maven Projects](https://i.imgur.com/1gZfCjt.png)
*Reloading Maven Projects*

If I wanted to add the GSON library for serializing and de-serializing JSON messages, using the same version as the stencil code for your lab, I'd add the dependency:

```xml
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.8.6</version>
</dependency>
```

between the `<dependencies>` tags in my `pom.xml`, and then click that reload-all button. There might be a delay while IntelliJ adds the new dependency and does other housekeeping work, but the reload should resolve all dependencies (barring other issues, like version conflicts or inaccessible packages).

#### Running the class livecode

You should be able to navigate to one of the main files (named after the date of the class meeting) and click `Run` from the popup menu.

#### A possible error

If you get an error like `warning: source release 11 requires target release 11`, it means that the version of Java that IntelliJ is trying to use isn't what the Maven configuration expects. You might see something like this if you scroll over in the error: `Current project JDK is 1.8.`, which tells you what the gap is.

The first thing I'd check is what IntelliJ is using for the given run configuration:

![Yup, that's the wrong version alright!](https://i.imgur.com/IJHRdwD.png)
*Yup, that's the wrong version alright!*

In this case, revisit our [setup guide](https://docs.google.com/document/d/1-O4c5KA2FgHeO9q6tfjp5zaHb9MLQbJfn3wMIhDlYGk/edit#) and follow the instructions to install Java 11. Once Java 11 has been installed, go to `File -> Project Structure`, select `JDK`, and pick the version 11 you just installed.
You should see something like this:

![](https://i.imgur.com/7xBUwRR.png)

Click Apply, then OK. There may be a small delay while IntelliJ makes the change. Now you should be able to run.

#### That seems like a lot of work...

You probably won't have to do all of this! But if you've worked with a different Java version for another class (or have just been using Java for a while on the same machine), this sort of problem can come up. In fact, I had to follow this process myself, since I worked with some research code that used Java 8 over the summer.

### Back To Fuzz/Property-Based Testing

These files demonstrate fuzz testing and property-based testing of a naive nearest-neighbor algorithm like the one you implemented in Sprint 0. However... maybe they aren't quite perfect. Your job is to critique them. Am I:

* making unreasonable assumptions about inputs, or imposing unreasonable limitations when generating values?
* describing the correct properties (in the English comments)?
* encoding those properties correctly (in the Java)?
* ...etc.

Feel free to _run_ the code, if you're able to. We want this to be an interactive demo as much as a code-review exercise. Here's the [exercise form](https://forms.gle/SPAqqfPVVZkqT1ne7). Fill it in before clicking through.

<details>
<summary>Think, then click!</summary>

* "For all coordinates $c$ in the input, if $c$ is not contained in the output, then for all coordinates $c_2$ in the output, $dist(c, goal) \geq dist(c_2, goal)$."
* "For all coordinates $c$ in the output, $c$ is also in the input."
* The output list contains the expected number of points (or fewer, if the input set is smaller than $k$).

</details>
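To make the idea concrete, here's one way those three properties could be encoded as a checker method. This is a self-contained sketch, not the actual livecode: the `Star` class, the naive `nearestNeighbors` implementation, and all names are hypothetical stand-ins I've invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class NearestNeighborProperties {
    // Hypothetical stand-in for the Sprint's Star type
    static final class Star {
        final double x, y;
        Star(double x, double y) { this.x = x; this.y = y; }
        double dist(Star o) { return Math.hypot(x - o.x, y - o.y); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Star)) return false;
            Star s = (Star) o;
            return x == s.x && y == s.y;
        }
        @Override public int hashCode() { return 31 * Double.hashCode(x) + Double.hashCode(y); }
    }

    // Stand-in implementation under test: a naive sort-by-distance search
    static List<Star> nearestNeighbors(List<Star> stars, Star goal, int k) {
        List<Star> sorted = new ArrayList<>(stars);
        sorted.sort(Comparator.comparingDouble(s -> s.dist(goal)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Encode the three properties from the notes as one boolean check
    static boolean checkProperties(List<Star> input, List<Star> output, Star goal, int k) {
        // Property: the output has min(k, |input|) elements
        if (output.size() != Math.min(k, input.size())) return false;
        // Property: every output coordinate is also in the input
        for (Star c : output) {
            if (!input.contains(c)) return false;
        }
        // Property: no excluded input coordinate is strictly closer than an output coordinate
        for (Star c : input) {
            if (output.contains(c)) continue;
            for (Star c2 : output) {
                if (c.dist(goal) < c2.dist(goal)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int trial = 0; trial < 1000; trial++) {
            List<Star> stars = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                stars.add(new Star(random.nextDouble() * 10 - 5, random.nextDouble() * 10 - 5));
            }
            Star goal = new Star(0, 0);
            List<Star> result = nearestNeighbors(stars, goal, 3);
            if (!checkProperties(stars, result, goal, 3)) {
                throw new AssertionError("Property violated on trial " + trial);
            }
        }
        System.out.println("All trials passed");
    }
}
```

Note that the checker never needs a second implementation or an expected output list: it only asks whether the result *could* be correct, which is exactly the shift from specific outputs to properties.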