# [0320] Generics 2

## Table of Contents
1. [Making Sense of Generics: the LSP](#lsp)
2. [Type Variables and Wildcards](#wildcards)
3. [Discussion: Taxicab Data](#taxis)

#### FYI

Next time we'll continue the patterns and generics discussion. If you've been curious about performance (e.g., caching your data sources), we'll also talk some about that.

## Making Sense of Generics: the LSP <a name="lsp"></a>

Let's start with a puzzle about subtyping. We know that `Object` is a supertype of `Number`, which is a supertype of `Integer`. One thing this buys us is that we can plug in something with `Number` type anywhere an `Object` is expected, or pass an `Integer` to a method that accepts a `Number`.

How does this work with generic types? Here's a method:

```java
static Number sumAll(Collection<Number> numbers) {
    int sum = 0;
    for(Number num : numbers) {
        sum += num.intValue(); // "may involve rounding or truncation"
    }
    return sum;
}
```

### Which types work to store the result of `sumAll`?

The method by itself typechecks just fine. Now let's call `sumAll`. We'll start by giving it an empty collection of numbers, and saving the result to a variable. _What types of variable can we store the return value of `sumAll` in?_ Concretely, which of the 3 assignments below do you expect to produce a type error?

```java
Collection<Number> someNums = new HashSet<>();
Number aNumber = sumAll(someNums);    // ?
Integer anInteger = sumAll(someNums); // ?
Object anObject = sumAll(someNums);   // ?
```

<details>
<summary><B>Think, then click!</B></summary>

`Number` and `Object` both work OK, but `Integer` produces a type error.

This is perhaps what we expected: after all, that's how subtyping works in languages like Java! The type system doesn't have enough info to know that the return value is always an `Integer` (although it _is_), and so it forces us to use variables of a less-specific type.

</details>
<BR/>

### Which types work as arguments to `sumAll`?

Now suppose we have 3 `Collection` objects we could pass in to `sumAll`. Keeping in mind the result of the last experiment, which of these do you expect to work?

```java
Collection<Number> someNums = new HashSet<>();
Collection<Integer> someInts = new HashSet<>();
Collection<Object> someObjects = new HashSet<>();
sumAll(someNums);    // ?
sumAll(someInts);    // ?
sumAll(someObjects); // ?
```

<details>
<summary><B>Think, then click!</B></summary>

Only `someNums`, the `Collection<Number>`, works. Both of the others produce a type error.

In spite of the fact that `Integer` _is_ a subtype of `Number`, `Collection<Integer>` is not a subtype of `Collection<Number>`. The same goes for `List`, `Set`, and so on.

</details>
<BR/>

### Wait, what?

Why do you think this is the case? Is Java's type checker just bad?

One way to explore whether or not this behavior is reasonable is to experiment. Suppose that Java had let us use a `Collection<Integer>` as a subtype of `Collection<Number>`. Can you write a program that would then produce a run-time type error? (Hint: you don't need more than 2 or 3 lines; you don't even need to use the `sumAll` function.)

<details>
<summary><B>Think, then click!</B></summary>

Here's one:

```java
someNums = someInts;    // the problematic line (hypothetically allowed)
someNums.add(Math.PI);  // adding a Number (here, a Double) to a collection of Numbers
for(int i : someInts) { // fails here: someInts now contains a Double, not an Integer
    System.out.println(Integer.numberOfLeadingZeros(i));
}
```

</details>
<BR/>

This is why Java does what it does. But sometimes we really do want to accept "a set of any kind of number" without knowing in advance exactly which type it is. And this is where generics become a little bit more complex.

Before we start, I want to cover a rule that can _really_ help clear up confusion about generic types. It's called the Liskov Substitution Principle or LSP.
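
As an aside, the hypothetical assignment can be tried for real with Java arrays, which (unlike generic collections) are covariant: an `Integer[]` may be assigned to a `Number[]` variable. The sketch below is not from the lecture, and the names `ArrayCovarianceDemo` and `storeFailsAtRuntime` are made up for illustration. It shows the bad store compiling and then being rejected at run time with an `ArrayStoreException`.

```java
// Hypothetical demo class (not part of the course code).
class ArrayCovarianceDemo {

    /** Returns true if storing a Double through a Number[] alias of an Integer[] fails at run time. */
    static boolean storeFailsAtRuntime() {
        Integer[] someInts = {1, 2, 3};
        Number[] someNums = someInts; // allowed: Java arrays are covariant
        try {
            someNums[0] = Math.PI;    // compiles: a Double is a Number
            return false;             // not reached: the store above is rejected
        } catch (ArrayStoreException e) {
            return true;              // the array remembered it only holds Integers
        }
    }

    public static void main(String[] args) {
        System.out.println("store rejected at run time: " + storeFailsAtRuntime());
    }
}
```

Arrays can catch this only because each array records its element type at run time. Generic collections do not keep that information, so the compiler has to reject the assignment up front instead.
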
You can look up the full LSP if you want, but here I'm going to put a particular spin on it:

> If you're able to safely use an object of type $T$ someplace, you should also be able to safely use an object of type $S$, where $S$ is a subtype of $T$.

This is the guiding principle that the hypothetical `someNums = someInts` assignment above would violate: it is safe to add any `Number` to a `Collection<Number>`, but not to a `Collection<Integer>`. It's worth keeping in mind as you work with generics in the future.

## Type Variables and Wildcards <a name="wildcards"></a>

Let's get more concrete, and return to the example from last time. We were trying to write a type for a sorting function. All we wanted to know about the input was that it was a list of something for which comparison was defined. We ended up here:

```java
public static <T extends Comparable<T>> void sortSomeRecords(List<T> lst)
```

Hopefully the above example motivates why we needed the type variable: _any_ kind of `Comparable` will do, so long as it can compare against its own type.

Java also allows _wildcards_, which are written `?`. A wildcard stands for a type that won't be referred to elsewhere, so it doesn't need a name of its own. But we couldn't have used a wildcard here, since we needed `T` both to say what type to compare against, and to label the argument.

Java's standard library sorting function, however, uses both a type variable and a wildcard:

```java
public static <T extends Comparable<? super T>> void sort(List<T> list)
```

This allows `T` to implement comparisons against any of its supertypes. (By the way, the bit declaring `T` isn't part of the return type; it's just a note to the type checker.)

### What a variable means

Here are some things to try:

```java
someNums.add(1);   // setup (works)
someNums.add(1.5); // setup (works)
Collection<? extends Number> someNums2 = new HashSet<>(); // ?
someNums2.add(1);     // ?
someNums2.add(1.5);   // ?
someNums2.add(null);  // ?
someNums2 = someNums; // ?
```

What's going on?

<details>
<summary><B>Think, then click!</B></summary>

`? extends Number` does not mean "a collection of objects which all extend `Number`". It means "a collection of objects of _one single type_, where that type extends `Number`".

We can assign `someNums` to `someNums2` because `someNums` is a collection of `Number` (which fits into the wildcard). But we cannot add an `Integer` to `someNums2`, not even if we try to typecast. Why? Because we could have assigned `someInts` to `someNums2` instead, or `someDoubles` or `someFloats`! Java has no guarantee (at least, not without a far more sophisticated and time-consuming code-crawl) that it will be safe to add an integer to whatever `someNums2` references. The same reasoning rules out adding `1.5`. Adding `null` is the one exception, since `null` belongs to every reference type.

</details>
<BR/>

#### A resource

We're not done with generics yet, but I want to point out a great resource: [Angelika Langer's excellent Java generics guide](http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html), which I link here with permission. This guide would be tough to assign "reading" from, but it's an incredibly valuable place to go if you have specific questions about generics.

## Discussion: Taxicab Data <a name="taxis"></a>

For today, we asked you to read this (short) article: [Of Taxis and Rainbows](https://tech.vijayp.ca/of-taxis-and-rainbows-f6bc289679a1). We also asked you to check out the corresponding [Hacker News thread](https://news.ycombinator.com/item?id=7926358), especially the rebuttal to the top comment. (Which parts do you agree with, and which parts might you be skeptical about?)

I am outlining the discussion here only, since so much will be driven by you in class! If you're just reading the notes, please think about each of these discussion points and then watch the lecture capture.

#### Context

Later on in the semester we're going to talk about "threat modeling", which is a process for identifying potential security flaws in your system. But you're already starting to look at data sets with an eye to security issues -- so let's get a head start.

(1) Let's talk about the core anonymization issue that the article reported on.
- What's in the data, and what was anonymized?
- What did they do to anonymize the data before release?
- What went wrong? (How was the anonymization circumvented?)
- Is this attack a generally useful technique? For what?
- What could have been done differently?

Takeaway: anonymization is hard, and mistakes are impossible to undo. Think carefully, and seek professional advice. Here, the FOIA request was answered in *2 days*. Is that enough time?

(2) What could an adversary do with this data?
- vs. driver, taxi?
- Don't get distracted by the anonymization issue: what about the passengers?
- Is there any risk of correlation with other data?
- Are there possible defenses?

(3) Let's analyze this Hacker News exchange. There were 2 arguments I thought were interesting near the top:
- "NYC is too dense for reasonable correlation"
- "nobody who lives in a non-dense part of NYC can afford to take a taxi anyway"

New York City is a lot more than skyscrapers. It includes, say, Staten Island:

![](https://i.imgur.com/TYsQNas.png)

Here's a random Google street view:

![](https://i.imgur.com/bDID3iU.jpg)

(4) Is it always possible to do data science without compromising people's privacy?
- No. (Take Malte's 2390.)