Lecture 20: Subclass Polymorph

Lecture 20: Subclass Polymorphism, memory safety --- :::warning Announcements - **Please sit in the front two rows of the room.*** - Assignment 8 due tomorrow. - Next help hours: - Mine * 5:30-7:30 tonight ::: :::success Agenda for today - Object-oriented programming continued - More on inheritance/subclass polymorphism - Covariant and contravariant types - Relation to functional programming - Memory safety/management - What is memory safety? Why care? - What languages are memory safe? - (If time) How do safe languages handle memory? - Automatic memory management ::: Last class, we discussed objects and dynamic dispatch. This class, we'll continue to look more closely at when one type can be used in place of another (e.g., polymorphism). # More on subclass polymorphism We know that the point of polymorphism is to let us call a method on an object and get specific behavior depending on if it's a particular subclass. We can write one function that takes in the superclass type, and call it with multiple subclass types: ```java class Cat extends Animal {} // Elsewhere void doSomething(Animal a) { a.noise(); } // We can call doSomething on an Animal: doSomething(new Animal()); // AND on a Cat subclass of Animal: doSomething(new Cat()); // where noise() may have different implementations ``` This example should be very familiar from CS230, and depends on this rule: :::info For method calls, we can pass a subclass where any of its superclasses are expected. ::: An interesting question to consider is this: when a class extends another and overrides functions, do those **function definitions** need to have the exact same return types, or can the return type be a subclass or superclass? ## Group exercise: sub/super for method overriding definitions :::success Consider the following example: ```java /* Group exercise: discuss whether you think in the * New subclass overriding the superclass methods, Java * can change the: * 1: Return type: Java can safely let ??1 be a: * - subclass (Animal) * - superclass (Object) * - both * - neither: must be the exact type * 2: Argument type: Java can safely let ??2 be a: * - subclass (Animal) * - superclass (Object) * - both * - neither: must be the exact type * * Hint: consider how another class would use Groomer/New. */ class Groomer { Animal pickUpPet() { // ... } void dropOffPet(Animal a) { // ... } } class New extends Groomer { ?? pickUpPet() { // ... } void dropOffPet(?? a) { // ... } } ``` ::: :::spoiler Think, then click! The correct answer for what Java can *safely* do is: ``` What Java *could safely allow* for overriding ??1 (return type) : subclass ??2 (argument type): superclass ``` However, the *actual* Java rules are more conservative for object **argument types**: ``` What Java *actually allows* for overriding ??1 (return type) : subclass ??2 (argument type): neither (must be exact type) ``` ::: To understand why we can use a subclass for the return type, consider this code *using* the subclass: ```java // Declared type is Groomer, but dynamic type is New Groomer g = new New(); Animal ani = g.pickUpPet(); // Would it break things if ani was actually a Cat? // No, because Cat has all methods that Animal has! // Would it break things if ani was actually an Object? // Yes, this is unsafe! Object has no noise method defined. ``` To understand why we *could* safely use a superclass for the argument type, consider: ```java g.dropOffPet(new Dog()) // Consider: // Would it break things if the argument were actually a Dog? // Seeing the problem is a bit more complicated here, since // it requires another subclass. // Yes, because although dropOffPet is defined on Groomer, // the dynamic type of New only knows how to call dropOffPet // on Cat, not Dog. New's implementation of dropOffPet // might call Cat's "makeHairball", for example, which Dogs cannot do! // Would it break things if the argument were actually an Object? // No, this would not break things. Cat's dropOffPet // implementation could only call Object's methods, but // that is fine. // However, this case is less intuitive! So Java does not // allow it for method overriding, and instead would treat // the method definition as overloading. ``` ## Contravariance and covariance In **general** safe OOP design (and other contexts with *subtyping*), when overriding methods defined in a superclass, the subclass does not need to use the **exact** same argument and return types. Rather: 1. The return type of the overridden method may be a **subclass** of the return type in the superclass. This is an example of **covariance**. 2. The argument type in the subclass may be a **superclass** of the argument type in the superclass. This is an example of **contravariance**. This is *safe* for OOP, but is **not actually implemented** by Java: Java makes a conservative choice to not support this for overridden methods. ### Covariance/contravariance in general **Covariance** means you can substitute a more specific (sub)type in a place where a more general (super)type is expected. - Applies to: Return types (including Java), arrays (though this can cause issues, which are out-of-scope today), some generic type parameters. - Intuition: “Going down the type hierarchy (more specific) is OK.” **Contravariance** means you can accept a more general (super)type in a place where a more specific (sub)type is expected. - Applies to: Some OOP function arguments (but not Java), some generic type parameters. - Intuition: “Going up the type hierarchy (more general) is OK.” Java doesn't allow contravariant parameter types in method overrides (even though it makes sense in principal). There is lots of interesting further details on how covariance/contravariance impact Java generics, but we'll stop here for CS251. # Relationship between OOP and FP To close out our OOP unit, we'll zoom out and consider the difference again between OOP and FP, now that we understand the OOP mechanisms. # Mechanism: closed vs open functions/methods One core difference between the *mechanisms* of OOP and FP is how we can "shadow" function/method implementations. ### Shadowing for closures In OCaml and most functional languages: closures are, well, **closed** (at creation, they bundle up their environments into a closed context). For example: consider these mutually-recursively-defined functions in OCaml: ```ocaml (* Note, this is intentionally a slow, silly implementation! *) let rec even x = if x=0 then true else odd (x-1) and odd x = if x=0 then false else even (x-1) ``` Client code that uses these may shadow `even`, with a new definition, but calls to `odd` are **unaffected** (they use the "original" version): ```ocaml! (* Shadow "even" with a more efficient implementation: *) let even x = (x mod 2) = 0 (* This does not change odd (too bad, would be faster!) *) (* odd's particular implementation of even was CLOSED *) ``` ### Shadowing for "open" overriding In most OOP languages, subclasses can change the behavior of superclass methods, even ones they do not directly override! Here is the same example as before, now in Java: ```java class A { boolean even(int x) { if (x == 0) return true; else return odd(x-1); } // odd's particular implementation of even is OPEN boolean odd(int x) { if (x == 0) return false; else return even(x-1); } } class B extends A { // Improves odd in B objects, even though B does not // change/override odd directly! boolean even(int x) { return x % 2 == 0; } } ``` The new definition of `even` defined by `B` overrides `even` in **all** context for objects of dynamic type `B`, **including** in the `odd` implementation that is still only defined in `A`! Critically, `A`'s `odd` implementation was *not closed* when `A` was defined: the author of `A` **cannot know** what specific version of `even` might be called (unless `even` is modified with the `final` keyword). ## Closures vs. Objects In this way, we can consider the following differences between closures and objects: Closures: – Capture code of function, at function definition. – Capture all bindings the code may use, by lexical scope of definition. Objects: – Capture code for all methods that could be called on it, by class hierarchy. – Capture bindings that may be used by that code, by instance variables declared in class hierarchy. ## FP vs OOP decomposition/extensibility ### FP vs. OOP decomposition FP and OOP represent two distinct world views! In FP, functions perform some operation, which may use pattern matching to specialize on variant types or other patterns. In OOP: classes/prototypes give behavior to some kind of data. Subclassing allows specialization for subtypes. Which of these approaches is better depends on the domain, how the software is expected to change over time, and (perhaps most importantly) engineer preference and social factors. ### Table view of FP vs OOP decomposition Consider implementing functionality on our `Expr` running example, where `Expr` expressions can be either a `Var`, `Add`, or `Num`. The functionality/behavior we want to define is `interp` and `toString`, which should be specialized for each subtype. | | `interp` | `toString` | | -------- | -------- | -------- | | `Var` | | | | `Add` | | | | `Num` | | | :::success Consider the following: In FP, do we group code by rows of this table, or columns? Likewise, in OOP, do we group code by rows or columns? ::: :::spoiler Think, then click! FP groups code by columns (by function). OOP groups code by rows (by type of data). More on this below! ::: #### FP view of the table In OCaml, we would create the `Expr` subtypes with a variant type. Each function would then be a definition **over all variants**, with specialization via pattern matching. We can think of this as a vertical grouping of columns in our table! Said another way, in FP, function implementations, with subtype behavior spread across function implementations. | | :large_blue_square: `interp` | :large_red_square: `toString` | | -------- | -------- | -------- | | `Var` | :large_blue_square: | :large_red_square: | | `Add` | :large_blue_square: | :large_red_square: | | `Num` | :large_blue_square: | :large_red_square: | #### OOP view of the table In Java, we would create the `Expr` as a superclass, then define `Var`, `Add`, and `Num` as subclasses that each extend `Expr`. Each subclass would then define *its own* specialization of `interp` and `toString` via method overriding. Said another way, in OOP, we group code based on subtype, with function implementations spread across subtypes. Each subclass would then be a definition **over all overridden methods**, with specialization via inheritance. We can think of this as a horizontal grouping of rows in our table! | | `interp` | `toString` | | -------- | -------- | -------- | | :large_blue_square: `Var` | :large_blue_square: | :large_blue_square: | | :large_red_square: `BinOp` | :large_red_square: | :large_red_square: | | :large_yellow_square: `Num` | :large_yellow_square: | :large_yellow_square: | ### Implications on extensibility Choose which view is better partially depends on how we expect the code to change over time. When code can be easily adapted to future changes, we call it **extensible**. Extensibility is an important goal in real-world software engineering, since most code is not "one and done": it continuously evolves over time! Which of FP vs OOP is better for extensibility depends on how we expect the code to change in this way: do we think it's more likely/frequent that we will change the set of supported **functions**, or **subtypes**? #### FP: Good for new functions Since FP implementations are grouped by *function*, adding a new function is easy: we just define the new function without changing existing functions! For example, if we want to add a `typeCheck` function for our `Expr` language, we need only add a `fun type_check` definition. This is part of why programming language people love FP: in the long run, programming language ASTs are fairly stable, but the functions/analyses will define over them change more frequently! #### OOP: Good for new subtypes Since OOP implementations are grouped by *subtype*, adding a new subtype is easy: we just `extend` the existing superclass, without changing existing classes! For example, if we want to add a `Neg` variant `Expr` language, we need only add a `class Neg extends Expr` class definition. This is part of why OOP is the paradigm of choice for most Graphical User Interface (GUI) frameworks: for GUIs, adding new components (e.g., a new button type) is the more common case! ### Concluding musings for OOP As we've discussed, FP and OOP are different worldviews, but both can be incredibly useful in different cases. You now are more prepared to choose between the two, including in emulating good designs from one in a language that primarily uses the other! --- For our last topic of the semester, we'll focus on an important idea across programming languages: memory safety and memory management. # Memory safety ## Press Release: Future Software Should Be Memory Safe In 2024, the Biden White House released the following press release: [Press Release: Future Software Should Be Memory Safe](https://bidenwhitehouse.archives.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/). There has been a massive push in recent years for software systems to use **memory safe programming languages**. As we wrap up this first semester of study of programming languages, I want us to all understand what this means! ## Definition of memory safety :::info **Memory safety** is the property of a program that says that references to data must always be to the correct size and type of data, and in valid memory that the program is allowed to read/write. ::: Said another way, a **memory safe** program does not contain memory errors: accessing memory that it should not, or attempting to access memory of the wrong type. One simple memory error would be declaring an `a` array of size 3, then trying to access `a[4]`. While a *safe* language might not rule this out via type checking, it might fail with a specific dynamic error, such as an array out of bounds exception. In an *unsafe* language, like C, accessing `a[4]` might simply return some adjacent value in memory. :::info A **programming language** is memory safe if it rules out memory unsafe programs (either by static analysis with the type system, or dynamic checks). ::: Most language you have used are memory safe: Racket, OCaml, Python, and Java are all memory safe. The two primary languages that are **not** memory safe are C and C++: the list of unsafe languages is much shorter, but these languages are still in widespread use! ## Security implications of manual memory management In computer security, there is a concept called ["CVEs", or Common Vulnerabilities and Exposures](https://www.cve.org/About/Overview). CVEs are essentially a public, tracked list of known security vulnerabilities (bugs that could be exploited). :::success What percentage of CVEs listed by Microsoft do you think are from memory safety issues? ::: :::spoiler (Think, then click!) The answer is [around 70%](https://msrc.microsoft.com/blog/2019/07/a-proactive-approach-to-more-secure-code/)! That's a huge number of bugs that could have been avoided by using a memory safe language! ::: Next class, we'll learn more about how programming languages can help prevent these treacherous memory safety bugs!