Lecture 18: Objects and Dynami

Lecture 18: Objects and Dynamic Dispatch --- :::warning Announcements - Quiz 2 graded (Gradescope) - Assignment 8 now due next Tuesday - Consider taking CS343: Distributed Systems in the spring! - CS Club Coding Workshop (with me) this Thursday, 5:30-6:30 in L039 - Next help hours: - Mine 6:30-8:30pm, L037 - Grace's 1-2pm Weds., Zoom ::: :::success Agenda for today - Wrap up Concurrency topic - More on `mutex` abstraction - Object-oriented programming - What is an object? - How are names resolved? - Group exercise: predicting Java output - Overriding and dynamic dispatch - (Next time) Relation to functional programming ::: # Concurrency: more on `mutex` Recall our example from last time. The premise is that we have two different long-running chunks of work that need to happen, each of which produces a list of strings. Our goal is for the long-running work to happen in parallel, but for each list of strings to be printed as one continuous list (without interleaving). To do this, we'll need to use concurrency primitives to coordinate shared access to `stdout`. We'll use `Unix.sleepf` with a delay of `2` to simulate a long-running amount of work. ```ocaml let sleep_then_print delay lines mutex = Unix.sleepf delay; Mutex.lock mutex; List.iter (fun line -> print_endline line ) lines; Mutex.unlock mutex (* lists not shown *) let main () = (* Create a mutex to manage stdout *) let mutex = Mutex.create () in (* Pass it to each sleep_then_print call*) let t1 = Domain.spawn (fun ()-> sleep_then_print 2. list1 mutex) in let t2 = Domain.spawn (fun ()-> sleep_then_print 2. list2 mutex) in Domain.join t1 ; Domain.join t2 let _ = main () ``` The OCaml library will ensure that at runtime, whichever domain reaches the `Mutex.lock mutex` line first gets to "have" the mutex first. This allows that domain to complete its printing with exclusive access to `stdout`. The other the is forced to wait until the mutex is unlocked before allowing it to enter that section (the "critical section") of the code. Notably, both domains can still sleep at the same time, allowing true parallelism for the independent/non-shared parts of their work. The `mutex` forces the code between the `lock` and `unlock` to happen sequentially, successfully coordinating on just the necessary piece of functionality (here, sharing one `stdout`). ### Deadlock If we made a mistake and forgot to call `unlock`, we could run into an issue where the second domain to get to the `Mutex.lock mutex` lock line can never actually make progress - it becomes stuck waiting for a `mutex`/lock that never becomes available. This is known as **deadlock**. Deadlock is again a shared concept across different programming languages. :::info **Deadlock** is when one or more threads (or other concurrent actors) are waiting for a resource held by other threads in the group, in a way that prevent progress. Avoiding deadlock is a substantial issue in distributed systems (see CS343!). ::: Some languages provide higher level "synchronization" constructs that avoid the problem of remembering to unlock in the right place: more on this in `Homework 8: Parallelism & Concurrency`! --- Our next topic is to focus on a programming paradigm you have all seen in other classes: object-oriented programming. :::info This will be the subject of the final programming project in this class: `A9: MiniJava`! ::: # Object-oriented programming ## Why study object-oriented programming :::success Why study OOP in a programming languages class, especially when we mostly focused on functional programming? ::: :::spoiler Think, then click! - OOP is frequently brought up as a core paradigm, along with imperative/functional programming. I don't find thinking of these as firm delineations to be very useful, but understanding some key features is important. - Many languages support some forms of objects, even if they are not primarily OOP. - Understanding the *implementation* of objects helps with debugging and improving the performance of real-world code. ::: ## What is an object? Everyone in this class has taken CS230 (or the equivalent), so you all have seen some Java. :::success What is an object? ::: :::spoiler Think, then click! Definitions vary! We can think of an object as: - a *value* (cannot be evaluated further) - that maps *names* to other things: either: - other values (*fields*), and - specific functions (*methods*) ::: # How are *names* resolved? Names are key piece of semantics in any language. ## In Racket, OCaml In the Racket and OCaml languages that we've studied, variables are the result of bindings. Closures (anonymous functions) follow lexical scope, which provides unambiguous binding semantics. We saw that in OCaml, records have a concept of *field names*, but: 1. Field names are not truly variables, and 2. We do not have to account for any difficult "lookup": the name always refers to a piece of data within the overall value. ## In Java (and other OOP languages) In Java, *local* variables follow lexical scope. Closures (anonymous functions) are less a key part of the language. We do have a new concern, though: how to handle name lookup for instance variables (fields) and methods. :::danger Because of *inheritance*, name lookup in object-oriented programming is more complicated! ::: ## Name lookup in OOP languages ## Design space Language designers have at least two core questions to consider when designing field/method name semantics (from the [PLAI textbook](https://www.plai.org)): 1. Is the set of available method/field names statically fixed? Can programmers change it on-the-fly, dynamically? 2. Is the method/field being accessed at a program point statically fixed, or can it be computed dynamically? By this, we mean: can you construct a name at runtime by, for example, string concatenation to say a field is `"a" + "b"`? This gives us a 2x2 design space where languages can sit: | | **Name set is static** | **Name can be computed** | |------------------------------|---------------------------------------------|-----------------------------------------------| | **Fixed set of field/methods** | | | | **Dynamic set of fields/methods** || | :::success Consider: Where do we think Java goes in this diagram? What about JavaScript? OCaml has some OOP features, where do you think they go? ::: :::spoiler Think, then click! | | **Name set is static** | **Name can be computed** | |------------------------------|---------------------------------------------|-----------------------------------------------| | **Fixed set of field/methods** | Basic Java, OCaml OOP | Java with reflection. | | **Dynamic set of fields/methods** | Not useful | Most “scripting” languages, JavaScript, Python | ::: The bottom left quadrant does not really make sense for any language, for the following reason: if the *set* of names must be statically set before runtime, it wouldn't make sense to be able dynamically construct a name to *use* during runtime. All other quadrants do have representation within real languages! ### Name lookup in scripting OO languages The bottom right quadrant is the most permissive/least strict, and is where most scripting languages sit. It's also where language with so called "duck typing" sit. :::info :duck: The name "duck typing" comes from the saying: > “If it walks like a duck and it quacks like a duck, it must be a duck.” In languages with "duck typing", instead of checking an object's type, the language just dynamically checks if the object has the method or field in question. If it does, the language allows the access/call to happen. ::: This quadrant includes both Python and Javascript. :::success In terms of implementation, how do you think these languages work? That is, how might their interpreters handle objects? ::: :::spoiler Think, then click! Scripting languages interpreters typically use a dictionary, or hash-table, to represent an object instance. The name of a field/method is just a string (or symbol) key into the dictionary. Name lookup then becomes accessing a value for a given dictionary key. We can dynamically construct names, because we can dynamically construct strings. ::: :::info Interesting side note: as noted in older versions of the PLAI textbook, this "objects as dictionary" mentality is actually how the JSON data format was invented! JSON stands for "JavaScript Object Notation", and it is a subset of the textual representation of JavaScript objects. ::: This underlying dictionary strategy makes this quadrant less semantically interesting, so we won't focus on it further at this point. ### Name lookup in more statically-typed OO languages, like Java The top left quadrant is the most interesting in terms of (1) corresponding to traditional object-oriented programming, and (2) having interesting semantics for inheritance. We'll thus focus in on the flavor of objects. In this quadrant, objects have a statically fixed set of names and static name references. Within this segment of the design space, we can consider how names are resolved with Java inheritance as a concrete example. Let's first do an exercise. ## Group exercise :::success What is printed by the following Java program (4 print statements in total)? Draw pictures of the objects to help you understand what's happening. :star: note, this uses *variable overriding*, a feature of Java that classes like CS230 often encourage students to avoid. So do not worry if your group is unsure of the answer! Feel free to give two answers: "if Java does X, the answer is ...". "if Java does Y, the answer is...". ::: (Note: program adapted from a quiz originally by [Justin Pombrio](https://justinpombrio.net)). ```java class A { public int x = 1; public int f() { return x; } public int g() { return x + f(); } } class B extends A { public int x = 10; public int f() { return x; } } class C extends B { public int x = 100; public int g() { return x + f(); } } public class Tricky { public static void main(String[] args) { A declAC = new C(); System.out.println(declAC.x + declAC.f()); System.out.println(declAC.g()); C declCC = new C(); System.out.println(declCC.x + declCC.f()); System.out.println(declCC.g()); } } ``` :::spoiler Think, then click! ``` 11 110 110 110 ``` Explanation: We have 3 classes: `A`, `B`, and `C`, where: - `B extends A` (inherits from `A`) - `C extends B` (inherits from `B`) Each class has its own `x` field, and `B` and `C` **override** some methods: - `B` overrides `f()` - `C` overrides `g()` ::: ## Understanding the semantics of inheritance To figure out which values get used, we need two core rules from Java inheritance. First, we have a distinction between the *declared* (or static) type and the *actual* (or dynamic) type. In the snippet: ```java A declAC = new C(); ``` `C` is the actual/dynamic type, because its constructor is used in the `new` expression. `A` is the declared/static type, because the local variable is declared as having type `A`. :::info The **actual**/**dynamic** type of a Java object is the one used in the new expression: for example, `new A()` has actual type A. ::: :::info The **declared/static** type of an object is based on how the variable is syntactically declared. This can be from: - A local variable declaration, e.g., in `A a = ...`. - A function argument, e.g., in `void foo(A a) {...}`. - A cast, e.g, `((A)b)` has declared type `A`, even if `b` was declared as a `B`. - An implicit `this.`. For example, if a method inside class `B` leaves off the class from an instance variable `foo`, there is an implicit `this.foo` that means the declared type of `this` is `B`. ::: Now, to the two core rules. The first, you should be more familiar with from CS230: :::success Do **method calls** use the declared or actual type? ::: :::spoiler Think, then click! Method calls use the actual type. This is often seen as a selling point of polymorphism! ::: :::info **Rule 1:** method name lookup is based on the actual/dynamic object type at runtime. Methods then follow inheritance rules: the most-specific (lowest in the inheritance diagram) implementation is used. This is known as **method overriding**. ::: The second, you have less practice with from CS230: :::success Do **field name lookups** (instance variable lookups) use the declared or actual type? ::: :::spoiler Think, then click! Field name lookups are based on the *declared* type. This does not come up as much in CS230, since often, OOP design encourages *private* fields that objects could not even try to override. *In short: fields are not overridden in the same way as methods are*. ::: :::info **Rule 2:** field name lookup is based on the declared/static object type. Within a method body, an unqualified field name, like `x`, implicitly refers to a `this.x`. That is, the declared/static type of a use of a field within the method is the static class that method definition lives inside of. ::: With these rules, we can examine each print in more detail: ## First declaration ```java A declAC = new C(); System.out.println(declAC.x + declAC.f()); System.out.println(declAC.g()); ``` ### `x` and `f()` - `declAC.x` -> Declared type is `A`, so this accesses `A`'s field: `x = 1` - `declAC.f()` -> This is a method call, so we look to the actual object type of `C`. `C` has no overridden definition of `f`, so we look up in the inheritance diagram. `B` does override `f`. So `B.f()` is used. - Inside `B`'s definition for `f`, the `x` that is referenced is referenced with static type `B`. So this returns `B`'s value for `x`, which is `10`. - So: `declAC.x + declAC.f() = 1 + 10 = 11` ### `g` - `declAC.g()` -> This is a method call, which we base on the actual type of `C`. `C` does override `g`, so we look there. - Inside `C`'s `g` definition: - `x` is `C`'s `x = 100` field. - `f` again results in looking up `B`'s `f` body. This returns `B`'s `x = 10` - So: `declAC.g() = 100 + 10 = 110` ## Second declaration ```java C declCC = new C(); System.out.println(declCC.x + declCC.f()); System.out.println(declCC.g()); ``` Here, the declared and actual types are the same in `main` (though function definitions still use the `x` of that object). - `declCC.x` -> again refers to `C`'s field: `x = 100` - `declCC.f()` -> Inherited from `B`, so returns `B`'s `x = 10` - So: `cc.x + cc.f()` -> `100 + 10 = 110` - `declCC.g()` -> Defined in `C` - `x = 100` - `f()` = `B.f()` -> returns `10` - So: `100 + 10 = 110` # Dynamic dispatch The general name for how methods are resolved dynamically is *dynamic dispatch*. This is a core feature of both OOP languages and less object-oriented languages, such as Rust. :::info Interesting side note: Alexa has a published paper on Rust's dynamic dispatch implementation, that arose out of a PhD intern project at AWS! [Verifying Dynamic Trait Objects in Rust](https://ieeexplore.ieee.org/document/9794041). ::: More on objects next time, including the use of the `super` keyword!