Recap: Lists segment

--- tags: Recaps --- # Recap: Lists segment Here's a summary of essential points from our segment on lists: By definition, a *list* is an ordered sequence of items. Meaningful operations on lists are adding elements, getting the element at a specific position, removing elements, and checking whether a specific element is in the list. ## Mutable vs Immutable Lists Assume you have an existing list that you can access through a name (such as `myList`). You use the operation provided by your programming language to add an element to `myList`. If you look at the contents of the `myList` variable after the add operation, is the new element there? - if yes, you are working with a **mutable** list - if no, you are working with an **immutable** list Mutable/immutable is all about whether the new element is accessible through the same location in memory as the original list. As a programmer, practically, this comes down to whether you or the language saves changes to the list. **When do we want a mutable list?** - if two places in your program refer to the same list in memory, and you want the changes through one to be visible through the other, you must use a mutable list (e.g., a shared calendar or shared to-do list). **When do we want an immutable list?** - if you need to make temporary changes to a list, you should use an immutable list. **If a list is mutable but we want to make a temporary change, what do we do?** - Make a copy of the list. Make the temporary change in the copy (and use the copy in your temporary computation). This leaves the original list untouched. ## Implementing Lists: Arrays vs Chains of Nodes With respect to where things live in memory, there are two ways to organize elements: 1. as a collection of nodes, each connected by a reference to the next (and perhaps previous) node in the sequence. These nodes can end up scattered all around memory. This is called a **LinkedList** (both in Java and in general CS terminology). 2. as a sequence of consecutive slots in memory. A sequence of consecutive slots is called an **array*; the use of an array to provide list operations is often called an **ArrayList** (this is the term in Java). The choice of linked nodes versus an array has implications for the running-time of the core list operations: | Operation | Linked List | Array | | --------------- | ----------- | -------- | | add first/last | constant | amortized constant [1] | | contains | linear | linear | | get-by-position | linear | constant | | remove element | linear | linear | | remove node | constant [2] | - | [1] An individual addition could be linear time (if the array is out of space). If we double the capacity of the array each time it runs out of room, the average across $n$ element additions becomes constant with respect to $n$. *Amortized* analysis means that we look at the average cost across multiple operations, not a single operation. [2] Assuming we have both next and previous references in a node. If the only node references are to the next node, then there is a linear cost to find the node prior to the one we want to remove. **What about space usage?** LinkedLists save some space (for the node classes and their fields) for the elements that are in the list, but Arrays set aside space for future elements. Space is not a primary consideration when choosing between these two data structures. ### Choosing between LinkedList and ArrayList As a programmer, you should choose a LinkedList vs an ArrayList based on which operations you expect to do most frequently. If you will be doing a lot of positional lookups, an ArrayList would be better. If you need to do a lot of addition and removal of elements, a LinkedList would be a better choice. ## Study Questions 1. (Unless you came from CS15) You programmed with lists throughout your previous course without us having to discuss mutability/immutability. Consider the following function that produces a list of the short words from a list of words. (This code isn't in any specific language, but the constructs should all be familiar). The code works with two lists: the input list and the produced list. Which of these two lists could be mutable? Which could be immutable? If either is possible, do you have any preferences? What code could you write to check which kind of list each one is? ``` fun keep-short-words(wordList) :: if isEmpty(wordList): return empty; else: first = wordList.get(0); rest = wordList.removeFirst(); if (length(first) < 5): return keep-short-words(rest).addFirst(first); else return keep-short-words(rest) end end end ``` 2. In lecture, we pointed out that when working with immutable lists, it is the programmer's responsibility to update any names in the environment to refer to the new list after adding elements. For example: ``` L = [list: 1, 4, 2] L = L.immutableAddFirst(5) // an addFirst for immutable lists // L is now [list: 5, 1, 4, 2] ``` You never had to use a pattern like this when programming functionally, even though you were using immutable lists. Why not? 3. Do you agree or disagree with the following statements? Justify your answer. - for a list to be mutable, there *must* be an object (like `LinkList`) that refers to the underlying sequence of elements - if there is an object (like `LinkList`) that refers to an underlying sequence of elements, then that object *cannot* implement an immutable list - it is impossible to have an `addLast` method on immutable lists - it is impossible to have a constant-time `addLast` method on immutable lists - having both `previous` and `next` links, as in doubly-linked lists, only makes sense for mutable lists 4. In lecture, we worked with arrays of `int` and `String`. You can have arrays of any type of data, however. Draw the memory diagram for the following code, then answer the following questions: ``` Dillo[] dilloCircus = new Dillo[4]; babyDillo = new Dillo(5, false); adultDillo = new Dillo(25, true); LinkList<String> words = new LinkList<String>(); bluenoDillo = new Dillo(40, false); dilloCircus[1] = adultDillo; dilloCircus[0] = bluenoDillo; dilloCircus[3] = babyDillo; ``` 1. Are all of the `Dillo` objects consecutive in memory? Do they need to be in a valid array? 2. `dilloCircus` has nothing in position 2 at the end of the above code. Does that mean it is no longer an array? 3. Would the `dilloCircus` array be valid as the contents of an `ArrayList`? 4. What precisely has to be consecutive in memory with an array? What about with an array that is used to implement a list?