Lab 10: Understanding and Testing Mutation

In this lab, we will continue with two ideas that we introduced in lecture: understanding how data mutation works and testing functions that mutate data.

When a program is maintaining data over time, sometimes we want that data to be shared across different parts of the program and sometimes we do not. Decisions about what should or should not be shared affect how we set up our data and write programs.

Dataclasses, Examples, and Memory Diagrams

Consider a tool (like Google Docs) that allows people to create documents and share them with other people. Here's a basic dataclass for Documents.

from datetime import datetime
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    last_edited: datetime
    contents: str

hwk1 = Document("Homework 1",
                datetime(2022, 11, 15, 13, 24, 36, 10),
                "Write a program")
# the parts of datetime are year, month, day, hour, minute, second, microsecond

Another basic dataclass captures people who are allowed to edit documents:

@dataclass
class Editor:
    name: str
    docs: list
    
ashley = Editor("ashley", [hwk1])

Imagine that the CS111 homeworks team is preparing assignments that are in different stages of readiness. Some documents are ready for sharing with others, but some stay private until the creators are ready to share them.

Task: Copy the above code fragments to a Python file. Add variables and definitions to set up the following set of editors and documents with their corresponding sharing settings:

Katie as an editor with no documents
James as an editor who is also working on homework 1
Toshi as an editor who is working on homework 2 (which is not yet shared with anyone)

Task: Draw the memory-layout diagram for the current collection of Editors and Documents.

What's a memory-layout diagram?

Consider the following program:

from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    size: int


@dataclass
class Webkinz:
    name: str
    species: str
    home: Environment
    

prairie = Environment("Prairie", 8)
forest = Environment("Forest", 10)

sparkle = Webkinz("Sparkle", "Unicorn", forest)
brownie = Webkinz("Brownie", "Horse", prairie)

We could draw our memory-layout diagram like this, where each box represents our data in memory. Each box resides in some location in memory (written as locXXXX). When a variable or field references another piece of data in memory, we use the location to identify the piece of data.

If we want, we can highlight the references between variables/fields and data by also drawing arrows that connect a reference to a location with the actual piece of data. The arrows, however, mask a key piece of what is happening: that data live in spots in memory with unique numeric identifiers.
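Python lets us check some of what the diagram describes. The sketch below reuses the Environment and Webkinz classes from above (with size typed as int, which is an assumption) and adds a hypothetical variable other_forest for comparison. The `is` operator asks whether two references lead to the same box, and the built-in `id` function returns an integer that is unique to an object for as long as it exists. Those integers play a role similar to the locXXXX labels, but the actual numbers vary from run to run.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    size: int

@dataclass
class Webkinz:
    name: str
    species: str
    home: Environment

forest = Environment("Forest", 10)
sparkle = Webkinz("Sparkle", "Unicorn", forest)

# Hypothetical extra data: same field values as forest, but its own box.
other_forest = Environment("Forest", 10)

same_box = sparkle.home is forest          # both reference one box
equal_fields = forest == other_forest      # dataclasses compare field by field
separate_box = forest is not other_forest  # two distinct boxes in memory

print(same_box, equal_fields, separate_box)
print(id(forest), id(sparkle.home))  # the same number twice
print(id(other_forest))              # a different number
```

Two boxes can hold equal contents and still be different boxes, which is why the diagram tracks locations and not just values.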

It is critically important that names from the directory do not appear in the heap area (that is, inside the data). Using the names there suggests that later updates to the variables (like prairie = ...) would affect the data in the heap, but this is NOT how Python works.
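As a small sketch of why the names stay out of the heap, the code below reuses the classes from the program above (with size typed as int, an assumption) and then gives the name prairie a new value. The "Tallgrass Prairie" environment and the old_home variable are hypothetical additions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    size: int

@dataclass
class Webkinz:
    name: str
    species: str
    home: Environment

prairie = Environment("Prairie", 8)
brownie = Webkinz("Brownie", "Horse", prairie)

# Remember which box brownie.home leads to before we touch the variable.
old_home = brownie.home

# Update the variable: the directory entry for prairie now leads to a new box.
prairie = Environment("Tallgrass Prairie", 20)

# brownie.home stored a reference to the original box, not the name "prairie".
unchanged = brownie.home is old_home

print(brownie.home)  # Environment(name='Prairie', size=8)
print(prairie)       # Environment(name='Tallgrass Prairie', size=20)
```

If the diagram had written "prairie" inside Brownie's box, it would wrongly predict that Brownie moved to the new environment.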

CHECKPOINT: Call over a TA once you reach this point. Note that the TA can ask either partner to answer, so please make sure both partners understand the diagrams.

Ashley has an inspiration for new ideas for homework 2. She doesn't want to disrupt Toshi's progress, but she does want to try re-working content to see if her ideas make sense.

Task: Extend your memory-layout diagram to show Ashley being the sole editor of a copy of Toshi's current homework 2 document. Upload a picture of your memory-layout diagram from lab so far to this Google Form.

Task: Add some code to your file that would create the same memory contents as what you just drew to give Ashley a copy of homework 2.

Ashley and James have been working on homework 1, but their idea isn't working out. It's such a mess that they have decided to just throw away their old attempt and start over.

Task: Here are two ways to "start over" on homework 1. What effect will each have on the memory diagram? After taking each approach (separately), what contents will Python produce if we ask to see Ashley's version of hwk1?

# Approach #1
hwk1 = Document("Homework 1", datetime.now(), "This is much better")

# Approach #2
hwk1.last_edited = datetime.now()
hwk1.contents = ""

One of the most subtle concepts to internalize when starting to work with updating data is the difference between updating variables and updating contents of data structures. The previous task highlights the differences.
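The same distinction can be seen side by side on the Webkinz data from earlier in the lab. This is a minimal sketch, not the answer to the document questions: it assumes size is typed as int, and the second resident (clover) is a hypothetical addition so that two pieces of data share one Environment.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    size: int

@dataclass
class Webkinz:
    name: str
    species: str
    home: Environment

forest = Environment("Forest", 10)
sparkle = Webkinz("Sparkle", "Unicorn", forest)
clover = Webkinz("Clover", "Rabbit", forest)  # hypothetical second resident

# Updating the contents of the data: one box changes, and every
# reference to that box sees the new value.
forest.size = 12
sizes_after_field_update = (sparkle.home.size, clover.home.size)

# Updating the variable: the name forest moves to a new box, and the
# old box (still referenced by both homes) is left as it was.
forest = Environment("Forest", 99)
sizes_after_rebinding = (sparkle.home.size, clover.home.size)

print(sizes_after_field_update)  # (12, 12)
print(sizes_after_rebinding)     # (12, 12)
print(forest.size)               # 99
```

Try tracing each of the two updates on a memory diagram before running the code, then compare your prediction with the printed values.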

Task: To help you internalize these details, please talk through the following questions with your partner.

If we update what hwk1 refers to using approach 1, which editors will see the changes?
- Anyone that already had access to the old hwk1
- Anyone that gets access to hwk1 in the future
- Both past and future editors of hwk1
If we update what hwk1 refers to using approach 1, will there be any way to access the original hwk1 contents?
If we update the contents of hwk1 using approach 2, which editors will see the changes?
- Anyone that already had access to the old hwk1
- Anyone that gets access to hwk1 in the future
- Both past and future editors of hwk1
If we update the contents of hwk1 using approach 2, will there be any way to access the original hwk1 contents?

Functions for Key Document Tasks

Creating, sharing, editing, and copying documents happen so often that it would make sense to have functions corresponding to those tasks.

Task: Write a function create that takes an Editor and a document title. The function creates a new document, stores it in the editor's list of docs, then returns the new document.

Task: Write a function edit that takes a Document and a string (for the new content) and replaces the document contents with the new text.

Task: Write a function share that takes a Document and an Editor and adds the document to the editor's list of docs.

Task: Write a function copy that takes a document and returns a copy of the document. The copy has the same title and contents as the original, but the last_edited component is the current date and time (use datetime.now() to get this).

Task: Look back at your four functions: which ones return something and which ones don't? Focus on the ones that don't return anything. Do they "accomplish" anything? Can you articulate what a function that doesn't return anything has to "do" in order to be of use? Write down your answer. (Hint: think about memory diagrams and how they change.)

CHECKPOINT: Call over a TA once you reach this point.

Testing

Now, we have to figure out whether the functions you've written provide the sharing conditions that we had in mind. In particular, think about the edit, share, and copy functions.

Your goal is to fill in this template of a testing function:

def test_docs():
    "SETUP"
    # your scenario setup goes here

    "PERFORM MODIFICATIONS"
    # calls to your functions go here

    "CHECK EFFECTS"
    # assertions go here

Task: Identify an initial scenario of editors and documents against which you will run your tests. Create the corresponding data in the "SETUP" area of test_docs.

Task: Write down (in prose) a list of effects that these operations should have. Put these in a multi-line string comment above your test_docs function in your file:

"""
Expected effects:
- If we create a doc titled t for an editor, the editor will have a document 
  with title t in their docs list
- YOU ADD MORE ITEMS HERE
"""

Task: Write down a list of effects that these operations should not have. For example, "if one document is edited, then the contents of another document should not change". Add these in a separate comment like the one above in your file.

Task: Make a collection of calls to the edit, share, and copy functions in the "PERFORM MODIFICATIONS" section. These calls should be sufficient to perform the checks that you outlined in your lists of effects.

Task: Turn your expected and unexpected effects into pytest assertions, putting those in the "CHECK EFFECTS" section of the testing function.
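To illustrate the shape such a test can take without giving away the document functions, here is a sketch on the Webkinz data from earlier in the lab. The expand function is a hypothetical helper invented for this example, and size is assumed to be an int. The checks cover both kinds of effects: what should change and what should stay the same.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    name: str
    size: int

@dataclass
class Webkinz:
    name: str
    species: str
    home: Environment

def expand(env: Environment, extra: int) -> None:
    """Hypothetical helper: grow an environment in place by extra units."""
    env.size = env.size + extra

def test_expand():
    "SETUP"
    forest = Environment("Forest", 10)
    prairie = Environment("Prairie", 8)
    sparkle = Webkinz("Sparkle", "Unicorn", forest)
    brownie = Webkinz("Brownie", "Horse", prairie)

    "PERFORM MODIFICATIONS"
    expand(forest, 5)

    "CHECK EFFECTS"
    # Expected: the change is visible through every reference to the box.
    assert forest.size == 15
    assert sparkle.home.size == 15
    # Expected: no new box was created, so sparkle still shares forest.
    assert sparkle.home is forest
    # Not expected: changes to other fields or to unrelated data.
    assert forest.name == "Forest"
    assert prairie.size == 8
    assert brownie.home is prairie
```

Running pytest on a file containing this code should pick up test_expand automatically, since its name starts with test_. Your test_docs function can follow the same pattern with editors and documents.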

Task: We are curious to see which effects groups are and aren't identifying. Please visit this Google Form, and copy/paste your prose lists of expected effects.

CHECKPOINT: Call over a TA once you reach this point.

If you are finished with time to spare, feel free to take time to get ahead on the homework.
