**[« Back to LFS website](https://csci1710.github.io)**
# Lab IV: Reference Counting
:::warning
**Lab Dates:** Wednesday, February 26th - Thursday, February 27th
:::
## Overview
This lab will build on your understanding of relational Forge! If you're not yet comfortable with transitive closure, check out [this](https://csci1710.github.io/book/chapters/relations/reachability.html?highlight=transitive%20closure#relations-and-reachability) documentation.
When you write a program, you create potentially large data structures which need to be stored in memory while the program is running. When you create the structure, you allocate a region of memory for it to be stored. This presents a problem: if we just keep allocating more and more regions of memory, we’ll eventually run out of space. To prevent this from happening, we need to at some point deallocate the memory we no longer need.
In some languages, like C, it is up to the programmer to manage when memory is deallocated. While this may be preferable for some applications, most modern languages use something called a garbage collector, which automatically detects when allocated memory is no longer needed, freeing the programmer from having to manage it themselves.
If you’ve never heard of this concept, it probably means you’ve been using garbage collected languages without even knowing it! In this lab, we’ll explore how we can model memory and automatic memory management in Forge.
:::warning
**Note:** There’s no stencil for this lab :)
You’ll be writing the model yourself. Be sure to add anything in a code block to your file!
:::
## Setup
Start by opening a new file. At the very top of the file, make sure to include the following line:
```
#lang forge
```
::: info
**Note:** We’re using the full Forge language for this lab, and not Froglet!
You’ll have access to relations and operators you couldn’t use in the previous lab. Using relations and relational operators is the main learning focus of this lab! Thus, we’d like you to avoid the built-in `reachable` predicate, and instead get practice using the [transitive-closure operator](https://csci1710.github.io/forge-documentation/building-models/constraints/expressions/relational-expressions/relational-expressions.html#-transitive-closure) (`^`) directly to describe reachability. We don’t expect you to be familiar with all of the relational operators; just be sure to use transitive closure.
:::
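If you haven’t used `^` before, here is a minimal sketch of how transitive closure expresses reachability. (This uses a toy `Node` sig that is *not* part of this lab’s model.)

```
sig Node { next: set Node }

-- `^next` relates a to b whenever b is reachable from a
-- by following one *or more* `next` edges
pred canReach[a: Node, b: Node] {
    b in a.^next
}
```

Note that `^next` does not include the zero-step case: a node is not considered to reach itself unless it lies on a cycle. The reflexive transitive closure `*next` includes it.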
## Memory Cells
In this lab, we'll use Forge to model computer memory as a collection of ``Memory`` cells:
```
abstract sig Memory {}
```
Programs use a couple of different regions of memory. First, there’s the stack, which is generally where each function’s parameters and local variables are stored. Since parameters and local variables only need to last as long as the function call does, deallocation is easy, and the garbage collector doesn’t come into play. For simplicity, we’ll model this as a single, unique ``Stack``.
```
one sig Stack extends Memory {}
```
Data structures shared between multiple functions are stored on the heap. For data on the heap, it is not always easy to tell when it’s no longer needed: this is the job of the garbage collector. We’ll model this memory as a collection of ``HeapCell``s.
```
sig HeapCell extends Memory {}
```
:::success
**Task:** Add these declarations to your Forge file!
:::
## References
Data in memory might contain references to data in other places in memory. For instance, an object might have as a field another object which is stored elsewhere. For some applications, it might be suitable to simply add a ``references`` field to ``Memory``:
```
sig Memory { references: set Memory }
```
Note that having a *relation* to represent the references between cells is natural: the references form a directed graph between cells.
:::success
**Task:**
1. Try visualizing this model now by adding ``run {} for 2 Memory`` to your model.
2. Now change `references: set Memory` to `references: set HeapCell` and rerun.
:::
:::info
**Stop and Think:** What changed? We will keep the perspective that references only point to `HeapCell`s throughout the lab, since `HeapCell`s are the type of memory we’ll be garbage collecting.
:::
## State
As a program is running, the data being used changes. This is reflected both by additional memory being allocated and by the references between parts of memory changing. To reflect this, we need to move `references` out of `Memory` and into a new signature representing the references *at a particular time*, as well as which `HeapCell`s are currently allocated.
```
abstract sig State {
allocated : ...,
references : ...
}
```
Note that because `references` is no longer a field of memory cells, it may have a different type than it had before.
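To see the general pattern, here is a sketch in a toy domain unrelated to this lab (so as not to give the answer away): a field of one sig can itself be a binary relation between two other sigs.

```
sig Person {}
abstract sig Day {
    -- who is friends with whom *on this particular day*
    friends: set Person -> Person
}
```

Here `friends` is a ternary relation of type `Day -> Person -> Person`, and for a particular day `d`, the join `d.friends` recovers that day’s binary friendship relation.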
For this problem, we will only consider three states: `A`: an initial state, `B`: one where references have changed during program execution, and `C`: one after garbage collection has taken place.
```
one sig StateA extends State {}
one sig StateB extends State {}
one sig StateC extends State {}
```
In a real program, this process would repeat over and over: the garbage collector would run, references would change, and the garbage collector would run again. Effectively, each `StateC` would become the new `StateA`. Since Forge will check every possible `StateA`, we don’t need to worry about this in our model.
:::success
**Task:** Delete the `references` field from the `Memory` signature. Add the `State`, `StateA`, `StateB`, and `StateC` signatures to your model. Replace the ellipses in the definition of `State` with the appropriate types.
:::
### Visualizer Tip: Projecting over State
Try running again. At first, this change may seem mostly to have made the visualization of the model far more tangled. We would like to easily see what changes between states. To do this, we can project over the `State` signature using the **Projection** menu on the left side of the visualizer’s toolbar.
Project over the `State` signature. Use the arrows or the dropdown to switch the projection between `StateA` and `StateB`. If the visualization graph does not contain any edges, click **Next** until you are viewing an instance that does have references.
## Soundness and Completeness
There are at least two properties we would really like a memory-management system to observe.
First, it shouldn't deallocate memory that is still being used (*soundness*). If a `HeapCell` is not allocated, that means the program is free to use it whenever it needs to store additional data, so if memory that is still being used is deallocated, it might be overwritten by other data.
Second, the memory-management system should eventually deallocate memory which is no longer being used (*completeness*). If it keeps memory allocated which we no longer need, the program might run out of memory.
Since the `Stack` contains each function’s local variables, it is the starting point from which a program may traverse references to reach other parts of memory. So, if memory is still reachable from the program `Stack`, we're still using it. We hope that our memory management system keeps everything we are using allocated (*soundness*), and tosses everything we are no longer using (*completeness*). To check this, we'll define a helper predicate:
```
pred reachFromStack[m : HeapCell, s : State] { ... }
```
:::success
**Task:** Complete the implementation of `reachFromStack`, which should be true only when `m` (a `HeapCell`) can be reached starting at the `Stack` memory in the given `State`.
<details>
<summary>Hint</summary>
Start by getting the `references` for the current state `s`, which should be a binary relation. Then, use a relational operator we introduced in class.
</details>
:::
### Soundness
A memory state is *safe* if all memory reachable from the `Stack` is allocated:
```
pred safe[s : State] {
...
}
```
A memory management system is *sound* if acting on an initially safe memory state implies that the following state will also be safe. That is, as long as all memory we’re using is allocated to begin with, after garbage collection occurs, the memory we’re using will still be allocated (only unused memory will be deallocated). We’ll check this with the following test.
```
pred validGC {
safe[StateA]
}
pred sound {
safe[StateC]
}
assert sound is necessary for validGC
```
:::success
**Task:** Add these predicates and the test to your model, and complete the implementation of `safe`. Since you have not yet described how the garbage collector transitions between `StateA` and `StateC`, the test **should fail**.
:::
### Completeness
A memory state is *clean* if all allocated memory is reachable from the `Stack`:
```
pred clean[s : State] {
...
}
```
A memory management system is *complete* if acting on an initially *clean* memory state implies that the following state will also be clean. That is, as long as all allocated memory is in use to begin with, after garbage collection occurs, there won’t be any allocated memory which we aren’t using (the garbage collector deallocated all memory we weren’t using). We’ll check this with the following test.
```
pred cleanGC {
clean[StateA]
}
pred complete {
clean[StateC]
}
assert complete is necessary for cleanGC
```
:::success
**Task:** Add these predicates and the test to your model, and complete the implementation of `clean`. Since you have not yet described how the garbage collector transitions between `StateA` and `StateC`, the test **should fail**.
:::
## Reference Counting
We’ll consider a garbage collector which uses *reference counting* to decide what memory it should deallocate. For each cell of heap memory, a reference counting collector stores the number of *incoming* references to it. When a memory cell's reference count drops to zero, the cell is deallocated. To represent this, write a function `ref_count` that returns the number of incoming references to a `HeapCell` in a certain `State`:
```
fun ref_count[m: HeapCell, s: State]: Int { ... }
```
:::success
**Task:** Complete the implementation of `ref_count`.
<details>
<summary>Hint</summary>
You’ll probably find the `#` operator, which counts the number of tuples in a given relation, useful in implementing `ref_count`.
</details>
:::
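As a toy illustration of `#` (again using a `Node` sig that is not part of the lab’s model), counting the incoming edges of a plain binary relation looks like:

```
sig Node { next: set Node }

-- the number of Nodes with a `next` edge pointing into n
fun inDegree[n: Node]: Int {
    #(next.n)
}
```

Here `next.n` is the set of predecessors of `n` under `next`. Your `ref_count` will additionally need to consider only the references belonging to the given `State`.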
## State Transitions
When we introduced them, we said our three states would represent, in chronological order:
- `StateA`: an initial state (perhaps not the true starting state of the system, but one where the garbage-collector has just completed its previous run);
- `StateB`: a state where the programmer has changed memory references; and
- `StateC`: a state after garbage collection has taken place again.
Now, we’ll add constraints to enforce this behavior.
Between `StateA` and `StateB`, the programmer may create or destroy references, but the memory manager does nothing: there may be *new* allocated cells due to new references, but all the cells allocated in `StateA` remain allocated in `StateB`. Since references may change, reachability from the stack may change. Keep in mind that the programmer cannot create references to unallocated memory, so any memory reachable from the `Stack` in `StateB` must be allocated. (There can be allocated memory which is *not* reachable from the `Stack` in `StateB`, perhaps because the programmer destroyed a reference to it between `StateA` and `StateB`; it will be the garbage collector’s job to deallocate it.)
```
pred A_to_B_Changed {
...
}
```
Between `StateB` and `StateC`, garbage collection will occur, so the references in memory should not change, but which cells are allocated may. A reference counting collector will enforce that for all memory cells, a cell will be unallocated in `StateC` if and only if it has a reference-count of 0 in `StateB`.
```
pred B_to_C_GarbageCollected {
...
}
```
Even though we want these predicates to always hold, they won’t be enforced unless they’re referenced by whatever we’re actually running. To make sure that happens, first add the following predicate:
```
pred GarbageCollector {
A_to_B_Changed
B_to_C_GarbageCollected
}
```
:::success
**Task:** Add and complete all the above predicates.
Then, change the soundness and completeness tests so they require `GarbageCollector` in addition to the first state being safe/clean for the assertion about the last state to hold.
:::
## Verifying Correctness
Check both the soundness and completeness assertions. Is reference counting sound? Is it complete? How do the consequences of a garbage collector violating soundness compare to the consequences of violating completeness?
### Soundness
If you implemented reference counting correctly, soundness **should not be violated**. If you are getting counterexamples, try to see how you might fix the problem or ask a TA for help.
### Completeness
Unless you wrote some extra constraints, completeness **should be violated**. If you are not getting counterexamples, try to see how you might fix the problem or ask a TA for help.
:::info
**Stop and Think:** What circumstances might cause completeness to fail?
You should be able to guess just by looking at the counterexamples Forge finds for you. It will help to write a run statement that finds counterexamples.
:::
<details>
<summary>[Optional] Other ways of testing</summary>
You could test for completeness using this as well:
```
soundnessTHM: assert {
(GarbageCollector and safe[StateA]) => safe[StateC]
} for 6 Memory is checked
completenessTHM: assert {
(GarbageCollector and clean[StateA]) => clean[StateC]
} for 6 Memory is checked -- fail
-- This is basically what checked translates to
completenessUNSAT: assert {
not ((GarbageCollector and clean[StateA]) => clean[StateC])
} for 6 Memory is unsat
-- Checked theorems are also basically assertions
-- For example, {assert A is sufficient for B} is equivalent to {A implies B} is checked
```
</details>
## Checkoff
Call your TA and show your findings!
### Looking Ahead
Memory management is a fascinating and complex topic that could take up an entire course on its own; many modern languages that have automated memory management use some hybrid system of cheap reference counting and other, more expensive approaches.
In your next homework, we’ll build on this memory-management model to investigate approaches that do guarantee completeness (but have a higher impact on system performance).
---